DESeq2: QQ plot of P-values
nikmehr22 • 0
@nikmehr22-13526

Dear DESeq2 Experts,

I am new to RNA-seq data and have started learning DESeq2.

I ran DESeq2 following this example from the vignette:

http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#count-matrix-input

Then I extracted the p-values and generated a QQ plot of observed vs. expected p-values, as described in:

http://genome.sph.umich.edu/wiki/Code_Sample:_Generating_QQ_Plots_in_R
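For readers unfamiliar with how such a plot is built, here is a minimal sketch in plain Python (not the code from the linked page). It assumes the expected p-values are the uniform quantiles i / (n + 1), and the two simulated inputs (`null_only` and `mixture`) are hypothetical data made up for illustration. The function `qq_points` and the helper `median_ratio` are likewise illustrative names, not part of DESeq2.

```python
import math
import random


def qq_points(pvalues):
    """Return (expected, observed) pairs of -log10 p-values for a QQ plot."""
    # Drop missing or invalid values (DESeq2 can report NA p-values for some genes).
    observed = sorted(p for p in pvalues if p is not None and 0.0 < p <= 1.0)
    n = len(observed)
    # If every gene were null, the i-th smallest of n p-values would be
    # expected near i / (n + 1) (uniform quantiles).
    expected = [i / (n + 1) for i in range(1, n + 1)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def median_ratio(points):
    """Observed / expected at the median point; close to 1 when all genes are null."""
    exp, obs = points[len(points) // 2]
    return obs / exp


random.seed(1)
# Hypothetical data set 1: every gene is null, so p-values are uniform.
null_only = [random.random() for _ in range(5000)]
# Hypothetical data set 2: 70% null genes plus 30% genes with small p-values.
mixture = [random.random() for _ in range(3500)] + [
    random.random() ** 8 for _ in range(1500)
]

print("all null:", round(median_ratio(qq_points(null_only)), 2))
print("mixture :", round(median_ratio(qq_points(mixture)), 2))
```

When a sizeable fraction of genes is truly non-null, the whole curve lifts off the diagonal, not just the extreme tail, so a ratio well above 1 at the median is what a mixture produces even without any miscalibration.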

Enclosed is the plot:

My question is: why are the observed p-values inflated? Is this expected when analysing RNA-seq data?

Thanks for any comments/suggestions.

Nikmehr

deseq2 rnaseq rna-seq • 22k views

Hi Michael,

I understand my first plot was based on example data; however, I see the same pattern with my actual data. Below is the graph using my data:

I wonder, what could be the reason for this inflated pattern of observed p-values?

Thanks for any comments/suggestions.

Nikmehr

I don't follow. Are you saying that all the genes are null?

Why do you expect all the genes to be null? In many RNA-seq datasets, the null (log fold change = 0) is obviously not the case for many genes.

This motivated our focus on log fold change threshold tests and accurate estimation of log fold change in the DESeq2 paper.


I expect that in an experiment most of the genes are null, and that only a small subset of genes (dots) deviate significantly from the solid line (X = Y).

Or a plot like this:

https://www.nature.com/ng/journal/v48/n9/fig_tab/ng.3620_SF2.html

I am trying to identify that small subset of genes that represent the genuine associations, but at the moment I have many significant hits.


In RNA-seq, a well-designed experiment will typically have many differentially expressed genes. Beyond those, there is a tail of genes whose log fold change is probably not exactly 0 but whose effect is small. To deal with these, we recommend using the lfcThreshold argument of results(). We discuss this in depth in the DESeq2 paper, but you can just take a look at the vignette for example usage:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#tests-of-log2-fold-change-above-or-below-a-threshold

The link you posted above is to a GWAS, and such studies are often underpowered for all but the largest effects. Most genomic loci explain, if anything, minuscule fractions of the variance in the trait. In RNA-seq we typically have much, much larger effect sizes (in terms of population SD, if you like) than in GWAS.
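To illustrate the idea behind testing against a log fold change threshold rather than against exactly 0, here is a simplified sketch in plain Python. It is not DESeq2's implementation: it assumes the estimated log2 fold change is approximately normal with a known standard error, and the function name `wald_pvalue` and the example numbers are hypothetical. In DESeq2 itself the test is requested through the lfcThreshold argument of results(), as described in the vignette section linked above.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def wald_pvalue(lfc, se, threshold=0.0):
    """Normal-approximation p-value for the null hypothesis |LFC| <= threshold.

    With threshold=0 this reduces to the usual two-sided test of LFC = 0.
    Simplified (and conservative) sketch; not taken from DESeq2's source.
    """
    # Only the part of the estimate that exceeds the threshold counts as evidence.
    z = max(abs(lfc) - threshold, 0.0) / se
    upper_tail = 1.0 - _STD_NORMAL.cdf(z)
    return min(1.0, 2.0 * upper_tail)


# Hypothetical gene A: small but very precisely estimated effect.
print("gene A, threshold 0:", wald_pvalue(0.3, 0.05))
print("gene A, threshold 1:", wald_pvalue(0.3, 0.05, threshold=1.0))
# Hypothetical gene B: large effect, well above the threshold.
print("gene B, threshold 1:", wald_pvalue(2.0, 0.3, threshold=1.0))
```

A gene like the hypothetical gene A is highly significant against LFC = 0 but gives no evidence of exceeding a threshold of 1, which is how a threshold test trims the long tail of small-effect hits.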


Thank you very much for the information. I found it very helpful!

@mikelove

The experiment has many differentially expressed genes, and for some genes the differences are much greater than the biological and technical variability. Take a look at a histogram of the p-values, which shows them on a non-log scale from 0 to 1.
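As a rough illustration of what such a histogram looks like, the following plain-Python sketch simulates two-sided z-test p-values and bins them on the 0 to 1 scale. Everything here is an assumption made for illustration: the 30% fraction of differentially expressed genes, the effect size of 4 standard errors, and the function names `simulate_pvalues` and `histogram`. It does not use DESeq2 or real data.

```python
import random
from statistics import NormalDist


def simulate_pvalues(n_genes, frac_de, effect_z, seed=0):
    """Two-sided z-test p-values for a hypothetical mix of null and DE genes."""
    rng = random.Random(seed)
    norm = NormalDist()
    pvals = []
    for g in range(n_genes):
        # The first frac_de * n_genes genes get a true shift; the rest are null.
        mean = effect_z if g < frac_de * n_genes else 0.0
        z = rng.gauss(mean, 1.0)
        pvals.append(2.0 * (1.0 - norm.cdf(abs(z))))
    return pvals


def histogram(pvals, n_bins=10):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        # Clamp so that p == 1.0 falls into the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


counts = histogram(simulate_pvalues(10000, frac_de=0.3, effect_z=4.0))
n = len(counts)
for i, c in enumerate(counts):
    # One '#' per 100 genes gives a crude text histogram.
    print(f"{i / n:.1f}-{(i + 1) / n:.1f} {'#' * (c // 100)}")
```

In this simulation the null genes spread roughly evenly across all bins, while the non-null genes pile up in the bin nearest 0. A flat floor with a spike at the left is the pattern to look for; it indicates many true differences rather than inflated statistics.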
