Question

DESeq2: QQ plot of P-values


nikmehr22 • 0

@nikmehr22-13526


Dear DESeq2 Experts,

I am new to RNA-seq data and have started learning DESeq2.

I used the following example to run DESeq2:

http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#count-matrix-input

Then I extracted the p-values and generated a QQ plot of observed vs. expected p-values, as described in:

http://genome.sph.umich.edu/wiki/Code_Sample:_Generating_QQ_Plots_in_R

Enclosed is the plot:

My question is: why are the observed p-values inflated? Is this expected when analyzing RNA-seq data?

Thanks for any comments/suggestions.

Nikmehr

deseq2 rnaseq rna-seq • 20k views

ADD COMMENT • link 6.7 years ago nikmehr22 • 0


Hi Michael,

I understand my first plot was based on example data; however, I see the same pattern with my actual data. Below is the graph using my data:

I wonder, what could be the reason for this inflated pattern of observed p-values?

Thanks for any comments/suggestions.

Nikmehr

ADD REPLY • link 6.7 years ago nikmehr22 • 0


I don't follow. Are you saying that all the genes are null?

Why do you expect all the genes to be null? In many RNA-seq datasets, the null (log fold change = 0) is obviously not the case for many genes.

This motivated our focus on log fold change threshold tests and accurate estimation of log fold change in the DESeq2 paper.

ADD REPLY • link 6.7 years ago Michael Love 41k


I expect that in an experiment most of the genes are null, and that only a small subset of genes (dots) deviate significantly from the solid line (X = Y),

or a plot like this:

https://www.nature.com/ng/journal/v48/n9/fig_tab/ng.3620_SF2.html

I am trying to identify that small subset of genes that represent the genuine associations, but at the moment I have many significant hits.

ADD REPLY • link 6.7 years ago nikmehr22 • 0


In RNA-seq, a well-designed experiment will have many differentially expressed genes. Then there is a tail of genes that likely do not have log fold change = 0 but may have only a small effect. For these, we recommend using the lfcThreshold argument of results(). We discuss this in depth in the DESeq2 paper, but you can just take a look at the vignette for example usage:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#tests-of-log2-fold-change-above-or-below-a-threshold
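To make the idea of a threshold test concrete, here is a minimal Python sketch of a Wald-style test against a log fold change threshold. It is not DESeq2's code: the function name `wald_pvalue`, the normal approximation, and the example estimates and standard errors are all assumptions made for illustration. In DESeq2 itself you would simply pass lfcThreshold to results() as described in the vignette.

```python
import math

def wald_pvalue(lfc, se, lfc_threshold=0.0):
    """Two-sided p-value for H0: |LFC| <= lfc_threshold (normal approximation).

    Hypothetical helper for illustration only, not part of DESeq2.
    """
    # Distance of the estimate beyond the threshold, in standard errors.
    z = (abs(lfc) - lfc_threshold) / se
    # Upper-tail probability P(Z > z) for a standard normal.
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))
    return min(1.0, 2.0 * upper_tail)

# Made-up estimates: (log2 fold change, standard error).
genes = {"geneA": (0.3, 0.05), "geneB": (2.5, 0.4)}

for name, (lfc, se) in genes.items():
    p_zero = wald_pvalue(lfc, se)                      # H0: LFC = 0
    p_thresh = wald_pvalue(lfc, se, lfc_threshold=1.0)  # H0: |LFC| <= 1
    print(f"{name}: p(LFC=0)={p_zero:.3g}  p(|LFC|<=1)={p_thresh:.3g}")
```

In this sketch geneA has a small but precisely estimated effect, so it is highly significant against a null of zero yet not significant against a threshold of 1, while geneB remains significant in both tests.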

The link you posted above shows a GWAS, which is often underpowered for all but the largest effects. Most genomic loci explain, if anything, minuscule fractions of the variance in the trait. In RNA-seq we typically have much, much larger effect sizes (in terms of population SD, if you like) than in GWAS.
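The contrast between the two settings can be illustrated with a small simulation. This is only a sketch under assumed numbers: the fractions of non-null tests (0.1% vs. 30%), the effect sizes in SD units, and the function names are made up for illustration and do not come from any real dataset.

```python
import math
import random

def two_sided_p(z):
    """Two-sided normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_pvalues(n_tests, frac_nonnull, effect, seed):
    """Draw z-scores from a null/non-null mixture and convert to p-values."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        mean = effect if rng.random() < frac_nonnull else 0.0
        pvals.append(two_sided_p(rng.gauss(mean, 1.0)))
    return pvals

def qq_points(pvals):
    """Return (expected, observed) -log10 p-value pairs, sorted ascending."""
    n = len(pvals)
    observed = sorted(-math.log10(max(p, 1e-300)) for p in pvals)
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))

# Assumed settings: very few non-null tests vs. many large effects.
gwas_like = simulate_pvalues(5000, 0.001, 3.0, seed=1)
rnaseq_like = simulate_pvalues(5000, 0.30, 4.0, seed=1)

for label, pvals in [("GWAS-like", gwas_like), ("RNA-seq-like", rnaseq_like)]:
    points = qq_points(pvals)
    exp_med, obs_med = points[len(points) // 2]  # middle of the QQ curve
    frac_sig = sum(p < 0.05 for p in pvals) / len(pvals)
    print(f"{label}: median expected={exp_med:.2f}, "
          f"median observed={obs_med:.2f}, fraction p<0.05={frac_sig:.2f}")
```

When almost every test is null, the QQ curve follows the diagonal until the extreme tail. When a large share of tests is non-null, the whole curve lifts off the diagonal, which is the pattern that looks like "inflation".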

ADD REPLY • link 6.7 years ago Michael Love 41k


Thank you very much for the information. I found it very helpful!

ADD REPLY • link 6.7 years ago nikmehr22 • 0

score 0 · Answer 1 · 2017-08-02


Michael Love 41k

@mikelove


United States

The experiment has many differentially expressed genes, and the differences for some genes are much greater than the biological and technical variability. Take a look at a histogram of the p-values, which shows them on a non-log scale from 0 to 1.
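As a rough illustration of what such a histogram shows, here is a small Python sketch using simulated p-values. The mixture (30% non-null genes with an effect of 3 SD) and the helper name `pvalue_histogram` are assumptions for illustration; with real results you would histogram the p-value column of your DESeq2 results instead.

```python
import math
import random

def pvalue_histogram(pvals, n_bins=20):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the last bin
        counts[idx] += 1
    return counts

# Simulated p-values: assumed mixture of null and non-null genes.
rng = random.Random(0)
pvals = []
for _ in range(5000):
    mean = 3.0 if rng.random() < 0.30 else 0.0
    z = rng.gauss(mean, 1.0)
    pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided normal p-value

counts = pvalue_histogram(pvals)
for i, c in enumerate(counts):
    lo, hi = i / 20, (i + 1) / 20
    print(f"{lo:.2f}-{hi:.2f} {'#' * (c // 25)} {c}")
```

The null genes contribute a roughly flat background across all bins, and the differentially expressed genes pile up in the first bin near 0. A tall first bar over a flat remainder is consistent with many true differences rather than a miscalibrated test.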

ADD COMMENT • link 6.7 years ago Michael Love 41k