Question

p-values histogram in case of no effect

itamarkanter • 0

Hi all,

I have a simple experiment of treated vs. untreated cells with two biological replicates.
When I contrast the biological replicates, I find many differentially expressed genes, and the p-value histogram is nicely flat except for a peak at 0.
However, when I contrast treated vs. untreated, the p-value histogram has a maximum at 1 and is no longer flat (as would be expected by chance).
In the PCA plot, 100% of the variance (97% when I set ntop=Inf) corresponds to the axis that reflects the difference between the biological replicates.
I'm wondering: even if the treatment has no effect on the cells, shouldn't the p-value histogram be flat?

Thanks,
Itamar

# PS stands for biological replicate ("2" and "17")
ddsATRT <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable[c(-1, -2), ],
                                      directory = directory,
                                      design = ~ PS + treatment)
# Make untreated (UT) the reference level
ddsATRT$treatment <- relevel(ddsATRT$treatment, "UT")
ddsATRT <- DESeq(ddsATRT)
resATRT    <- results(ddsATRT, contrast = c("treatment", "BAPN", "UT"))  # BAPN/UT
resATRT_PS <- results(ddsATRT, contrast = c("PS", "2", "17"))            # 2/17

> as.data.frame(colData(ddsATRT))
              treatment PS sizeFactor
ATRT_A2_UT           UT  2  0.9171904
ATRT_A2_BAPN       BAPN  2  0.9629325
ATRT_A17_UT          UT 17  0.9984706
ATRT_A17_BAPN      BAPN 17  1.1456955

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq2_1.8.1 RcppArmadillo_0.5.100.1.0 Rcpp_0.11.6 GenomicRanges_1.20.3 GenomeInfoDb_1.4.0
[6] IRanges_2.2.1 S4Vectors_0.6.0 BiocGenerics_0.14.0 BiocInstaller_1.18.1

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.2 XVector_0.8.0 futile.options_1.0.0 tools_3.2.0
[7] rpart_4.1-9 digest_0.6.8 RSQLite_1.0.0 annotate_1.46.0 gtable_0.1.2 lattice_0.20-31
[13] DBI_0.3.1 proto_0.3-10 gridExtra_0.9.1 genefilter_1.50.0 stringr_1.0.0 cluster_2.0.1
[19] locfit_1.5-9.1 nnet_7.3-9 grid_3.2.0 Biobase_2.28.0 AnnotationDbi_1.30.1 XML_3.98-1.1
[25] survival_2.38-1 BiocParallel_1.2.1 foreign_0.8-63 latticeExtra_0.6-26 Formula_1.2-1 geneplotter_1.46.0
[31] ggplot2_1.0.1 reshape2_1.4.1 lambda.r_1.1.7 magrittr_1.5 scales_0.2.4 Hmisc_3.16-0
[37] MASS_7.3-39 splines_3.2.0 xtable_1.7-4 colorspace_1.2-6 labeling_0.3 stringi_0.4-1
[43] acepack_1.3-3.3 munsell_0.4.2

deseq2 • 823 views


score 0 · Answer 1 · 2015-05-12

The p-value histogram should typically be flat under the null hypothesis, provided you first filter out the very-low-count genes, which produce discrete spikes.
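To illustrate the point about low-count genes, here is a minimal base-R sketch. It is a toy simulation rather than DESeq2: the Poisson counts, the exact binomial test, and all names (`mu`, `a`, `b`, `pval`) are assumptions made only for this illustration. Every gene is simulated under the null, yet the low-count genes pile up at a handful of discrete p-values (mostly exactly 1), while the high-count genes give a much flatter histogram.

```r
set.seed(42)
n_genes <- 4000

# Hypothetical null data: two groups, no true difference between them.
# Half of the genes have a very low mean count, half a high mean count.
mu <- rep(c(0.5, 200), each = n_genes / 2)
a <- rpois(n_genes, 2 * mu)  # summed counts over two samples, group A
b <- rpois(n_genes, 2 * mu)  # summed counts over two samples, group B

# Exact conditional test for equal Poisson rates:
# under the null, a | (a + b) ~ Binomial(a + b, 0.5).
pval <- rep(NA_real_, n_genes)
ok <- (a + b) > 0            # genes with zero total counts cannot be tested
pval[ok] <- mapply(function(x, n) binom.test(x, n, p = 0.5)$p.value,
                   a[ok], (a + b)[ok])
pval <- pmin(pval, 1)        # guard against floating-point overshoot above 1

breaks <- seq(0, 1, by = 0.05)
h_low  <- hist(pval[mu < 1], breaks = breaks, plot = FALSE)
h_high <- hist(pval[mu > 1], breaks = breaks, plot = FALSE)

# Fraction of tested genes falling into the last bin, (0.95, 1]
frac_last_low  <- h_low$counts[20]  / sum(h_low$counts)
frac_last_high <- h_high$counts[20] / sum(h_high$counts)
print(c(low = frac_last_low, high = frac_last_high))

# To look at the shapes, compare plot(h_low) with plot(h_high)
```

With a DESeq2 results table such as `resATRT` from the question, the analogous filter would be along the lines of `hist(resATRT$pvalue[resATRT$baseMean > 1], breaks = 0:20/20)`; the cutoff of 1 is an arbitrary choice here, not a recommended value.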

"In the PCA plot, 100% of the variance (and 97% when I set ntop=Inf) correspond to the axis that relate to the difference between the biological replicates."

So it sounds like the effect of PS is not null? Then the p-value histogram is not expected to be flat.
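As a rough illustration of what a non-null factor does to the histogram, the following base-R sketch simulates a mix of null and truly different genes and tests each one with a t-test. This is a toy setup, not the DESeq2 model: the group size of 4, the shift of 3, and the 20% fraction of affected genes are made-up values. The null genes contribute a flat background, and the affected genes add a peak near 0.

```r
set.seed(7)
n_genes <- 3000
n_de    <- 600   # hypothetical: 20% of genes truly differ between groups

pvals <- sapply(seq_len(n_genes), function(i) {
  shift <- if (i <= n_de) 3 else 0   # assumed effect size for affected genes
  x <- rnorm(4)                      # group 1, four samples
  y <- rnorm(4, mean = shift)        # group 2, four samples
  t.test(x, y, var.equal = TRUE)$p.value
})

h <- hist(pvals, breaks = seq(0, 1, by = 0.05), plot = FALSE)
first_bin  <- h$counts[1]    # p-values in [0, 0.05]
other_bins <- h$counts[-1]   # the remaining 19 bins
print(first_bin)
print(range(other_bins))

# plot(h) shows the flat background with a peak at the left edge
```

A histogram that is flat apart from a peak at 0 is therefore what one would expect for a factor with a real effect on many genes.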

score 0 · Answer 2 · 2015-05-13

The PS factor corresponds to the biological replicates, which look quite different both in the DE analysis and in the PCA plot.

But the treatment factor, which barely seems to affect the samples (based on the PCA), gives a strange p-value distribution in which most genes are concentrated around p-value = 1. Even if the null is true and the treatment has no effect on the samples, the distribution should be flat rather than concentrated around 1.

Is there any way to attach figures in this forum? I would like to show the two p-value histograms.