edgeR: very low p-value and very high variance within the group of replicates
0
0
Entering edit mode
@gordon-smyth
Last seen 47 minutes ago
WEHI, Melbourne, Australia
Hi Valentina, Well it's not all that surprising, because 6666 is a much bigger number than 0! What you mean of course, is that the large count 6666 represents some sort of outlier process that you don't want to exclude. The quick solution, as Mark has already suggested, is simply to reduce the prior.df in the call to estimateTagwiseDisp(). If you make prior.df sufficiently small, then the offending genes will drop out of your DE list. However we have recently implemented a robust empirical Bayes procedure into edgeR and limma that I think you will find far more effective. It identifies outlier genes with very high variances while maintaining and even increasing power for the remaining non-outlier genes. To use it, you will need to switch to the quasi-likelihood routines of edgeR. You will also need to update to the Bioconductor 2.12 versions of edgeR and limma rather than the older Bioconductor release 2.11 versions that you are currently using. Assuming you have executed the code you give, you can continue: design <- model.matrix(~group) y <- estimateGLMTrendedDisp(y,design) fit <- glmQLFTest(y,design,robust=TRUE) topTags(fit,coef=2) You could also try glmQLFTest(y,design,robust=TRUE,plot=TRUE) to get a diagnostic plot. This methodology uses the quasi-likelihood approach http://www.ncbi.nlm.nih.gov/pubmed/23104842 http://www.statsci.org/smyth/pubs/QuasiSeqPreprint.pdf combined with robust empirical Bayes: http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf Note that the quasi-likelihood approach is (deliberately) somewhat more conservative than the classic edgeR exact test, but it should still have plenty of power to get good results for a 3 vs 3 experiment. Best wishes Gordon On Wed, 19 Jun 2013, Valentina Indio wrote: > I have a question about you edgeR, in order to fix a problem that may be > very simple for you. > > I'm using `edgeR` to perform differential expression analysis from RNA-seq > experiment. > > I have 6 samples of tumor cell, same tumor and same treatment: 3 patient > with good prognosis and 3 patient with bad prognosis. I want to compare > the gene expression among the two groups. > > I ran the `edgeR` pakage like follow: > > x <- read.delim("my_reads_count.txt", row.names="GENE") > group <- factor(c(1,1,1,2,2,2)) > y <- DGEList(counts=x,group=group) > y <- calcNormFactors(y) > y <- estimateCommonDisp(y) > y <- estimateTagwiseDisp(y) > et <- exactTest(y) > > I obtained a very odd results: in some cases I had a very low p-value > and FDR but looking at the raw data it is obvious that the difference > between the two groups can't be significant. > > This is an example for `my_reads_count.txt`: > > GENE sample1_1 sample1_2 sample1_3 sample2_1 sample2_2 sample2_3 > ENSG00000198842 0 3 2 2 6666 3 > ENSG00000257017 3 3 25 2002 29080 4 > > And for `my_edgeR_results.txt`: > > GENE logFC logCPM PValue FDR > ENSG00000198842 9.863211e+00 5.4879462930 5.368843e-07 1.953612e-04 > ENSG00000257017 9.500927e+00 7.7139869397 8.072384e-10 7.171947e-07 > > It seems very odd that "0 3 2" versus "2 6666 3" is significant.... > I would like that the variance within the group is considered. Can you > help me? Some suggestion? > Are there some alternative function in edgeR that is capable to fix my > problem?? -- output of sessionInfo(): > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.0.8 limma_3.12.3 tweeDEseqCountData_1.0.8 Biobase_2.16.0 BiocGenerics_0.2.0 BiocInstaller_1.8.3 loaded via a namespace (and not attached): [1] tools_2.15.0 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
limma PROcess edgeR limma PROcess edgeR • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 657 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6