setting contrast in edgeR
1
0
Entering edit mode
Alpesh Querer ▴ 220
@alpesh-querer-4895
Last seen 9.6 years ago
Hi, I`m trying to find significant DE genes using the contrast (-1,.5,.5). I want to test {cond1-mean(cond2,cond3)} I created a data-set where all conditions have exact same values, divided into 3 conditions and 3 replicates/condition. Here is the data I`m using. 150 150 150 150 150 150 150 150 150 510 510 510 510 510 510 510 510 510 550 550 550 550 550 550 550 550 550 500 500 500 500 500 500 500 500 500 501 501 501 501 501 501 501 501 501 505 505 505 505 505 505 505 505 505 50 50 50 50 50 50 50 50 50 If the run the following code, I get very significant p-values (in lrt table) and all genes are declared to be up-regulated. Am I interpreting the output incorrectly or do I need to set up the contrast in a different way? h <- read.delim("test10.txt") h <-h[rowSums(h)>0,] grp <- factor(c(1,1,1,2,2,2,3,3,3)) design <- model.matrix(~grp) h <- DGEList(counts=h,lib.size=colSums(h),group=grp) h <- calcNormFactors(h) h <- estimateGLMCommonDisp(h, design, verbose=TRUE) h <- estimateGLMTrendedDisp(h, design) h <- estimateGLMTagwiseDisp(h, design) fit <- glmFit(h, design) lrt <- glmLRT(fit,contrast=c(-1,0.5,0.5)) summary(de <- decideTestsDGE(lrt, p=0.05, adjust="BH")) [,1] -1 0 0 0 1 6 > sessionInfo() R version 2.15.2 (2012-10-26) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] epicalc_2.15.1.0 nnet_7.3-5 MASS_7.3-22 survival_2.36-14 foreign_0.8-51 edgeR_3.0.4 [7] limma_3.14.3 goseq_1.10.0 geneLenDataBase_0.99.10 BiasedUrn_1.04 GenomicFeatures_1.10.1 AnnotationDbi_1.20.3 [13] Biobase_2.18.0 GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0 BiocInstaller_1.8.3 loaded via a namespace (and not attached): [1] biomaRt_2.14.0 Biostrings_2.26.2 bitops_1.0-5 BSgenome_1.26.1 DBI_0.2-5 grid_2.15.2 lattice_0.20-10 Matrix_1.0-10 [9] mgcv_1.7-22 nlme_3.1-105 parallel_2.15.2 RCurl_1.95-3 Rsamtools_1.10.2 RSQLite_0.11.2 rtracklayer_1.18.1 stats4_2.15.2 [17] tools_2.15.2 XML_3.95-0.1 zlibbioc_1.4.0 Thanks, Al [[alternative HTML version deleted]]
• 3.3k views
ADD COMMENT
0
Entering edit mode
@ryan-c-thompson-5618
Last seen 7 months ago
Scripps Research, La Jolla, CA
Hello Alpesh, If you inspect your design matrix, you should see that it contains an intercept column (named "(Intercept)" and two contrast columns (named "grp2" and "grp3"). It looks like you are rather expecting one column per group. With the design matrix that you specify, the intercept column represents cond1, and the other two columns represent "cond2-cond1" and "cond3-cond1" respectively, Therefore your specified contrast adds up thusly: -1 * (cond1) + 0.5 * (cond2-cond1) + 0.5 * (cond3 - cond1) = (-cond1) + 0.5 * (cond2 + cond3) - cond1 = (cond2+cond3)/2 - 2 * cond1. In your artificial case where all conditions are equal, this clearly adds up to -cond1, not zero, so you will see significant calls. If you were expecting a design matrix with one column for each condition, you can get it by including a zero in the call to model.matirx, which will leave out the "(Intercept)" term, like this: design <- model.matrix(~0+grp) Then the contrast that you are using should be correct. Alternatively, if I understand you correctly, you can use the existing design matrix with the intercept column and change your contrast to c(0, .5, .5). Also, if you want "cond1 - mean(cond2, cond3)", I think you will want to swap the signs of the elements in your contrast. This won't change the p-values, but it will make the fold changes have the opposite sign. Please check your work, because I may misremember some of the details above, but I'm pretty sure the root of your problem is using a design matrix with an intercept when you expected one without an intercept. Regards, -Ryan Thompson On Fri 04 Jan 2013 02:53:25 PM PST, Alpesh Querer wrote: > Hi, > > I`m trying to find significant DE genes using the contrast (-1,.5,.5). > I want to test {cond1-mean(cond2,cond3)} > I created a data-set where all conditions have exact same values, divided > into 3 conditions and 3 replicates/condition. > > Here is the data I`m using. > > 150 150 150 150 150 150 150 150 150 > 510 510 510 510 510 510 510 510 510 > 550 550 550 550 550 550 550 550 550 > 500 500 500 500 500 500 500 500 500 > 501 501 501 501 501 501 501 501 501 > 505 505 505 505 505 505 505 505 505 > 50 50 50 50 50 50 50 50 50 > > > If the run the following code, I get very significant p-values (in lrt > table) and all genes are declared to be up-regulated. > Am I interpreting the output incorrectly or do I need to set up the > contrast in a different way? > > h <- read.delim("test10.txt") > h <-h[rowSums(h)>0,] > grp <- factor(c(1,1,1,2,2,2,3,3,3)) > design <- model.matrix(~grp) > h <- DGEList(counts=h,lib.size=colSums(h),group=grp) > h <- calcNormFactors(h) > h <- estimateGLMCommonDisp(h, design, verbose=TRUE) > h <- estimateGLMTrendedDisp(h, design) > h <- estimateGLMTagwiseDisp(h, design) > fit <- glmFit(h, design) > lrt <- glmLRT(fit,contrast=c(-1,0.5,0.5)) > summary(de <- decideTestsDGE(lrt, p=0.05, adjust="BH")) > > [,1] > -1 0 > 0 0 > 1 6 > > > > > >> sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: i386-w64-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] epicalc_2.15.1.0 nnet_7.3-5 > MASS_7.3-22 survival_2.36-14 foreign_0.8-51 > edgeR_3.0.4 > [7] limma_3.14.3 goseq_1.10.0 > geneLenDataBase_0.99.10 BiasedUrn_1.04 GenomicFeatures_1.10.1 > AnnotationDbi_1.20.3 > [13] Biobase_2.18.0 GenomicRanges_1.10.5 > IRanges_1.16.4 BiocGenerics_0.4.0 BiocInstaller_1.8.3 > > loaded via a namespace (and not attached): > [1] biomaRt_2.14.0 Biostrings_2.26.2 bitops_1.0-5 > BSgenome_1.26.1 DBI_0.2-5 grid_2.15.2 lattice_0.20-10 > Matrix_1.0-10 > [9] mgcv_1.7-22 nlme_3.1-105 parallel_2.15.2 > RCurl_1.95-3 Rsamtools_1.10.2 RSQLite_0.11.2 rtracklayer_1.18.1 > stats4_2.15.2 > [17] tools_2.15.2 XML_3.95-0.1 zlibbioc_1.4.0 > > Thanks, > Al > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 544 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6