edgeR topTags output (logFC)
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.7 years ago
I'm just wanting to confirm my understanding of the edgeR topTags() output. I'm using a design matrix with many coefficients (76 genotypes, 2 treatments[sun, shade]). 1) In the topTags output, I am looking at logFC which represents the estimated or predicted means -- NOT coefficients, correct? 2) I have selected one parental genotype to be my reference genotype, and I am comfortable interpreting the logFC values for the other genotypes as comparisons to this genotype. I have selected one treatment as the reference (sun). When I see the logFC values for 'trtshade', I assume this is also in comparison to the reference genotype in the reference treatment. My question is about looking at 'geno2.trtshade' values - are these values in comparison to 'trtshade' or 'geno2' or the reference genotype/reference treatment values? For example: this is calling glmLRT with coef = grep("trt", colnames(Design)) to identify treatment effects Row.names logFC.trtsha logFC.genoIL1.1.trtsha logCPM LR PValue FDR 1 Solyc08g078300.2.1 0.9440264 -0.379030370 4.207443 1135.8061 1.450273e-188 1.637938e-184 2 Solyc06g060830.2.1 0.9630343 0.038988710 4.511893 1095.1525 2.539194e-180 1.433883e-176 3 Solyc06g067910.2.1 1.2468599 -0.096499016 4.795789 1040.5813 2.722511e-169 1.024935e-165 4 Solyc06g008580.2.1 1.1500683 -0.451731914 3.300982 696.1677 6.043868e-101 1.706486e-97 5 Solyc07g017600.2.1 0.5026292 0.007208458 5.209938 694.2070 1.451776e-100 3.279271e-97 6 Solyc10g083760.1.1 -0.2877568 0.045952894 7.033394 651.8629 2.232759e-92 4.202796e-89 Thanks so much! -- output of sessionInfo(): > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] ggplot2_0.9.3.1 reshape_0.8.4 plyr_1.8 edgeR_3.2.3 limma_3.16.4 loaded via a namespace (and not attached): [1] MASS_7.3-26 RColorBrewer_1.0-5 colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2 labeling_0.1 [9] munsell_0.4 proto_0.3-10 reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.1 -- Sent via the guest posting facility at bioconductor.org.
edgeR edgeR • 1.9k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia
Dear Susan, > Date: Mon, 26 Aug 2013 18:28:03 -0700 (PDT) > From: "Susan [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, smbush at ucdavis.edu > Subject: [BioC] edgeR topTags output (logFC) > > > I'm just wanting to confirm my understanding of the edgeR topTags() > output. I'm using a design matrix with many coefficients (76 genotypes, > 2 treatments[sun, shade]). > > 1) In the topTags output, I am looking at logFC which represents the > estimated or predicted means -- NOT coefficients, correct? Incorrect. As the abbreviation suggests, logFC gives you log-fold-changes. logFC are indeed coefficients, not group means. > 2) I have selected one parental genotype to be my reference genotype, > and I am comfortable interpreting the logFC values for the other > genotypes as comparisons to this genotype. I have selected one > treatment as the reference (sun). When I see the logFC values for > 'trtshade', I assume this is also in comparison to the reference > genotype in the reference treatment. Yes, assuming that trt is what you have called the treatment factor, then trtshade would the coefficient estimating the log-fold-change between the shade and sun treatments. > My question is about looking at 'geno2.trtshade' values - are these > values in comparison to 'trtshade' or 'geno2' or the reference > genotype/reference treatment values? Without knowing what model you have fitted, or which coefficients you have tested for, it is impossible to tell. Best wishes Gordon > For example: this is calling glmLRT with coef = grep("trt", colnames(Design)) to identify treatment effects > > Row.names logFC.trtsha logFC.genoIL1.1.trtsha logCPM LR PValue FDR > 1 Solyc08g078300.2.1 0.9440264 -0.379030370 4.207443 1135.8061 1.450273e-188 1.637938e-184 > 2 Solyc06g060830.2.1 0.9630343 0.038988710 4.511893 1095.1525 2.539194e-180 1.433883e-176 > 3 Solyc06g067910.2.1 1.2468599 -0.096499016 4.795789 1040.5813 2.722511e-169 1.024935e-165 > 4 Solyc06g008580.2.1 1.1500683 -0.451731914 3.300982 696.1677 6.043868e-101 1.706486e-97 > 5 Solyc07g017600.2.1 0.5026292 0.007208458 5.209938 694.2070 1.451776e-100 3.279271e-97 > 6 Solyc10g083760.1.1 -0.2877568 0.045952894 7.033394 651.8629 2.232759e-92 4.202796e-89 > > Thanks so much! > > > -- output of sessionInfo(): > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C > [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ggplot2_0.9.3.1 reshape_0.8.4 plyr_1.8 edgeR_3.2.3 limma_3.16.4 > > loaded via a namespace (and not attached): > [1] MASS_7.3-26 RColorBrewer_1.0-5 colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2 labeling_0.1 > [9] munsell_0.4 proto_0.3-10 reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.1 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Okay, sorry. Here's more information: The model design I am using is *design <- model.matrix( ~ geno * trt)*, where "geno" has 76 values and "trt" has 2 (sun, shade) Using glmLRT, I am testing "trt", so * lrt <- glmLRT(dge, fit , coef = grep("trt", colnames(design))* The logFC values returned by topTags, then, are for columns *trtshade, geno2.trtshade, geno3.trtshade *, etc. I also test "geno", where the logFC values are for columns *geno2, geno3, geno2.trtshade, geno3.trtshade*, etc. Ultimately I test the interaction, so logFC values are for columns *geno2.trtshade, geno3.trtshade, geno4.trtshade*, etc. I am trying to figure out what these logFC coefficients are compared to, in each case. If I am interested in specific genotype effects, will I get a more appropriate answer by testing "geno2" individually, rather than "geno" all at once? Thanks - I do appreciate it. ***** Susan Bush, Ph. D. Postdoctoral Scholar Maloof Lab, Dept of Plant Biology University of California, Davis 1 Shields Avenue Davis, CA 95616 email: smbush@ucdavis.edu lab: 530-752-8086 cell: 507-828-8958 On Tue, Aug 27, 2013 at 5:05 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Susan, > > Date: Mon, 26 Aug 2013 18:28:03 -0700 (PDT) >> From: "Susan [guest]" <guest@bioconductor.org> >> To: bioconductor@r-project.org, smbush@ucdavis.edu >> Subject: [BioC] edgeR topTags output (logFC) >> >> >> I'm just wanting to confirm my understanding of the edgeR topTags() >> output. I'm using a design matrix with many coefficients (76 genotypes, 2 >> treatments[sun, shade]). >> >> 1) In the topTags output, I am looking at logFC which represents the >> estimated or predicted means -- NOT coefficients, correct? >> > > Incorrect. As the abbreviation suggests, logFC gives you > log-fold-changes. logFC are indeed coefficients, not group means. > > 2) I have selected one parental genotype to be my reference genotype, and >> I am comfortable interpreting the logFC values for the other genotypes as >> comparisons to this genotype. I have selected one treatment as the >> reference (sun). When I see the logFC values for 'trtshade', I assume this >> is also in comparison to the reference genotype in the reference treatment. >> > > Yes, assuming that trt is what you have called the treatment factor, then > trtshade would the coefficient estimating the log-fold-change between the > shade and sun treatments. > > My question is about looking at 'geno2.trtshade' values - are these >> values in comparison to 'trtshade' or 'geno2' or the reference >> genotype/reference treatment values? >> > > Without knowing what model you have fitted, or which coefficients you have > tested for, it is impossible to tell. > > Best wishes > Gordon > > For example: this is calling glmLRT with coef = grep("trt", >> colnames(Design)) to identify treatment effects >> >> Row.names logFC.trtsha logFC.genoIL1.1.trtsha logCPM >> LR PValue FDR >> 1 Solyc08g078300.2.1 0.9440264 -0.379030370 4.207443 >> 1135.8061 1.450273e-188 1.637938e-184 >> 2 Solyc06g060830.2.1 0.9630343 0.038988710 4.511893 >> 1095.1525 2.539194e-180 1.433883e-176 >> 3 Solyc06g067910.2.1 1.2468599 -0.096499016 4.795789 >> 1040.5813 2.722511e-169 1.024935e-165 >> 4 Solyc06g008580.2.1 1.1500683 -0.451731914 3.300982 >> 696.1677 6.043868e-101 1.706486e-97 >> 5 Solyc07g017600.2.1 0.5026292 0.007208458 5.209938 >> 694.2070 1.451776e-100 3.279271e-97 >> 6 Solyc10g083760.1.1 -0.2877568 0.045952894 7.033394 >> 651.8629 2.232759e-92 4.202796e-89 >> >> Thanks so much! >> >> >> -- output of sessionInfo(): >> >> sessionInfo() >>> >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C >> LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C >> [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C >> LC_MEASUREMENT=C LC_IDENTIFICATION=C >> >> attached base packages: >> [1] splines stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] ggplot2_0.9.3.1 reshape_0.8.4 plyr_1.8 edgeR_3.2.3 >> limma_3.16.4 >> >> loaded via a namespace (and not attached): >> [1] MASS_7.3-26 RColorBrewer_1.0-5 colorspace_1.2-2 >> dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2 >> labeling_0.1 >> [9] munsell_0.4 proto_0.3-10 reshape2_1.2.2 scales_0.2.3 >> stringr_0.6.2 tools_3.0.1 >> > > > ______________________________**______________________________**____ ______ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
Hi Susan, Thanks, I see now. Your experiment has 152 genotype x treatment combinations, and all of them occur in your experiment. I strongly suggest that, instead of using an interaction model formula, you simply define a factor that takes 152 levels and then make comparisons between the levels. This fits the same model as you have fitted, but is parametrized in a way that is more interpretable and useful. See Section 3.3.1 of the edgeR User's Guide. This is almost always the best approach because you know exactly what each comparison means. I am not a fan of interaction model formulas like geno*trt for genomic experiments. I don't think that the coefficients have useful meanings for most experiments. In particular, the linear effect terms (like trtshade) cannot be interpreted because of the present of the interactions. On Tue, 27 Aug 2013, Susan Bush wrote: > Okay, sorry. Here's more information: > > The model design I am using is *design <- model.matrix( ~ geno * trt)*, > where "geno" has 76 values and "trt" has 2 (sun, shade) > > Using glmLRT, I am testing "trt", so * lrt <- glmLRT(dge, fit , coef > = grep("trt", colnames(design))* > > The logFC values returned by topTags, then, are for columns > *trtshade, geno2.trtshade, geno3.trtshade > *, etc. This will give you a test of whether any of the genotypes show a treatment effect. This test is on 76 degrees of freedom, so there should be 76 columns for this test. I could explain what the individual coefficients mean mathematically, but I don't think that they are useful for you. > I also test "geno", where the logFC values are for columns *geno2, > geno3, geno2.trtshade, geno3.trtshade*, etc. This is a test of whether there any differences between any of the genotypes in either of the treatment conditions (on 150 degrees of freedom). > Ultimately I test the interaction, so logFC values are for columns > *geno2.trtshade, > geno3.trtshade, geno4.trtshade*, etc. All the genotypes are versus geno1, and treatment is versus sun, so the coefficient "geno2.trtshade" measures how much larger the logFC for shade vs sun is for geno2 as compared to geno1. In other words, the coefficient means: geno2.trtshade = (geno2.shade - geno2.sun) - (geno1.shade - geno1.sun) Best wishes Gordon > I am trying to figure out what these logFC coefficients are compared to, in > each case. If I am interested in specific genotype effects, will I get a > more appropriate answer by testing "geno2" individually, rather than "geno" > all at once? > > Thanks - I do appreciate it. > > > > > > > > ***** > Susan Bush, Ph. D. > Postdoctoral Scholar > Maloof Lab, Dept of Plant Biology > University of California, Davis > 1 Shields Avenue > Davis, CA 95616 > > email: smbush at ucdavis.edu > lab: 530-752-8086 > cell: 507-828-8958 > > > On Tue, Aug 27, 2013 at 5:05 PM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Dear Susan, >> >> Date: Mon, 26 Aug 2013 18:28:03 -0700 (PDT) >>> From: "Susan [guest]" <guest at="" bioconductor.org=""> >>> To: bioconductor at r-project.org, smbush at ucdavis.edu >>> Subject: [BioC] edgeR topTags output (logFC) >>> >>> >>> I'm just wanting to confirm my understanding of the edgeR topTags() >>> output. I'm using a design matrix with many coefficients (76 genotypes, 2 >>> treatments[sun, shade]). >>> >>> 1) In the topTags output, I am looking at logFC which represents the >>> estimated or predicted means -- NOT coefficients, correct? >>> >> >> Incorrect. As the abbreviation suggests, logFC gives you >> log-fold-changes. logFC are indeed coefficients, not group means. >> >> 2) I have selected one parental genotype to be my reference genotype, and >>> I am comfortable interpreting the logFC values for the other genotypes as >>> comparisons to this genotype. I have selected one treatment as the >>> reference (sun). When I see the logFC values for 'trtshade', I assume this >>> is also in comparison to the reference genotype in the reference treatment. >>> >> >> Yes, assuming that trt is what you have called the treatment factor, then >> trtshade would the coefficient estimating the log-fold-change between the >> shade and sun treatments. >> >> My question is about looking at 'geno2.trtshade' values - are these >>> values in comparison to 'trtshade' or 'geno2' or the reference >>> genotype/reference treatment values? >>> >> >> Without knowing what model you have fitted, or which coefficients you have >> tested for, it is impossible to tell. >> >> Best wishes >> Gordon >> >> For example: this is calling glmLRT with coef = grep("trt", >>> colnames(Design)) to identify treatment effects >>> >>> Row.names logFC.trtsha logFC.genoIL1.1.trtsha logCPM >>> LR PValue FDR >>> 1 Solyc08g078300.2.1 0.9440264 -0.379030370 4.207443 >>> 1135.8061 1.450273e-188 1.637938e-184 >>> 2 Solyc06g060830.2.1 0.9630343 0.038988710 4.511893 >>> 1095.1525 2.539194e-180 1.433883e-176 >>> 3 Solyc06g067910.2.1 1.2468599 -0.096499016 4.795789 >>> 1040.5813 2.722511e-169 1.024935e-165 >>> 4 Solyc06g008580.2.1 1.1500683 -0.451731914 3.300982 >>> 696.1677 6.043868e-101 1.706486e-97 >>> 5 Solyc07g017600.2.1 0.5026292 0.007208458 5.209938 >>> 694.2070 1.451776e-100 3.279271e-97 >>> 6 Solyc10g083760.1.1 -0.2877568 0.045952894 7.033394 >>> 651.8629 2.232759e-92 4.202796e-89 >>> >>> Thanks so much! >>> >>> >>> -- output of sessionInfo(): >>> >>> sessionInfo() >>>> >>> R version 3.0.1 (2013-05-16) >>> Platform: x86_64-pc-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C >>> LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C >>> [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C >>> LC_MEASUREMENT=C LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] splines stats graphics grDevices utils datasets methods >>> base >>> >>> other attached packages: >>> [1] ggplot2_0.9.3.1 reshape_0.8.4 plyr_1.8 edgeR_3.2.3 >>> limma_3.16.4 >>> >>> loaded via a namespace (and not attached): >>> [1] MASS_7.3-26 RColorBrewer_1.0-5 colorspace_1.2-2 >>> dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2 >>> labeling_0.1 >>> [9] munsell_0.4 proto_0.3-10 reshape2_1.2.2 scales_0.2.3 >>> stringr_0.6.2 tools_3.0.1 >>> >> >> >> ______________________________**______________________________**___ _______ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________**______________________________**___ _______ >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY

Login before adding your answer.

Traffic: 708 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6