edgeR - multiple comparisons


Sridhara Gupta Kunjeti ▴ 320

@sridhara-gupta-kunjeti-4449

Last seen 9.1 years ago

United States

Hello,

I have used edgeR for DGE analysis and I have a few questions regarding the model and comparisons.

1) What kind of statistical model is used to analyze treatment structure and conduct analysis of variance?
2) How does edgeR correct for multiple comparisons?
3) I am assuming that the p-values in the output after estimating the tagwise dispersion are already adjusted for multiple testing. Please correct me if I am wrong. If so, what kind of multiple testing correction is applied?

The commands that I ran are as follows:

> raw.data <- read.delim("c33_con3.txt")
> raw.data.2a <- read.delim ("2c33_con3.txt")
> d2a <- raw.data.2a[, 2:5]
> rownames(d2a) <- raw.data.2a[,1]
> group2a <- c(rep("c33", 2), rep("con3", 2))
> d2a <- DGEList(counts = d2a, group = group2a)
> d2a <- estimateCommonDisp(d2a)
> d2a <- estimateTagwiseDisp(d2a, prior.n = 10, grid.length = 500)
> prior.n2a <- estimateSmoothing(d2a)
> de2a.tgw <- exactTest(d2a, common.disp = FALSE)
> de2a.tgw
An object of class "DGEExact"
$table
                                                                    logConc       logFC   p.value
MGG_00005 | Mo hypothetical protein (1014 nt)                     -16.67772  0.05248378 0.9394668
MGG_00015 | Mo catechol O-methyltransferase (1102 nt)             -14.68066  0.36189877 0.2786389
MGG_00016 | Mo 2-epi-5-epi-valiolone synthase (1739 nt)           -13.50677  0.32379041 0.3759259
MGG_00017 | Mo L-aminoadipate-semialdehyde dehydrogenase (3472 nt) -14.28686 -0.35747999 0.3040601
MGG_00018 | Mo integral membrane protein (2504 nt)                -14.56791  0.45187243 0.1701996
11452 more rows ...

$comparison
[1] "c33" "con3"

$genes
NULL

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgeR_2.0.3

loaded via a namespace (and not attached):
[1] limma_3.6.9 tools_2.12.1

I would really appreciate your comments or suggestions.

Many thanks!
Sridhara

--
Sridhara G Kunjeti
PhD Candidate
University of Delaware
Department of Plant and Soil Science
email- sridhara@udel.edu
Ph: 832-566-0011

edgeR edgeR • 1.7k views

ADD COMMENT • link updated 12.9 years ago by Mark Robinson ★ 1.1k • written 12.9 years ago by Sridhara Gupta Kunjeti ▴ 320


Mark Robinson ★ 1.1k

@mark-robinson-2171

Last seen 9.6 years ago

Hi Sridhara,

If you haven't already, you might have a solid read of the edgeR user's guide; it has answers to some of your questions.

On May 21, 2011, at 11:20 PM, Sridhara Gupta Kunjeti wrote:

> 1) What kind of statistical model is taken into account to analyze treatment
> structure and conduct analysis of variance?

For the example you show (a 2-group comparison), the 'Negative binomial models' Section in the user's guide covers this. Of course, the package has facility for more complicated "treatment structure" through generalized linear models (see the 'Experiment with multiple factors' Section, for example).

> 2) How does the edgeR correct the multiple comparisons?

See ?topTags; it's also mentioned in the user's guide.

----
topTags(object, n=10, adjust.method="BH", sort.by="p.value")
...
adjust.method: character string stating the method used to adjust
          p-values for multiple testing, passed on to 'p.adjust'
...
----

> 3) I am assuming that the calculated p-values in the output after
> performing the tagwiseDispersion are after adjusting for multiple testing.
> Please correct me if I am wrong? If so, what kind of multiple testing is
> taken into account?

exactTest() doesn't do the multiple testing correction, but topTags() does.

HTH,
Mark

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobinson at wehi.edu.au
e: m.robinson at garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------
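To make the distinction between raw and adjusted p-values concrete, here is a minimal sketch using only base R. It assumes nothing about edgeR internals beyond what the help text quoted above says, namely that adjust.method is passed on to p.adjust. The p-values are made up for illustration; in a real analysis the adjustment runs over the p-values of all tags at once, not over a handful.

```r
# Hypothetical raw p-values, standing in for the p.value column that
# exactTest() reports (these numbers are invented for illustration).
p.raw <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Benjamini-Hochberg adjustment, i.e. adjust.method = "BH" in the
# topTags() usage quoted above.
fdr <- p.adjust(p.raw, method = "BH")

# Side-by-side view of raw p-values and their BH-adjusted counterparts.
print(data.frame(PValue = p.raw, FDR = fdr))
```

The adjusted values are never smaller than the raw ones, which is one quick way to tell the two columns apart in an exported table.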

ADD COMMENT • link 12.9 years ago Mark Robinson ★ 1.1k


Hello Mark,

Thanks for your email. I have one quick question. Is it possible to export all the 10,427 genes after running exactTest()? What argument do I need to use to do that? Basically, I want the complete list of genes with the following information:

> topTags(de06.tgw, n = 10, adjust.method="BH", sort.by="p.value")
Comparison of groups: T6-P18
                                                              logConc     logFC       PValue          FDR
PITG_08841 | Pi conserved hypothetical protein (129 nt)     -28.79463 42.442850 1.032735e-11 1.076833e-07
PITG_08845 | Pi mannitol dehydrogenase, putative (1065 nt)  -12.93992  9.148329 1.288618e-09 6.193586e-06

If I use the following commands, I get an error message:

fdr06 <- topTags(de06.tgw, n = 10427, adjust.method = "BH", sort.by = "p.value")
write.table(fdr06, file = "FDR06.csv", sep=",", col.names = NA, qmethod="double")
Error in data.frame(table = list(logConc = c(-28.7946, -12.93992, :
  arguments imply differing number of rows: 10427, 1, 2

If I do the same with n = 10426, it executes without any error, except that I am missing one row.

Any suggestions on how to export all the columns for all the rows would be a great help.

Many thanks!
Sridhara

ADD REPLY • link 12.9 years ago Sridhara Gupta Kunjeti ▴ 320


Hi Sridhara,

The problem here is that the output of topTags() (your 'fdr06') is not a data.frame or matrix, which is what write.table() works best on. Instead, try:

fdr06 <- topTags(de06.tgw, n = nrow(de06.tgw), adjust.method = "BH", sort.by="p.value")
write.table(fdr06$table, file = "FDR06.csv", sep=",")

Cheers,
Mark
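A plausible explanation, not confirmed anywhere in this thread, for why n = 10426 ran while n = 10427 failed: when write.table() is handed something that is neither a data.frame nor a matrix, it tries to convert it with data.frame(), and data.frame() silently recycles short components only when the longest row count is an exact multiple of their length. The sketch below uses a plain list as a hypothetical stand-in for the topTags() result (it is not the real edgeR class, and make.fake.toptags is an invented helper), with a length-2 'comparison' element as in the error message above.

```r
# Hypothetical stand-in for the object returned by topTags(): a list holding
# a data.frame plus two short vectors. This is NOT the real edgeR class.
make.fake.toptags <- function(n.rows) {
  list(
    table = data.frame(logFC = seq_len(n.rows),
                       PValue = seq_len(n.rows) / n.rows),
    adjust.method = "BH",
    comparison = c("T6", "P18")
  )
}

# Odd row count: the length-2 'comparison' element cannot be recycled
# evenly, so the conversion to a data.frame fails.
odd <- tryCatch(data.frame(make.fake.toptags(5)), error = function(e) e)
print(inherits(odd, "error"))

# Even row count: the short elements are silently recycled, so no error
# is raised, although the result is not a clean results table.
even <- tryCatch(data.frame(make.fake.toptags(4)), error = function(e) e)
print(inherits(even, "error"))

# Writing only the $table component works for any number of rows.
tt <- make.fake.toptags(5)
out.file <- tempfile(fileext = ".csv")
write.table(tt$table, file = out.file, sep = ",", col.names = NA,
            qmethod = "double")
back <- read.csv(out.file, row.names = 1)
print(dim(back))
```

Either way, extracting $table before writing, as suggested above, sidesteps the conversion entirely and does not depend on the number of rows.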

ADD REPLY • link 12.9 years ago Mark Robinson ★ 1.1k


Hello Mark,

Thank you very much for your email. It greatly helped me to export the FDR, p-value, logFC and logConc into csv format.

I have one more quick question, which is more of a statistical question. After exporting the FDR, I started analyzing pair by pair. In the example below, when comparing group A with group B, I got p-values and FDRs that make sense. But when I checked the group A versus group C comparison, all of the 10,000 genes had an FDR and p-value of 1. I then counted the number of genes that had "0" in both groups for both replicates, and it turned out to be about 400 genes. So my question is: why did the other genes (9600) have an FDR and p-value of "1"? Do you think the 400 genes with "0" counts would affect the analysis? Do I need to delete these 400 genes for the pairwise (group A - group C) comparison and then run an edgeR analysis individually?

        Group A     Group B     Group C
Genes   A1    A2    B1    B2    C1    C2
1        0     0    11    12     0     0
2      120   102    45    38    30    40

Any help or comments will be appreciated.

Many thanks!
Sridhara

ADD REPLY • link 12.9 years ago Sridhara Gupta Kunjeti ▴ 320


Hi Sridhara,

I do not think that the genes with all-zero counts in group A and group C are causing the results you see. I just tested this on a dataset with 9 groups: comparing two groups, A and B, with 285 genes that had all-zero counts in both groups yielded "expected" p-values and FDRs. Therefore I do not think that your p-values all being 1 is driven by these all-zero genes.

Is there truly very little difference in expression between groups A and C relative to biological variability in your data? You could have a look at the counts (raw, normalized or counts per million) for the top-ranked (even if not significant) genes for your group A - group C comparison. If you see little difference in expression between the groups for the top genes, then you may have no differential expression between these groups. If, on the other hand, there do look to be large differences in expression between the groups, then you may have found a bug in the p-values that are being output, and we can go ahead and try to fix the issue.

I notice that you are using R 2.12 and edgeR version 2.0.3. I would recommend updating to R 2.13 and the latest release of edgeR. There have been many improvements made to the package since version 2.0.3, and any bug fixes (if required) will roll out to the current release and devel versions, not legacy versions of the package.

Cheers,
Davis
>>> ______________________________________________________________________ >>> >>> >>> >>> -- >>> Sridhara G Kunjeti >>> PhD Candidate >>> University of Delaware >>> Department of Plant and Soil Science >>> email- sridhara at udel.edu >>> Ph: 832-566-0011 >> >> ------------------------------ >> Mark Robinson, PhD (Melb) >> Epigenetics Laboratory, Garvan >> Bioinformatics Division, WEHI >> e: mrobinson at wehi.edu.au >> e: m.robinson at garvan.org.au >> p: +61 (0)3 9345 2628 >> f: +61 (0)3 9347 0852 >> ------------------------------ >> >> >> ______________________________________________________________________ >> The information in this email is confidential and inte...{{dropped:20}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ---------------------------------------------------------------------- -- Davis J McCarthy Research Technician Bioinformatics Division Walter and Eliza Hall Institute of Medical Research 1G Royal Parade, Parkville, Vic 3052, Australia dmccarthy at wehi.edu.au http://www.wehi.edu.au ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
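A note on the export error quoted above, where n = 10427 fails but n = 10426 succeeds: this pattern is consistent with how base R builds a data frame from a list. Shorter elements are recycled, which only works when the row count is a multiple of their lengths. The sketch below uses a plain list as a hypothetical stand-in for the topTags() result. The layout (a table plus entries of length 1 and 2) is an assumption read off the error message, and `make_result` and `converts` are illustrative names, not edgeR functions.

```r
# Hypothetical stand-in for a topTags() result: a results table plus two
# short metadata entries (lengths 1 and 2). Assumed layout, not edgeR's API.
make_result <- function(n_rows) {
  list(
    table = data.frame(logFC = seq_len(n_rows),
                       PValue = seq_len(n_rows) / n_rows),
    adjust.method = "BH",
    comparison = c("T6", "P18")
  )
}

# write.table() coerces a non-data-frame argument with data.frame(), which
# recycles short entries only if the row count is a multiple of their length.
converts <- function(x) {
  tryCatch({ as.data.frame(x); TRUE }, error = function(e) FALSE)
}

odd_ok  <- converts(make_result(3))  # 3 rows: not a multiple of 2
even_ok <- converts(make_result(4))  # 4 rows: a multiple of 1 and of 2

# Writing only the table element sidesteps the coercion entirely.
res <- make_result(3)
out <- tempfile(fileext = ".csv")
write.table(res$table, file = out, sep = ",", col.names = NA,
            qmethod = "double")
exported <- read.csv(out, row.names = 1)
```

Under that assumption, an odd row count such as 10427 cannot be recycled against a length-2 entry, while an even count such as 10426 can, which would explain why dropping one row appeared to fix the call.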

12.9 years ago • Davis McCarthy ▴ 260


Hello Davis,

Thank you very much for your email. After looking at one of my comparisons, the p-values make complete sense. However, I noticed that out of 10827 genes, most of them (10820) had an FDR of 1, and the rest had FDRs of 0.5, 0.6, 0.7, 0.8 and so on. Could the "0" counts in the data cause these FDR values?

I will also install the latest version of R (2.13) and of edgeR. Could you please let me know the latest version of edgeR that is available for download? I am assuming I can still follow the same manual (from version 2.0.3) with the new version of edgeR.

Many thanks!
Sridhara

12.9 years ago • Sridhara Gupta Kunjeti ▴ 320


Hi Sridhara

I'm not sure I completely follow what you're saying about the FDRs being 0.5, 0.6 etc. Can you show us a top table, i.e. the output of topTags()? Actually, it would be good to see all of your edgeR function calls to get a better idea of how you're carrying out your analysis. In principle I don't think that "0" in the data will have any adverse effects on your analysis, so I'm not really sure what results you're trying to describe.

If you are in an R 2.13 session and enter the commands:

    source("http://www.bioconductor.org/biocLite.R")
    biocLite("edgeR")

then edgeR version 2.2.5 will be installed on your system. I would recommend following the latest version of the edgeR User's Guide, which was released with edgeR 2.2.x. You can get it from edgeR's Bioconductor page: http://www.bioconductor.org/packages/2.8/bioc/html/edgeR.html

Hope that helps.

Cheers
Davis
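As a rough illustration of how a handful of FDR values below 1 can sit alongside thousands that equal 1, here is a base R sketch of the BH adjustment that topTags() passes on to p.adjust. The p-values are made up for illustration and are not taken from the data discussed in this thread.

```r
# Made-up p-values: three genes with small p-values, the rest equal to 1.
n_genes <- 1000
p <- c(0.0004, 0.0009, 0.002, rep(1, n_genes - 3))

# Benjamini-Hochberg adjustment, the "BH" method named in ?topTags.
fdr <- p.adjust(p, method = "BH")

# BH scales the i-th smallest p-value by n / i, so with many genes even a
# p-value of 0.0004 can end up with an adjusted value well above 0.05.
top_fdr  <- fdr[1:3]
rest_fdr <- fdr[-(1:3)]
```

In this toy case the three smallest p-values end up with adjusted values between 0.4 and 0.7 while every other gene is at 1. That pattern does not need zero counts to arise; it only needs very few small raw p-values among many tests.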

12.9 years ago • Davis McCarthy ▴ 260


Hello Davis, Yes, this helped me to solve the problem. On the other hand, I have a different kind of question, which is related to the exactTest. First two columns in my inputs files are the counts for the control group. ============================================= example 1: Textfile1 Gene con3-1 con3-2 dca-1 dca-2. when I run the exactTest > de1.tgw <- exactTest(d1, common.disp = FALSE) Comparison of groups: dca3 - con3 So, if the logFC is positive, it means it is up-regulated in dca3, and these dots are plotted above '0' in the plotsmear. ================================================ example 2: When I swap the columns Gene dca-1 dca-2 con3-1 con3-2 > de1.tgw <- exactTest(d1, common.disp = FALSE) Comparison of groups: con3 - dca3 Here if the logFC is negative, it means it is up-regulated in dca3, and these are plotted below '0' in the plotSmear. Here the bottom line is if I swap the columns, when I run the exactTest, it changes the sequence in pairing. In other words pairs* change* from dca3 - con3 to con3 - dca3. This worked absolutely fine with 6 pairs. But for four pairs, even when I swap the columns in the input data, in the exactTest the sequence is not changing. i.e., con3 - c33 *does not change* to c33-con3 My worry is if I look at logFC values, for some of the pair if the values is "+", then it is up-regulated in the treatment and for some it is "-". I am assuming this is going to be a problem when I generate plotSmear. I mean inconsistent. Any help in generating same logFC values (positive for upregualtion in treatment) will be appreciated. Thanks, Sridhara On Mon, May 30, 2011 at 2:33 AM, Davis McCarthy <dmccarthy@wehi.edu.au>wrote: > Hi Sridhara > > I'm not sure I completely follow what you're saying about the FDRs being > 0.5, 0.6 etc. Can you show us a top table? Output of topTags(). Actually it > would be good to see all of your edgeR function calls to get a better idea > of how you're carrying out your analysis. 
In principle I don't think that > "0" in the data will have any adverse effects on your analysis, so I'm not > really sure what the results are that you're trying to describe. > > If you are in an R 2.13 session and enter the commands: > > source("http://www.bioconductor.org/biocLite.R") > biocLite("edgeR") > > then edgeR version 2.2.5 will be installed on your system. I would > recommend following the latest version of the edgeR User's Guide, which was > released with edgeR 2.2.x. You can get it from edgeR's Bioconductor page: > http://www.bioconductor.org/packages/2.8/bioc/html/edgeR.html > > Hope that helps. > > Cheers > Davis > > > On May 27, 2011, at 9:36 PM, Sridhara Gupta Kunjeti wrote: > > Hello Davis, > Thank you very much for your email. After looking at one of my comparisons, > it makes total sense about the p-value. But, I did notice that out of 10827 > genes, most of them (10820) had an *FDR* of 1 and rest others had an FDR > of 0.5, 0.6, 0.7, and 0.8 so on.... I was wondering if "0" in the data will > cause this FDR? > > I will also install latest version of R 2.13 and also the edgeR. Could you > please let me know the latest version of edgeR that is available for me to > download? I am assuming I can still follow the same manual (from version > 2.0.3) for the new version of edgeR. > > Many thanks! > Sridhara > > > On Thu, May 26, 2011 at 10:52 PM, Davis McCarthy <dmccarthy@wehi.edu.au>wrote: > >> Hi Sridhara >> >> I do not think it is genes with all zero counts for group A and group C >> are causing the results you see. >> >> I just tested this on a dataset with 9 groups, and comparing two groups, A >> and B, with 285 genes with all zero counts in groups A and B yielded >> "expected" p-values and FDRs. Therefore I do not think that your p-values >> all being 1 is driven by these all-zero genes. >> >> Is there truly very little difference in expression between groups A and C >> relative to biological variability in your data? 
>> You could have a look at
>> the counts (raw, normalized or counts per million) for the top-ranked (even
>> if not significant) genes for your group A - group C comparison.
>>
>> If you see little difference in expression between the groups for the top
>> genes then you may have no differential expression between these groups. If,
>> on the other hand, there does look to be large differences in expression
>> between the groups then you may have found a bug in the p-values that are
>> being output and we can go ahead and try to fix the issue.
>>
>> I notice that you are using R 2.12 and edgeR version 2.0.3. I would
>> recommend updating to R 2.13 and the latest release of edgeR--- there have
>> been many improvements made to the package since version 2.0.3 and any bug
>> fixes (if required) will roll out to the current release and devel versions,
>> not legacy versions of the package.
>>
>> Cheers
>> Davis
>>
>>
>>
>> On May 26, 2011, at 6:16 AM, Sridhara Gupta Kunjeti wrote:
>>
>> > Hello Mark,
>> > Thank you very much for you email. It greatly helped me to export the FDR,
>> > p-value, logFC and logConc into csv format.
>> > I have one real quick question, this is more of statistical question.
>> > After exporting the FDR, I started analyzing pair by pair. In the below
>> > example, what I noticed is when comparing the group A - B, I got p-value and
>> > FDR that make sense. But, when I checked for the group A- group C
>> > comparision. all the 10,000 genes had FDR and p-value of 1, then I counted
>> > the number of genes that had "0" in both the groups for both the replicates,
>> > it turned out to be about 400 genes. So, my question is why the other genes
>> > (9600) had FDR and p-value of "1". Do you think the 400 genes with "0"
>> > counts would affect the analysis? Do I need to delete these 400 genes for
>> > the pair (gp A - gp C) comparison and then run and edgeR analysis
>> > individually?
>> >
>> >         Group A     Group B     Group C
>> > Genes   A1    A2    B1    B2    C1    C2
>> > 1       0     0     11    12    0     0
>> > 2       120   102   45    38    30    40
>> >
>> >
>> > Any help or comments will be appreciated.
>> >
>> > Many thanks!
>> > Sridhara
>> >
>> >
>> > On Sun, May 22, 2011 at 4:24 PM, Mark Robinson <mrobinson@wehi.edu.au> wrote:
>> >
>> >> Hi Sridhara,
>> >>
>> >> The problem here is that the output of topTags() (your 'fdr06') is not a
>> >> data.frame or matrix, which is what write.table() works best on. Instead,
>> >> try:
>> >>
>> >> fdr06 <- topTags(de06.tgw, n = nrow(de06.tgw), adjust.method = "BH",
>> >> sort.by="p.value")
>> >> write.table(fdr06$table, file = "FDR06.csv", sep=",")
>> >>
>> >> Cheers,
>> >> Mark
>> >>
>> >> On May 22, 2011, at 11:02 PM, Sridhara Gupta Kunjeti wrote:
>> >>
>> >>> Hello Mark,
>> >>> Thanks for your email. I have one quick question. Is it possible to
>> >>> export all the 10,427 genes after passing exactTest()? what argument do I
>> >>> need to use to do that? Basically I wanted the complete list of genes with
>> >>> the following info:
>> >>> > topTags(de06.tgw, n = 10, adjust.method="BH", sort.by="p.value")
>> >>> Comparison of groups: T6-P18
>> >>>
>> >>>                                                              logConc    logFC      PValue        FDR
>> >>> PITG_08841 | Pi conserved hypothetical protein (129 nt)      -28.79463  42.442850  1.032735e-11  1.076833e-07
>> >>> PITG_08845 | Pi mannitol dehydrogenase, putative (1065 nt)   -12.93992  9.148329   1.288618e-09  6.193586e-06
>> >>>
>> >>> If I use the following argument, it is showing an error message.
>> >>>
>> >>> fdr06<- topTags(de06.tgw, n = 10,427, adjust.method = "BH", sort.by="p.value")
>> >>> write.table(fdr06, file = "FDR06.csv", sep=",", col.names = NA, qmethod="double")
>> >>> Error in data.frame(table = list(logConc = c(-28.7946, -12.93992, :
>> >>> arguments imply differing number of rows: 10427, 1, 2
>> >>>
>> >>> If I do the same with n = 10426, it is executinig without any error.
>> >>> Except that I am missing one row.
>> >>>
>> >>> Any suggetions on how to export all the columns for all the rows will be
>> >>> a great help.
>> >>>
>> >>> Many thanks!
>> >>> Sridhara
>> >>>
>> >>>
>> >>> On Sun, May 22, 2011 at 5:34 AM, Mark Robinson <mrobinson@wehi.edu.au> wrote:
>> >>> Hi Sridhara,
>> >>>
>> >>> If you haven't already, you might have a solid read of the edgeR user's
>> >>> guide, it has answers to some of your questions.
>> >>>
>> >>>
>> >>> On May 21, 2011, at 11:20 PM, Sridhara Gupta Kunjeti wrote:
>> >>>
>> >>>> Hello,
>> >>>> I have used edgeR for DGE analysis and I have few questions regarding the
>> >>>> model and comparisions.
>> >>>>
>> >>>> 1) What kind of statistical model is taken into account to analyze treatment
>> >>>> structure and conduct analysis of variance?
>> >>>
>> >>> For the example you show below (a 2-group comparison), the 'Negative
>> >>> binomial models' Section in the user's guide covers this. Of course, the
>> >>> package has facility for more complicated "treatment structure" through
>> >>> generalized linear models (See the 'Experiment with multiple factors'
>> >>> Section, for example).
>> >>>
>> >>>
>> >>>> 2) How does the edgeR correct the multiple comparisions?
>> >>>
>> >>> See ?topTags; its also mentioned in the user's guide.
>> >>>
>> >>> ----
>> >>> topTags(object, n=10, adjust.method="BH", sort.by="p.value")
>> >>> ...
>> >>> adjust.method: character string stating the method used to adjust
>> >>> p-values for multiple testing, passed on to p.adjust
>> >>> ...
>> >>> ----
>> >>>
>> >>>
>> >>>> 3) I am assuming that the calculated p-values in the output after
>> >>>> performing the tagwiseDispersion are after adjusting for multiple testing.
>> >>>> Please correct me if I am wrong? If so, what kind of multiple testing is
>> >>>> taken into account?
>> >>>
>> >>> exactTest() doesn't do the multiple testing correction, but topTags() does.
>> >>>
>> >>> HTH,
>> >>> Mark
>> >>>
>> >>>
>> >>>>
>> >>>> The arguments that I passed are as follows:
>> >>>> > raw.data <- read.delim("c33_con3.txt")
>> >>>> > raw.data.2a <- read.delim ("2c33_con3.txt")
>> >>>> > d2a <- raw.data.2a[, 2:5]
>> >>>> > rownames(d2a) <- raw.data.2a[,1]
>> >>>> > group2a <- c(rep("c33", 2), rep("con3", 2))
>> >>>> > d2a <- DGEList(counts = d2a, group = group2a)
>> >>>> > d2a <- estimateCommonDisp(d2a)
>> >>>> > d2a <- estimateTagwiseDisp(d2a, prior.n = 10, grid.length = 500)
>> >>>> > prior.n2a <- estimateSmoothing(d2a)
>> >>>> > de2a.tgw <- exactTest(d2a, common.disp = FALSE)
>> >>>> > de2a.tgw
>> >>>> An object of class "DGEExact"
>> >>>> $table
>> >>>>   (columns: logConc  logFC  p.value)
>> >>>> MGG_00005 | Mo hypothetical protein (1014 nt)  -16.67772  0.05248378  0.9394668
>> >>>> MGG_00015 | Mo catechol O-methyltransferase (1102 nt)  -14.68066  0.36189877  0.2786389
>> >>>> MGG_00016 | Mo 2-epi-5-epi-valiolone synthase (1739 nt)  -13.50677  0.32379041  0.3759259
>> >>>> MGG_00017 | Mo L-aminoadipate-semialdehyde dehydrogenase (3472 nt)  -14.28686  -0.35747999  0.3040601
>> >>>> MGG_00018 | Mo integral membrane protein (2504 nt)  -14.56791  0.45187243  0.1701996
>> >>>> 11452 more rows ...
>> >>>> $comparison
>> >>>> [1] "c33" "con3"
>> >>>> $genes
>> >>>> NULL
>> >>>>
>> >>>>
>> >>>> > sessionInfo()
>> >>>> R version 2.12.1 (2010-12-16)
>> >>>> Platform: i386-pc-mingw32/i386 (32-bit)
>> >>>> locale:
>> >>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
>> >>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>> >>>> attached base packages:
>> >>>> [1] stats graphics grDevices utils datasets methods base
>> >>>> other attached packages:
>> >>>> [1] edgeR_2.0.3
>> >>>> loaded via a namespace (and not attached):
>> >>>> [1] limma_3.6.9 tools_2.12.1
>> >>>>
>> >>>> I would really appreciate your comments or suggestions.
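On the logFC sign question at the top of this post, a possible approach, offered as a sketch and not as a confirmed fix for edgeR 2.0.3: the current edgeR documentation describes a `pair` argument to exactTest() in which the first group named is the baseline, so `exactTest(d, pair = c("con3", "c33"))` would report c33 - con3, with positive logFC meaning up-regulated in c33. Whether the default comparison follows column order or factor level order in a given edgeR version is worth checking in ?exactTest. The base R snippet below only illustrates the underlying point with made-up counts: R sorts factor levels alphabetically unless told otherwise, and swapping the baseline flips the sign of the log fold change. `log_fc` is a hypothetical helper written for this illustration, not an edgeR function.

```r
# Made-up counts for a single gene: two control (con3) and two treated (c33) libraries.
counts <- c(con3_1 = 30, con3_2 = 40, c33_1 = 120, c33_2 = 102)
group  <- factor(c("con3", "con3", "c33", "c33"))

# factor() sorts its levels, so "c33" comes before "con3"
# even though the con3 columns are listed first.
default_levels <- levels(group)

# Hypothetical helper (not part of edgeR): log2 ratio of mean counts,
# second group of `pair` over the first group, which acts as the baseline.
log_fc <- function(counts, group, pair) {
  means <- tapply(counts, group, mean)
  as.numeric(log2(means[pair[2]] / means[pair[1]]))
}

# Baseline taken from the sorted levels: this is con3 - c33.
fc_default <- log_fc(counts, group, default_levels)

# Baseline named explicitly as the control: this is c33 - con3.
fc_fixed <- log_fc(counts, group, c("con3", "c33"))

print(default_levels)
print(c(default = fc_default, control_baseline = fc_fixed))
```

Naming the control group first for every comparison keeps the sign convention the same across all pairs, whatever the column order in the input file.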

12.9 years ago Sridhara Gupta Kunjeti ▴ 320
