edgeR - ANOVA-like test for differential gene expression

0

Entering edit mode

Benedikt Drosse ▴ 30

@benedikt-drosse-6014

Last seen 9.6 years ago

Dear edgeR users, I worked my way through the user's guide of edgeR. However, I still have some questions and problems understanding the anova-like way of analyzing differential gene expression. I would be very grateful for any help or suggestion. I hope I did not overlook any earlier post concerning the same topic. My RNAseq data are derived from 4 different developmental stages with 3 biological replicates each (see overview). I am interested in genes differentially expressed between any of the developmental stages. I am not sure, which way of analysis is best to answer this question. Making the "Anova-like" model asking for all contrasts at once or doing all possible contrasts one by one. In principle the "anova-like" way of analysis should fit best to my question, if I understood the edgeR user's guide correctly. So I make a model matrix as given in "design" and fit the glm model for the effect of Development including the intercept. To ask for expression differences between any of the developmental stages I use "coef=2:4" in the call of glmLRT. So my questions are: Will coef=2:4 yield only the contrasts of Development2-Development1, Development3-Development1, Development4-Development1 (as it is described in the user's guide), or will it also test the contrasts of Development3-Development2, Development4-Development2 and Development4-Development3, which would also be important contrasts for me to analyze? For further filtering and candidate gene identification I would like to extract the logFoldChanges and significance level between any of the developmental stages. If the "anova-like" analysis provides all of the six possible contrasts, is there an easy way to extract logFC and significance level for any of these contrasts from the resulting glmLRT-output? If the anova-like analysis does NOT yield ALL of the possible contrasts, I would have to make the contrasts one by one. In this case would it be more useful to build the glm model without an intercept (see design_2), and then ask for the six contrasts separately as indicated in the comment of glm_data_2? Doing so it would be easy for me to get the logFC and the corresponding signficance level. Is it conceptionally wrong to do the single contrasts instead of an anova-like analysis concerning the multiple testing problem (6 tests instead of 1)? I will be very grateful for any comment on my questions, Best regards, Benedikt Digel #R code: overview = matrix(nrow=12, ncol=1) overview[,1] = factor(c(1,1,1,2,2,2,3,3,3,4,4,4)) colnames(overview) = "Development" rownames(overview) = paste("library_", 1:12, sep="") design = model.matrix(object = ~Development) fit = glmFit(y=data_with_tagwiseDISP, design=design) # I did not provide the corresponding data for the analysis, since it should just indicate the way I call glmFit glm_data = glmLRT(glmfit=fit, coef=2:4) design_2 = model.matrix(object=~0+Development) fit_2 = glmFit(y=data_with_tagwiseDISP, design=design_2) # I did not provide the corresponding data for the analysis, since it should just indicate the way I call glmFit glm_data_2 = glmLRT(glmfit=fit_2, contrast=c(-1,1,0,0)) # c(-1,0,1,0); c(-1,0,0,1); c(0,-1,1,0); c(0,1-,0,1); c(0,0,-1,1) all possible contrasts to extract effects of all comparisons -- Benedikt Digel geb. Drosse PhD student AG von Korff Max Planck Institute for Plant Breeding Research Dept. Plant Developmental Biology Carl-von-Linn?-Weg 10 50829 Cologne Germany Tel. (Office): +49-(0)221-5062-280 Email: drosse at mpipz.mpg.de

RNASeq edgeR RNASeq edgeR • 5.6k views

ADD COMMENT • link 10.8 years ago Benedikt Drosse ▴ 30

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

Hi Benedikt, When do you coef=2:4, what you are actually doing is comparing the full model against a model where those coefficients are deleted entirely. What remains after deleting those coefficients is just the intercept term. Hence, you are testing all possible contrasts between conditions. There are 6 contrasts, but they are redundant, and there are only 3 degrees of freedom being tested (since you are testing 3 coefficients). If you use topTable, you will get logFC values for 3 of the 6 contrasts. The other 3 logFC values can be calculated by subtracting the logFC values you have. -Ryan Thompson On 6/26/13 8:43 AM, Benedikt Drosse wrote: > Dear edgeR users, > > I worked my way through the user's guide of edgeR. However, I still > have some questions and problems understanding the anova-like way of > analyzing differential gene expression. I would be very grateful for > any help or suggestion. I hope I did not overlook any earlier post > concerning the same topic. > > My RNAseq data are derived from 4 different developmental stages with > 3 biological replicates each (see overview). I am interested in genes > differentially expressed between any of the developmental stages. > I am not sure, which way of analysis is best to answer this question. > Making the "Anova-like" model asking for all contrasts at once or > doing all possible contrasts one by one. > > In principle the "anova-like" way of analysis should fit best to my > question, if I understood the edgeR user's guide correctly. So I make > a model matrix as given in "design" and fit the glm model for the > effect of Development including the intercept. To ask for expression > differences between any of the developmental stages I use "coef=2:4" > in the call of glmLRT. > > So my questions are: > Will coef=2:4 yield only the contrasts of Development2-Development1, > Development3-Development1, Development4-Development1 (as it is > described in the user's guide), > or will it also test the contrasts of Development3-Development2, > Development4-Development2 and Development4-Development3, which would > also be important contrasts for me to analyze? > For further filtering and candidate gene identification I would like > to extract the logFoldChanges and significance level between any of > the developmental stages. > If the "anova-like" analysis provides all of the six possible > contrasts, is there an easy way to extract logFC and significance > level for any of these contrasts from the resulting glmLRT-output? > > If the anova-like analysis does NOT yield ALL of the possible > contrasts, I would have to make the contrasts one by one. > In this case would it be more useful to build the glm model without an > intercept (see design_2), and then ask for the six contrasts > separately as indicated in the comment of glm_data_2? > Doing so it would be easy for me to get the logFC and the > corresponding signficance level. > Is it conceptionally wrong to do the single contrasts instead of an > anova-like analysis concerning the multiple testing problem (6 tests > instead of 1)? > > I will be very grateful for any comment on my questions, > > Best regards, > Benedikt Digel > > > > #R code: > overview = matrix(nrow=12, ncol=1) > overview[,1] = factor(c(1,1,1,2,2,2,3,3,3,4,4,4)) > colnames(overview) = "Development" > rownames(overview) = paste("library_", 1:12, sep="") > > design = model.matrix(object = ~Development) > fit = glmFit(y=data_with_tagwiseDISP, design=design) # I did not > provide the corresponding data for the analysis, since it should just > indicate the way I call glmFit > glm_data = glmLRT(glmfit=fit, coef=2:4) > > design_2 = model.matrix(object=~0+Development) > fit_2 = glmFit(y=data_with_tagwiseDISP, design=design_2) # I > did not provide the corresponding data for the analysis, since it > should just indicate the way I call glmFit > glm_data_2 = glmLRT(glmfit=fit_2, contrast=c(-1,1,0,0)) # > c(-1,0,1,0); c(-1,0,0,1); c(0,-1,1,0); c(0,1-,0,1); c(0,0,-1,1) all > possible contrasts to extract effects of all comparisons >

ADD COMMENT • link 10.8 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Eduardo Andrés León ▴ 90

@eduardo-andres-leon-5963

Last seen 9.6 years ago

Hi Benedikt, I'm not an expert but I had a similar problem not long time ago, and the example 3.3.1 in the edgeR guide, helped me a lot while designing the contrast : > my.contrasts <- makeContrasts( + Drug.1vs0 = Drug.1h-Drug.0h, + Drug.2vs0 = Drug.2h-Drug.0h, + Placebo.1vs0 = Placebo.1h-Placebo.0h, + Placebo.2vs0 = Placebo.2h-Placebo.0h, + DrugvsPlacebo.0h = Drug.0h-Placebo.0h, + DrugvsPlacebo.1h = (Drug.1h-Drug.0h)-(Placebo.1h- Placebo.0h), + DrugvsPlacebo.2h = (Drug.2h-Drug.0h)-(Placebo.2h- Placebo.0h), + levels=design) I hope this helps cheers On 26 Jun 2013, at 17:43, Benedikt Drosse <drosse@mpipz.mpg.de> wrote: > Dear edgeR users, > > I worked my way through the user's guide of edgeR. However, I still have some questions and problems understanding the anova-like way of analyzing differential gene expression. I would be very grateful for any help or suggestion. I hope I did not overlook any earlier post concerning the same topic. > > My RNAseq data are derived from 4 different developmental stages with 3 biological replicates each (see overview). I am interested in genes differentially expressed between any of the developmental stages. > I am not sure, which way of analysis is best to answer this question. Making the "Anova-like" model asking for all contrasts at once or doing all possible contrasts one by one. > > In principle the "anova-like" way of analysis should fit best to my question, if I understood the edgeR user's guide correctly. So I make a model matrix as given in "design" and fit the glm model for the effect of Development including the intercept. To ask for expression differences between any of the developmental stages I use "coef=2:4" in the call of glmLRT. > > So my questions are: > Will coef=2:4 yield only the contrasts of Development2-Development1, Development3-Development1, Development4-Development1 (as it is described in the user's guide), > or will it also test the contrasts of Development3-Development2, Development4-Development2 and Development4-Development3, which would also be important contrasts for me to analyze? > For further filtering and candidate gene identification I would like to extract the logFoldChanges and significance level between any of the developmental stages. > If the "anova-like" analysis provides all of the six possible contrasts, is there an easy way to extract logFC and significance level for any of these contrasts from the resulting glmLRT-output? > > If the anova-like analysis does NOT yield ALL of the possible contrasts, I would have to make the contrasts one by one. > In this case would it be more useful to build the glm model without an intercept (see design_2), and then ask for the six contrasts separately as indicated in the comment of glm_data_2? > Doing so it would be easy for me to get the logFC and the corresponding signficance level. > Is it conceptionally wrong to do the single contrasts instead of an anova-like analysis concerning the multiple testing problem (6 tests instead of 1)? > > I will be very grateful for any comment on my questions, > > Best regards, > Benedikt Digel > > > > #R code: > overview = matrix(nrow=12, ncol=1) > overview[,1] = factor(c(1,1,1,2,2,2,3,3,3,4,4,4)) > colnames(overview) = "Development" > rownames(overview) = paste("library_", 1:12, sep="") > > design = model.matrix(object = ~Development) > fit = glmFit(y=data_with_tagwiseDISP, design=design) # I did not provide the corresponding data for the analysis, since it should just indicate the way I call glmFit > glm_data = glmLRT(glmfit=fit, coef=2:4) > > design_2 = model.matrix(object=~0+Development) > fit_2 = glmFit(y=data_with_tagwiseDISP, design=design_2) # I did not provide the corresponding data for the analysis, since it should just indicate the way I call glmFit > glm_data_2 = glmLRT(glmfit=fit_2, contrast=c(-1,1,0,0)) # c(-1,0,1,0); c(-1,0,0,1); c(0,-1,1,0); c(0,1-,0,1); c(0,0,-1,1) all possible contrasts to extract effects of all comparisons > > -- > Benedikt Digel geb. Drosse > > PhD student > AG von Korff > Max Planck Institute for Plant Breeding Research > Dept. Plant Developmental Biology > Carl-von-Linné-Weg 10 > 50829 Cologne > Germany > > Tel. (Office): +49-(0)221-5062-280 > Email: drosse@mpipz.mpg.de > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor =================================================== Eduardo Andrés León Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 5054/3063) e-mail: eandres@cnio.es Fax: (+34) 91 224 69 76 Unidad de Bioinformática Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas C.P.: 28029 Zip Code: 28029 C/. Melchor Fernández Almagro, 3 Madrid (Spain) http://bioinfo.cnio.es http://bioinfo.cnio.es/people/eandres =================================================== [[alternative HTML version deleted]]

ADD COMMENT • link 10.8 years ago Eduardo Andrés León ▴ 90

0

Entering edit mode

Dear Eduardo, thanks for your very quick reply. Yes, I had a looked at your suggested example from the user's guide already before. My main problem is not in designing the contrasts but rather in understanding, what the anova- like analysis using coef=2:4 does exactly and whether this is the preferred method over doing the single contrasts in separate calls to glmLRT. And whether it is possible to extract logFC and significance level for the single contrast when the anova-like method was chosen with coef=2:4. Thanks anyway. Bests, Benedikt On 26.06.2013 17:50, Eduardo Andrés León wrote: > Hi Benedikt, > > I'm not an expert but I had a similar problem not long time ago, and > the example 3.3.1 in the edgeR guide, helped me a lot while designing > the contrast : > > > my.contrasts <- makeContrasts( > > * > + Drug.1vs0 = Drug.1h-Drug.0h, > * > + Drug.2vs0 = Drug.2h-Drug.0h, > * > + Placebo.1vs0 = Placebo.1h-Placebo.0h, > * > + Placebo.2vs0 = Placebo.2h-Placebo.0h, > * > + DrugvsPlacebo.0h = Drug.0h-Placebo.0h, > * > + DrugvsPlacebo.1h = (Drug.1h-Drug.0h)-(Placebo.1h- Placebo.0h), > * > + DrugvsPlacebo.2h = (Drug.2h-Drug.0h)-(Placebo.2h- Placebo.0h), > + levels=design) > > > I hope this helps > > cheers > > > On 26 Jun 2013, at 17:43, Benedikt Drosse <drosse@mpipz.mpg.de> <mailto:drosse@mpipz.mpg.de>> wrote: > >> Dear edgeR users, >> >> I worked my way through the user's guide of edgeR. However, I still >> have some questions and problems understanding the anova-like way of >> analyzing differential gene expression. I would be very grateful for >> any help or suggestion. I hope I did not overlook any earlier post >> concerning the same topic. >> >> My RNAseq data are derived from 4 different developmental stages with >> 3 biological replicates each (see overview). I am interested in genes >> differentially expressed between any of the developmental stages. >> I am not sure, which way of analysis is best to answer this question. >> Making the "Anova-like" model asking for all contrasts at once or >> doing all possible contrasts one by one. >> >> In principle the "anova-like" way of analysis should fit best to my >> question, if I understood the edgeR user's guide correctly. So I make >> a model matrix as given in "design" and fit the glm model for the >> effect of Development including the intercept. To ask for expression >> differences between any of the developmental stages I use "coef=2:4" >> in the call of glmLRT. >> >> So my questions are: >> Will coef=2:4 yield only the contrasts of Development2-Development1, >> Development3-Development1, Development4-Development1 (as it is >> described in the user's guide), >> or will it also test the contrasts of Development3-Development2, >> Development4-Development2 and Development4-Development3, which would >> also be important contrasts for me to analyze? >> For further filtering and candidate gene identification I would like >> to extract the logFoldChanges and significance level between any of >> the developmental stages. >> If the "anova-like" analysis provides all of the six possible >> contrasts, is there an easy way to extract logFC and significance >> level for any of these contrasts from the resulting glmLRT-output? >> >> If the anova-like analysis does NOT yield ALL of the possible >> contrasts, I would have to make the contrasts one by one. >> In this case would it be more useful to build the glm model without >> an intercept (see design_2), and then ask for the six contrasts >> separately as indicated in the comment of glm_data_2? >> Doing so it would be easy for me to get the logFC and the >> corresponding signficance level. >> Is it conceptionally wrong to do the single contrasts instead of an >> anova-like analysis concerning the multiple testing problem (6 tests >> instead of 1)? >> >> I will be very grateful for any comment on my questions, >> >> Best regards, >> Benedikt Digel >> >> >> >> #R code: >> overview = matrix(nrow=12, ncol=1) >> overview[,1] = factor(c(1,1,1,2,2,2,3,3,3,4,4,4)) >> colnames(overview) = "Development" >> rownames(overview) = paste("library_", 1:12, sep="") >> >> design = model.matrix(object = ~Development) >> fit = glmFit(y=data_with_tagwiseDISP, design=design) # I did not >> provide the corresponding data for the analysis, since it should just >> indicate the way I call glmFit >> glm_data = glmLRT(glmfit=fit, coef=2:4) >> >> design_2 = model.matrix(object=~0+Development) >> fit_2 = glmFit(y=data_with_tagwiseDISP, design=design_2) # I >> did not provide the corresponding data for the analysis, since it >> should just indicate the way I call glmFit >> glm_data_2 = glmLRT(glmfit=fit_2, contrast=c(-1,1,0,0)) # >> c(-1,0,1,0); c(-1,0,0,1); c(0,-1,1,0); c(0,1-,0,1); c(0,0,-1,1) all >> possible contrasts to extract effects of all comparisons >> >> -- >> Benedikt Digel geb. Drosse >> >> PhD student >> AG von Korff >> Max Planck Institute for Plant Breeding Research >> Dept. Plant Developmental Biology >> Carl-von-Linné-Weg 10 >> 50829 Cologne >> Germany >> >> Tel. (Office): +49-(0)221-5062-280 >> Email: drosse@mpipz.mpg.de <mailto:drosse@mpipz.mpg.de> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > =================================================== > Eduardo Andrés León > Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 5054/3063) > e-mail: eandres@cnio.es <mailto:eandres@cnio.es> Fax: (+34) 91 224 69 76 > Unidad de Bioinformática Bioinformatics Unit > Centro Nacional de Investigaciones Oncológicas > C.P.: 28029 Zip Code: 28029 > C/. Melchor Fernández Almagro, 3 Madrid (Spain) > http://bioinfo.cnio.eshttp://bioinfo.cnio.es/people/eandres > =================================================== > -- Benedikt Digel geb. Drosse PhD student AG von Korff Max Planck Institute for Plant Breeding Research Dept. Plant Developmental Biology Carl-von-Linné-Weg 10 50829 Cologne Germany Tel. (Office): +49-(0)221-5062-280 Email: drosse@mpipz.mpg.de [[alternative HTML version deleted]]

ADD REPLY • link 10.8 years ago Benedikt Drosse ▴ 30

0

Entering edit mode

Benedikt Drosse ▴ 30

@benedikt-drosse-6014

Last seen 9.6 years ago

Dear Ryan, thanks for clarification. Also your post from March, which I found via google, was very helpful and answered all my questions, thanks. http://grokbase.com/t/r/bioconductor/133c8x4cet/bioc-anova-like-test- in-edger Bests, Benedikt -- Benedikt Digel geb. Drosse PhD student AG von Korff Max Planck Institute for Plant Breeding Research Dept. Plant Developmental Biology Carl-von-Linn?-Weg 10 50829 Cologne Germany Tel. (Office): +49-(0)221-5062-280 Email: drosse at mpipz.mpg.de

ADD COMMENT • link 10.8 years ago Benedikt Drosse ▴ 30

Login before adding your answer.