edgeR: GLM for multi-factor and mulit-level designs
1
0
Entering edit mode
@dorota-herman-5627
Last seen 9.7 years ago
Hello everyone, I have been playing with the GLM approach for RNA-seq data in DESeq and edgeR but I am fairly new in DE analyses. I am interested in pairwise comparisons in multi-factor multi-level designs. My question concerns my understanding of an application of the glmLRT function #My code is >countsTable <- read.delim(file) >header <- c('A_1','A_2','A_3','B_1','B_2','B_3','C_1','C_2','C_3','D_1','D_2','D _3') >names(countsTable) <- header >conds <- factor(c('A','A','A?,'B','B','B','C','C','C','D','D','D')) >Ex<-factor(c('exper1', 'exper2', 'exper3', 'exper2', 'exper3', 'exper4', 'exper1', 'exper3', 'exper4', 'exper2', 'exper3', 'exper4')) >group <- conds >dge <- DGEList(counts=countsTable,group=group) >dge <- calcNormFactors(dge, method='TMM') >design <- model.matrix(~Ex+conds) >rownames(design)<-colnames(dge) >dge <- estimateGLMCommonDisp(dge,design) >dge <- estimateGLMTrendedDisp(dge, design) >dge <- estimateGLMTagwiseDisp(dge, design) >fit <- glmFit(dge, design) #my design looks like: > design (Intercept) Eexper2 Eexper3 Eexper4 condsB condsC condsD A_1 1 0 0 0 0 0 0 A_2 1 1 0 0 0 0 0 A_3 1 0 1 0 0 0 0 B_1 1 1 0 0 1 0 0 B_2 1 0 1 0 1 0 0 B_3 1 0 0 1 1 0 0 C_1 1 0 0 0 0 1 0 C_2 1 0 1 0 0 1 0 C_3 1 0 0 1 0 1 0 D_1 1 1 0 0 0 0 1 D_2 1 0 1 0 0 0 1 D_3 1 0 0 1 0 0 1 attr(,"assign") [1] 0 1 1 1 2 2 2 attr(,"contrasts") attr(,"contrasts")$E [1] "contr.treatment" attr(,"contrasts")$conds [1] "contr.treatment" I understand that R function rewrote the model matrix because of the identifiability problem for parameter estimations. However it causes my confusion in further usage of that design for the pairwise comparisons. In a case when I want to obtain differentially expressed genes between A and B, I understand I should use the function: >lrt <- glmLRT(fit,coef="condsB") Is it correct? In a case when I want to obtain differentially expressed genes between C and D (*without taking into account A*), are these calling functions correct? >C_D<-makeContrasts(condsC-condsD,levels=design) >lrt <- glmLRT(fit,contrast=C_D) Does it mean that glmLRT function takes into account first conds (A) when we use ?coef? parameter and discard it when we use ?contrast? parameter? Or it means that the second analysis, between C and D takes into account differential expression with A too? I hope my explanation of the question is not too confusing. Best wishes Dorota -- ================================================================== Dorota Herman, PhD VIB Department of Plant Systems Biology, Ghent University Technologiepark 927 9052 Gent, Belgium Tel: +32 (0)9 3313692 Email:dorota.herman at psb.vib-ugent.be Web: http://www.psb.ugent.be
• 2.7k views
ADD COMMENT
0
Entering edit mode
@ryan-c-thompson-5618
Last seen 9 months ago
Scripps Research, La Jolla, CA
I think you have both tests correct. Each coefficient in the model matrix from conds is an "X vs A" coefficient. So the "condsB" coefficient is the coefficient for B vs A. That means that you can test B vs A by simply testing that coefficient equal to zero, as you have done. Note that you could also make a contrast for the same test like this: B_A<-makeContrasts(condsB,levels=design) Essentially, you are testing the null hypothesis that the specified coefficient is zero for each gene. which would make a vector with just one nonzero element. When you test the contrast of condsC-condsD, you are subtracting "D vs A" from "C vs A", which is the same as adding "C vs A" and "A vs D". Thus this difference gives "C vs D", which is what you want. Condition A is not being ignored; it simply cancels out of the contrast in question. Instead of testing whether a single coefficient is equal to zero, you are testing whether the difference of two coefficients is zero. By the way, note the order: "condsC - condsD" will give the same statistics as "condsD - condsC", but will give opposite fold changes, so choose the one that makes the most sense to you (typically "treatment - control" for simple experiments). On Tue 27 Nov 2012 07:18:20 AM PST, Dorota Herman wrote: > > Hello everyone, > > I have been playing with the GLM approach for RNA-seq data in DESeq > and edgeR but I am fairly new in DE analyses. I am interested in > pairwise comparisons in multi-factor multi-level designs. My question > concerns my understanding of an application of the glmLRT function > > #My code is >> >> countsTable <- read.delim(file) >> header <- > > c('A_1','A_2','A_3','B_1','B_2','B_3','C_1','C_2','C_3','D_1','D_2', 'D_3') > > >> >> names(countsTable) <- header >> conds <- factor(c('A','A','A?,'B','B','B','C','C','C','D','D','D')) >> Ex<-factor(c('exper1', 'exper2', 'exper3', 'exper2', 'exper3', 'exper4', > > 'exper1', 'exper3', 'exper4', 'exper2', 'exper3', 'exper4')) >> >> group <- conds >> dge <- DGEList(counts=countsTable,group=group) >> dge <- calcNormFactors(dge, method='TMM') >> design <- model.matrix(~Ex+conds) >> rownames(design)<-colnames(dge) >> dge <- estimateGLMCommonDisp(dge,design) >> dge <- estimateGLMTrendedDisp(dge, design) >> dge <- estimateGLMTagwiseDisp(dge, design) >> fit <- glmFit(dge, design) > > > #my design looks like: >> >> design > > (Intercept) Eexper2 Eexper3 Eexper4 condsB condsC condsD > A_1 1 0 0 0 0 0 0 > A_2 1 1 0 0 0 0 0 > A_3 1 0 1 0 0 0 0 > B_1 1 1 0 0 1 0 0 > B_2 1 0 1 0 1 0 0 > B_3 1 0 0 1 1 0 0 > C_1 1 0 0 0 0 1 0 > C_2 1 0 1 0 0 1 0 > C_3 1 0 0 1 0 1 0 > D_1 1 1 0 0 0 0 1 > D_2 1 0 1 0 0 0 1 > D_3 1 0 0 1 0 0 1 > attr(,"assign") > [1] 0 1 1 1 2 2 2 > attr(,"contrasts") > attr(,"contrasts")$E > [1] "contr.treatment" > attr(,"contrasts")$conds > [1] "contr.treatment" > > I understand that R function rewrote the model matrix because of the > identifiability problem for parameter estimations. However it causes > my confusion in further usage of that design for the pairwise > comparisons. > > In a case when I want to obtain differentially expressed genes between > A and B, I understand I should use the function: >> >> lrt <- glmLRT(fit,coef="condsB") > > Is it correct? > > In a case when I want to obtain differentially expressed genes between > C and D (*without taking into account A*), are these calling functions > correct? >> >> C_D<-makeContrasts(condsC-condsD,levels=design) >> lrt <- glmLRT(fit,contrast=C_D) > > > Does it mean that glmLRT function takes into account first conds (A) > when we use ?coef? parameter and discard it when we use ?contrast? > parameter? Or it means that the second analysis, between C and D takes > into account differential expression with A too? > > I hope my explanation of the question is not too confusing. > Best wishes > Dorota
ADD COMMENT
0
Entering edit mode
thanks a lot for your respond best wishes Dorota Ryan C. Thompson wrote: > I think you have both tests correct. Each coefficient in the model > matrix from conds is an "X vs A" coefficient. So the "condsB" > coefficient is the coefficient for B vs A. That means that you can > test B vs A by simply testing that coefficient equal to zero, as you > have done. Note that you could also make a contrast for the same test > like this: > > B_A<-makeContrasts(condsB,levels=design) > > Essentially, you are testing the null hypothesis that the specified > coefficient is zero for each gene. > > which would make a vector with just one nonzero element. When you test > the contrast of condsC-condsD, you are subtracting "D vs A" from "C vs > A", which is the same as adding "C vs A" and "A vs D". Thus this > difference gives "C vs D", which is what you want. Condition A is not > being ignored; it simply cancels out of the contrast in question. > Instead of testing whether a single coefficient is equal to zero, you > are testing whether the difference of two coefficients is zero. > > By the way, note the order: "condsC - condsD" will give the same > statistics as "condsD - condsC", but will give opposite fold changes, > so choose the one that makes the most sense to you (typically > "treatment - control" for simple experiments). > > > On Tue 27 Nov 2012 07:18:20 AM PST, Dorota Herman wrote: >> >> Hello everyone, >> >> I have been playing with the GLM approach for RNA-seq data in DESeq >> and edgeR but I am fairly new in DE analyses. I am interested in >> pairwise comparisons in multi-factor multi-level designs. My question >> concerns my understanding of an application of the glmLRT function >> >> #My code is >>> >>> countsTable <- read.delim(file) >>> header <- >> >> c('A_1','A_2','A_3','B_1','B_2','B_3','C_1','C_2','C_3','D_1','D_2' ,'D_3') >> >> >>> >>> names(countsTable) <- header >>> conds <- factor(c('A','A','A?,'B','B','B','C','C','C','D','D','D')) >>> Ex<-factor(c('exper1', 'exper2', 'exper3', 'exper2', 'exper3', >>> 'exper4', >> >> 'exper1', 'exper3', 'exper4', 'exper2', 'exper3', 'exper4')) >>> >>> group <- conds >>> dge <- DGEList(counts=countsTable,group=group) >>> dge <- calcNormFactors(dge, method='TMM') >>> design <- model.matrix(~Ex+conds) >>> rownames(design)<-colnames(dge) >>> dge <- estimateGLMCommonDisp(dge,design) >>> dge <- estimateGLMTrendedDisp(dge, design) >>> dge <- estimateGLMTagwiseDisp(dge, design) >>> fit <- glmFit(dge, design) >> >> >> #my design looks like: >>> >>> design >> >> (Intercept) Eexper2 Eexper3 Eexper4 condsB condsC condsD >> A_1 1 0 0 0 0 0 0 >> A_2 1 1 0 0 0 0 0 >> A_3 1 0 1 0 0 0 0 >> B_1 1 1 0 0 1 0 0 >> B_2 1 0 1 0 1 0 0 >> B_3 1 0 0 1 1 0 0 >> C_1 1 0 0 0 0 1 0 >> C_2 1 0 1 0 0 1 0 >> C_3 1 0 0 1 0 1 0 >> D_1 1 1 0 0 0 0 1 >> D_2 1 0 1 0 0 0 1 >> D_3 1 0 0 1 0 0 1 >> attr(,"assign") >> [1] 0 1 1 1 2 2 2 >> attr(,"contrasts") >> attr(,"contrasts")$E >> [1] "contr.treatment" >> attr(,"contrasts")$conds >> [1] "contr.treatment" >> >> I understand that R function rewrote the model matrix because of the >> identifiability problem for parameter estimations. However it causes >> my confusion in further usage of that design for the pairwise >> comparisons. >> >> In a case when I want to obtain differentially expressed genes between >> A and B, I understand I should use the function: >>> >>> lrt <- glmLRT(fit,coef="condsB") >> >> Is it correct? >> >> In a case when I want to obtain differentially expressed genes between >> C and D (*without taking into account A*), are these calling functions >> correct? >>> >>> C_D<-makeContrasts(condsC-condsD,levels=design) >>> lrt <- glmLRT(fit,contrast=C_D) >> >> >> Does it mean that glmLRT function takes into account first conds (A) >> when we use ?coef? parameter and discard it when we use ?contrast? >> parameter? Or it means that the second analysis, between C and D takes >> into account differential expression with A too? >> >> I hope my explanation of the question is not too confusing. >> Best wishes >> Dorota -- ================================================================== Dorota Herman, PhD VIB Department of Plant Systems Biology, Ghent University Technologiepark 927 9052 Gent, Belgium Tel: +32 (0)9 3313692 Email:dorota.herman at psb.vib-ugent.be Web: http://www.psb.ugent.be
ADD REPLY

Login before adding your answer.

Traffic: 802 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6