Question

edgeR: GLM & residuals and model fitting & hypothesis testing

0

Entering edit mode

Susanne Franssen ▴ 30

@susanne-franssen-4994

Last seen 9.6 years ago

Dear all, 1) GLM & residuals: I have a question concerning the use of GLMs in edgeR and the analysis of the residuals after model fitting. I have followed all the steps until model fitting, e.g.: glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion) The results I obtain from the fitting are the following catgories: > names(glmfit.D) [1] "coefficients" "fitted.values" "fail" "not.converged" [5] "deviance" "df.residual" "abundance" "design" [9] "offset" "dispersion" "method" "counts" [13] "samples" What would be the best way to obtain the residuals for the "genewise" GLMs? 2) model fitting & hypothesis testing: I have a fully crossed design with 2 factors and 2 factor levels each: individual <- as.factor(c("indA","indA","indB","indB")) treatment <- as.factor(c("treat1","treat2","treat1","treat2")) in general I would be interested in 3 different aspects: a) effect of individual b) effect of treatment c) interaction between individual and treatment What would be the best way to test for those effects, would I rather test for all three aspects individually, i.e.: a) design <- model.matrix(~individual) b) design <- model.matrix(~treatment) c) design <- model.matrix(~individual*treatment) or doesn't it also make sense to model design <- model.matrix(~individual+treatment) and test for a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2) b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3) ... this way the effect of both factors could be accounted for in the model?! Thanks a lot! Susanne

edgeR edgeR • 3.2k views

ADD COMMENT • link updated 12.2 years ago by Gordon Smyth 50k • written 12.2 years ago by Susanne Franssen ▴ 30

score 0 · Answer 1 · 2012-02-16

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 16 minutes ago

WEHI, Melbourne, Australia

Dear Susanne, 1) You may know that there are many different definitions of "residuals" from a generalized linear models, and none of them have the properties of residuals from normal linear models. I have not found residuals very useful myself in a genomic differential expression context. What sort of residuals did you want, and what were you planning to do with them? 2) You must always fit the model with all relevant factors: individual*treatment. You cannot do meaningful inference from a reduced model until you have determined that the other factors are not important. Hence you have to deal with the interaction first. Best wishes Godon > Date: Tue, 14 Feb 2012 17:44:31 +0100 > From: Susanne Franssen <s.franssen at="" uni-muenster.de=""> > To: bioconductor at r-project.org > Subject: [BioC] edgeR: GLM & residuals and model fitting & hypothesis > testing > > Dear all, > > > 1) GLM & residuals: > > I have a question concerning the use of GLMs in edgeR and the analysis > of the residuals after model fitting. > > I have followed all the steps until model fitting, e.g.: > glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion) > > The results I obtain from the fitting are the following catgories: >> names(glmfit.D) > [1] "coefficients" "fitted.values" "fail" "not.converged" > [5] "deviance" "df.residual" "abundance" "design" > [9] "offset" "dispersion" "method" "counts" > [13] "samples" > > What would be the best way to obtain the residuals for the "genewise" GLMs? > > > > 2) model fitting & hypothesis testing: > > I have a fully crossed design with 2 factors and 2 factor levels each: > individual <- as.factor(c("indA","indA","indB","indB")) > treatment <- as.factor(c("treat1","treat2","treat1","treat2")) > > in general I would be interested in 3 different aspects: > a) effect of individual > b) effect of treatment > c) interaction between individual and treatment > > What would be the best way to test for those effects, would I rather > test for all three aspects individually, i.e.: > a) design <- model.matrix(~individual) > b) design <- model.matrix(~treatment) > c) design <- model.matrix(~individual*treatment) > > or doesn't it also make sense to model > design <- model.matrix(~individual+treatment) > and test for > a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2) > b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3) > ... this way the effect of both factors could be accounted for in the model?! > > > Thanks a lot! > Susanne > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 12.2 years ago Gordon Smyth 50k

0

Entering edit mode

Dear Gordon, thanks, a lot for you answer, however I am a little more uncertain now what to do. > 1) You may know that there are many different definitions of "residuals" > from a generalized linear models, and none of them have the properties of > residuals from normal linear models. ?I have not found residuals very useful > myself in a genomic differential expression context. > > What sort of residuals did you want, and what were you planning to do with > them? Indeed I am not aware of this difference! I was thinking of the residual errors in a normal linear model. The problem in my analysis is that I only have 3 df (4 libs) to start with, so I cannot model the complete model individual+treatment+individual*treatment. So an idea was to model individual+treatment and afterwards do a model on the interaction only on the remaining residual errors as new observed vector. Is something like this at all possible or am I completely wrong here? > 2) You must always fit the model with all relevant factors: > individual*treatment. ?You cannot do meaningful inference from a reduced > model until you have determined that the other factors are not important. > Hence you have to deal with the interaction first. It makes sense to have to put in all relevent factors to the model. However, concerning this I don't understand your comment - I have to deal with the interaction first, wouldn't the full model be: individual+treatment+individual*treatment (which I can't do because of a lack of df) Cosidering my samples & factors: >> I have a fully crossed design with 2 factors and 2 factor levels each: >> individual <- as.factor(c("indA","indA","indB","indB")) >> treatment <- as.factor(c("treat1","treat2","treat1","treat2")) how would the cmd for the design matrix look like: design <- model.matrix(~??) and with which cmd would I fit the model and test for it: fit <- glmFit(d.GLM,design) lrt <- glmLRT(d.GLM,fit, coeff=?) Thanks a lot, Susanne > > Best wishes > Godon > >> Date: Tue, 14 Feb 2012 17:44:31 +0100 >> From: Susanne Franssen <s.franssen at="" uni-muenster.de=""> >> To: bioconductor at r-project.org >> Subject: [BioC] edgeR: GLM & residuals and model fitting & hypothesis >> ? ? ? ?testing >> >> Dear all, >> >> >> 1) GLM & residuals: >> >> I have a question concerning the use of GLMs in edgeR and the analysis >> of the residuals after model fitting. >> >> I have followed all the steps until model fitting, e.g.: >> glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion) >> >> The results I obtain from the fitting are the following catgories: >>> >>> names(glmfit.D) >> >> [1] "coefficients" ?"fitted.values" "fail" ? ? ? ? ?"not.converged" >> [5] "deviance" ? ? ?"df.residual" ? "abundance" ? ? "design" >> [9] "offset" ? ? ? ?"dispersion" ? ?"method" ? ? ? ?"counts" >> [13] "samples" >> >> What would be the best way to obtain the residuals for the "genewise" >> GLMs? >> >> >> >> 2) model fitting & hypothesis testing: >> >> I have a fully crossed design with 2 factors and 2 factor levels each: >> individual <- as.factor(c("indA","indA","indB","indB")) >> treatment <- as.factor(c("treat1","treat2","treat1","treat2")) >> >> in general I would be interested in 3 different aspects: >> a) effect of individual >> b) effect of treatment >> c) interaction between individual and treatment >> >> What would be the best way to test for those effects, would I rather >> test for all three aspects individually, i.e.: >> a) design <- model.matrix(~individual) >> b) design <- model.matrix(~treatment) >> c) design <- model.matrix(~individual*treatment) >> >> or doesn't it also make sense to model >> design <- model.matrix(~individual+treatment) >> and test for >> a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2) >> b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3) >> ... this way the effect of both factors could be accounted for in the >> model?! >> >> >> Thanks a lot! >> Susanne >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

ADD REPLY • link 12.2 years ago Susanne Franssen ▴ 30

0

Entering edit mode

On Thu, 16 Feb 2012, Susanne Franssen wrote: > Dear Gordon, > > thanks, a lot for you answer, however I am a little more uncertain now > what to do. > >> 1) You may know that there are many different definitions of "residuals" >> from a generalized linear models, and none of them have the properties of >> residuals from normal linear models. ?I have not found residuals very useful >> myself in a genomic differential expression context. >> >> What sort of residuals did you want, and what were you planning to do with >> them? > > Indeed I am not aware of this difference! > I was thinking of the residual errors in a normal linear model. The > problem in my analysis is that I only have 3 df (4 libs) to start > with, so I cannot model the complete model > individual+treatment+individual*treatment. So an idea was to model > individual+treatment and afterwards do a model on the interaction only > on the remaining residual errors as new observed vector. > Is something like this at all possible or am I completely wrong here? No, it is not possible. Please read the section in the edgeR User's Guide "What to do if you have no replicates". >> 2) You must always fit the model with all relevant factors: >> individual*treatment. ?You cannot do meaningful inference from a reduced >> model until you have determined that the other factors are not important. >> Hence you have to deal with the interaction first. > > It makes sense to have to put in all relevent factors to the model. > However, concerning this I don't understand your comment - I have to > deal with the interaction first, wouldn't the full model be: > individual+treatment+individual*treatment (which I can't do because of > a lack of df) In R, the interaction is represented by individual:treatment. The notation individual*treatment is a shorthand for interaction plus main effects, i.e., individual*treatment = individual+treatment+individual:treatment. Please read the "Introduction to R" that comes with your R installation, in particular Section 11 on statistical models. Best wishes Gordon > Cosidering my samples & factors: >>> I have a fully crossed design with 2 factors and 2 factor levels each: >>> individual <- as.factor(c("indA","indA","indB","indB")) >>> treatment <- as.factor(c("treat1","treat2","treat1","treat2")) > > how would the cmd for the design matrix look like: > design <- model.matrix(~??) > and with which cmd would I fit the model and test for it: > fit <- glmFit(d.GLM,design) > lrt <- glmLRT(d.GLM,fit, coeff=?) > > Thanks a lot, > Susanne > > > >> >> Best wishes >> Godon >> >>> Date: Tue, 14 Feb 2012 17:44:31 +0100 >>> From: Susanne Franssen <s.franssen at="" uni-muenster.de=""> >>> To: bioconductor at r-project.org >>> Subject: [BioC] edgeR: GLM & residuals and model fitting & hypothesis >>> ? ? ? ?testing >>> >>> Dear all, >>> >>> >>> 1) GLM & residuals: >>> >>> I have a question concerning the use of GLMs in edgeR and the analysis >>> of the residuals after model fitting. >>> >>> I have followed all the steps until model fitting, e.g.: >>> glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion) >>> >>> The results I obtain from the fitting are the following catgories: >>>> >>>> names(glmfit.D) >>> >>> [1] "coefficients" ?"fitted.values" "fail" ? ? ? ? ?"not.converged" >>> [5] "deviance" ? ? ?"df.residual" ? "abundance" ? ? "design" >>> [9] "offset" ? ? ? ?"dispersion" ? ?"method" ? ? ? ?"counts" >>> [13] "samples" >>> >>> What would be the best way to obtain the residuals for the "genewise" >>> GLMs? >>> >>> >>> >>> 2) model fitting & hypothesis testing: >>> >>> I have a fully crossed design with 2 factors and 2 factor levels each: >>> individual <- as.factor(c("indA","indA","indB","indB")) >>> treatment <- as.factor(c("treat1","treat2","treat1","treat2")) >>> >>> in general I would be interested in 3 different aspects: >>> a) effect of individual >>> b) effect of treatment >>> c) interaction between individual and treatment >>> >>> What would be the best way to test for those effects, would I rather >>> test for all three aspects individually, i.e.: >>> a) design <- model.matrix(~individual) >>> b) design <- model.matrix(~treatment) >>> c) design <- model.matrix(~individual*treatment) >>> >>> or doesn't it also make sense to model >>> design <- model.matrix(~individual+treatment) >>> and test for >>> a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2) >>> b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3) >>> ... this way the effect of both factors could be accounted for in the >>> model?! >>> >>> >>> Thanks a lot! >>> Susanne ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}

ADD REPLY • link 12.2 years ago Gordon Smyth 50k