edgeR: GLM & residuals and model fitting & hypothesis testing
1
0
Entering edit mode
@susanne-franssen-4994
Last seen 9.6 years ago
Dear all, 1) GLM & residuals: I have a question concerning the use of GLMs in edgeR and the analysis of the residuals after model fitting. I have followed all the steps until model fitting, e.g.: glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion) The results I obtain from the fitting are the following catgories: > names(glmfit.D) [1] "coefficients" "fitted.values" "fail" "not.converged" [5] "deviance" "df.residual" "abundance" "design" [9] "offset" "dispersion" "method" "counts" [13] "samples" What would be the best way to obtain the residuals for the "genewise" GLMs? 2) model fitting & hypothesis testing: I have a fully crossed design with 2 factors and 2 factor levels each: individual <- as.factor(c("indA","indA","indB","indB")) treatment <- as.factor(c("treat1","treat2","treat1","treat2")) in general I would be interested in 3 different aspects: a) effect of individual b) effect of treatment c) interaction between individual and treatment What would be the best way to test for those effects, would I rather test for all three aspects individually, i.e.: a) design <- model.matrix(~individual) b) design <- model.matrix(~treatment) c) design <- model.matrix(~individual*treatment) or doesn't it also make sense to model design <- model.matrix(~individual+treatment) and test for a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2) b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3) ... this way the effect of both factors could be accounted for in the model?! Thanks a lot! Susanne
edgeR edgeR • 3.2k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 16 minutes ago
WEHI, Melbourne, Australia
Dear Susanne, 1) You may know that there are many different definitions of "residuals" from a generalized linear models, and none of them have the properties of residuals from normal linear models. I have not found residuals very useful myself in a genomic differential expression context. What sort of residuals did you want, and what were you planning to do with them? 2) You must always fit the model with all relevant factors: individual*treatment. You cannot do meaningful inference from a reduced model until you have determined that the other factors are not important. Hence you have to deal with the interaction first. Best wishes Godon > Date: Tue, 14 Feb 2012 17:44:31 +0100 > From: Susanne Franssen <s.franssen at="" uni-muenster.de=""> > To: bioconductor at r-project.org > Subject: [BioC] edgeR: GLM & residuals and model fitting & hypothesis > testing > > Dear all, > > > 1) GLM & residuals: > > I have a question concerning the use of GLMs in edgeR and the analysis > of the residuals after model fitting. > > I have followed all the steps until model fitting, e.g.: > glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion) > > The results I obtain from the fitting are the following catgories: >> names(glmfit.D) > [1] "coefficients" "fitted.values" "fail" "not.converged" > [5] "deviance" "df.residual" "abundance" "design" > [9] "offset" "dispersion" "method" "counts" > [13] "samples" > > What would be the best way to obtain the residuals for the "genewise" GLMs? > > > > 2) model fitting & hypothesis testing: > > I have a fully crossed design with 2 factors and 2 factor levels each: > individual <- as.factor(c("indA","indA","indB","indB")) > treatment <- as.factor(c("treat1","treat2","treat1","treat2")) > > in general I would be interested in 3 different aspects: > a) effect of individual > b) effect of treatment > c) interaction between individual and treatment > > What would be the best way to test for those effects, would I rather > test for all three aspects individually, i.e.: > a) design <- model.matrix(~individual) > b) design <- model.matrix(~treatment) > c) design <- model.matrix(~individual*treatment) > > or doesn't it also make sense to model > design <- model.matrix(~individual+treatment) > and test for > a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2) > b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3) > ... this way the effect of both factors could be accounted for in the model?! > > > Thanks a lot! > Susanne > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Dear Gordon, thanks, a lot for you answer, however I am a little more uncertain now what to do. > 1) You may know that there are many different definitions of "residuals" > from a generalized linear models, and none of them have the properties of > residuals from normal linear models. ?I have not found residuals very useful > myself in a genomic differential expression context. > > What sort of residuals did you want, and what were you planning to do with > them? Indeed I am not aware of this difference! I was thinking of the residual errors in a normal linear model. The problem in my analysis is that I only have 3 df (4 libs) to start with, so I cannot model the complete model individual+treatment+individual*treatment. So an idea was to model individual+treatment and afterwards do a model on the interaction only on the remaining residual errors as new observed vector. Is something like this at all possible or am I completely wrong here? > 2) You must always fit the model with all relevant factors: > individual*treatment. ?You cannot do meaningful inference from a reduced > model until you have determined that the other factors are not important. > Hence you have to deal with the interaction first. It makes sense to have to put in all relevent factors to the model. However, concerning this I don't understand your comment - I have to deal with the interaction first, wouldn't the full model be: individual+treatment+individual*treatment (which I can't do because of a lack of df) Cosidering my samples & factors: >> I have a fully crossed design with 2 factors and 2 factor levels each: >> individual <- as.factor(c("indA","indA","indB","indB")) >> treatment <- as.factor(c("treat1","treat2","treat1","treat2")) how would the cmd for the design matrix look like: design <- model.matrix(~??) and with which cmd would I fit the model and test for it: fit <- glmFit(d.GLM,design) lrt <- glmLRT(d.GLM,fit, coeff=?) Thanks a lot, Susanne > > Best wishes > Godon > >> Date: Tue, 14 Feb 2012 17:44:31 +0100 >> From: Susanne Franssen <s.franssen at="" uni-muenster.de=""> >> To: bioconductor at r-project.org >> Subject: [BioC] edgeR: GLM & residuals and model fitting & hypothesis >> ? ? ? ?testing >> >> Dear all, >> >> >> 1) GLM & residuals: >> >> I have a question concerning the use of GLMs in edgeR and the analysis >> of the residuals after model fitting. >> >> I have followed all the steps until model fitting, e.g.: >> glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion) >> >> The results I obtain from the fitting are the following catgories: >>> >>> names(glmfit.D) >> >> [1] "coefficients" ?"fitted.values" "fail" ? ? ? ? ?"not.converged" >> [5] "deviance" ? ? ?"df.residual" ? "abundance" ? ? "design" >> [9] "offset" ? ? ? ?"dispersion" ? ?"method" ? ? ? ?"counts" >> [13] "samples" >> >> What would be the best way to obtain the residuals for the "genewise" >> GLMs? >> >> >> >> 2) model fitting & hypothesis testing: >> >> I have a fully crossed design with 2 factors and 2 factor levels each: >> individual <- as.factor(c("indA","indA","indB","indB")) >> treatment <- as.factor(c("treat1","treat2","treat1","treat2")) >> >> in general I would be interested in 3 different aspects: >> a) effect of individual >> b) effect of treatment >> c) interaction between individual and treatment >> >> What would be the best way to test for those effects, would I rather >> test for all three aspects individually, i.e.: >> a) design <- model.matrix(~individual) >> b) design <- model.matrix(~treatment) >> c) design <- model.matrix(~individual*treatment) >> >> or doesn't it also make sense to model >> design <- model.matrix(~individual+treatment) >> and test for >> a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2) >> b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3) >> ... this way the effect of both factors could be accounted for in the >> model?! >> >> >> Thanks a lot! >> Susanne >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}
ADD REPLY
0
Entering edit mode
On Thu, 16 Feb 2012, Susanne Franssen wrote: > Dear Gordon, > > thanks, a lot for you answer, however I am a little more uncertain now > what to do. > >> 1) You may know that there are many different definitions of "residuals" >> from a generalized linear models, and none of them have the properties of >> residuals from normal linear models. ?I have not found residuals very useful >> myself in a genomic differential expression context. >> >> What sort of residuals did you want, and what were you planning to do with >> them? > > Indeed I am not aware of this difference! > I was thinking of the residual errors in a normal linear model. The > problem in my analysis is that I only have 3 df (4 libs) to start > with, so I cannot model the complete model > individual+treatment+individual*treatment. So an idea was to model > individual+treatment and afterwards do a model on the interaction only > on the remaining residual errors as new observed vector. > Is something like this at all possible or am I completely wrong here? No, it is not possible. Please read the section in the edgeR User's Guide "What to do if you have no replicates". >> 2) You must always fit the model with all relevant factors: >> individual*treatment. ?You cannot do meaningful inference from a reduced >> model until you have determined that the other factors are not important. >> Hence you have to deal with the interaction first. > > It makes sense to have to put in all relevent factors to the model. > However, concerning this I don't understand your comment - I have to > deal with the interaction first, wouldn't the full model be: > individual+treatment+individual*treatment (which I can't do because of > a lack of df) In R, the interaction is represented by individual:treatment. The notation individual*treatment is a shorthand for interaction plus main effects, i.e., individual*treatment = individual+treatment+individual:treatment. Please read the "Introduction to R" that comes with your R installation, in particular Section 11 on statistical models. Best wishes Gordon > Cosidering my samples & factors: >>> I have a fully crossed design with 2 factors and 2 factor levels each: >>> individual <- as.factor(c("indA","indA","indB","indB")) >>> treatment <- as.factor(c("treat1","treat2","treat1","treat2")) > > how would the cmd for the design matrix look like: > design <- model.matrix(~??) > and with which cmd would I fit the model and test for it: > fit <- glmFit(d.GLM,design) > lrt <- glmLRT(d.GLM,fit, coeff=?) > > Thanks a lot, > Susanne > > > >> >> Best wishes >> Godon >> >>> Date: Tue, 14 Feb 2012 17:44:31 +0100 >>> From: Susanne Franssen <s.franssen at="" uni-muenster.de=""> >>> To: bioconductor at r-project.org >>> Subject: [BioC] edgeR: GLM & residuals and model fitting & hypothesis >>> ? ? ? ?testing >>> >>> Dear all, >>> >>> >>> 1) GLM & residuals: >>> >>> I have a question concerning the use of GLMs in edgeR and the analysis >>> of the residuals after model fitting. >>> >>> I have followed all the steps until model fitting, e.g.: >>> glmfit.D <- glmFit(D, design, dispersion = D$tagwise.dispersion) >>> >>> The results I obtain from the fitting are the following catgories: >>>> >>>> names(glmfit.D) >>> >>> [1] "coefficients" ?"fitted.values" "fail" ? ? ? ? ?"not.converged" >>> [5] "deviance" ? ? ?"df.residual" ? "abundance" ? ? "design" >>> [9] "offset" ? ? ? ?"dispersion" ? ?"method" ? ? ? ?"counts" >>> [13] "samples" >>> >>> What would be the best way to obtain the residuals for the "genewise" >>> GLMs? >>> >>> >>> >>> 2) model fitting & hypothesis testing: >>> >>> I have a fully crossed design with 2 factors and 2 factor levels each: >>> individual <- as.factor(c("indA","indA","indB","indB")) >>> treatment <- as.factor(c("treat1","treat2","treat1","treat2")) >>> >>> in general I would be interested in 3 different aspects: >>> a) effect of individual >>> b) effect of treatment >>> c) interaction between individual and treatment >>> >>> What would be the best way to test for those effects, would I rather >>> test for all three aspects individually, i.e.: >>> a) design <- model.matrix(~individual) >>> b) design <- model.matrix(~treatment) >>> c) design <- model.matrix(~individual*treatment) >>> >>> or doesn't it also make sense to model >>> design <- model.matrix(~individual+treatment) >>> and test for >>> a) lrt.cd_ind <- glmLRT(D, glmfit.D, coef=2) >>> b) lrt.cd_treat <- glmLRT(D, glmfit.D, coef=3) >>> ... this way the effect of both factors could be accounted for in the >>> model?! >>> >>> >>> Thanks a lot! >>> Susanne ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
ADD REPLY

Login before adding your answer.

Traffic: 892 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6