Variance explained (coefficient of determination) in glmFit / glmLRT
Gemma • 0

I am trying to run a model where I am predicting gene expression as a function of some fixed factors, using glmFit to fit the model and glmLRT to get the significance of each fixed effect. I would like to get the amount of variation explained by each fixed effect (i.e. coefficient of determination or similar). Is there a way to do that with an object of class "DGEGLM"?

More concretely, I would like to get the variance in expression for each gene (20 of them) explained by AG, PO and MG (the three fixed effects I am including in the model).

Many thanks in advance!

Here is a link to the data:

And here the code:

library(edgeR)

c_AxB_A_r1 = read.csv("83Fx332M_fh_R1_83.csv")
c_AxB_B_r1 = read.csv("83Fx332M_fh_R1_332.csv")
c_AxB_A_r2 = read.csv("83Fx332M_fh_R2_83.csv")
c_AxB_B_r2 = read.csv("83Fx332M_fh_R2_332.csv")

c_BxA_A_r1 = read.csv("332Fx83M_fh_R1_83.csv")
c_BxA_B_r1 = read.csv("332Fx83M_fh_R1_332.csv")
c_BxA_A_r2 = read.csv("332Fx83M_fh_R2_83.csv")
c_BxA_B_r2 = read.csv("332Fx83M_fh_R2_332.csv")

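
# Note: genenames is not defined in this snippet; it is assumed to be a vector
# of gene IDs in the same row order as the count files.
y <- DGEList(counts=cbind(c_AxB_A_r1$counts, c_AxB_A_r2$counts, c_AxB_B_r1$counts, c_AxB_B_r2$counts, 
                          c_BxA_A_r1$counts, c_BxA_A_r2$counts, c_BxA_B_r1$counts, c_BxA_B_r2$counts), genes=genenames)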

isexpr <- rowSums(cpm(y)>1) >= 4
y <- y[isexpr, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)

AG <- factor(c(0, 0, 1, 1, 0, 0, 1, 1))
PO <- factor(c(0, 0, 1, 1, 1, 1, 0, 0))
MG <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))

design <- model.matrix(~AG+PO+MG)
rownames(design) <- colnames(y)

y <- estimateDisp(y, design, robust=TRUE)

fit <- glmFit(y, design)

model_AG <- glmLRT(fit, coef=2)
results_AG <- model_AG$table
FDR <- p.adjust(results_AG$PValue, method="fdr")
results_AG <- cbind(results_AG, FDR, y$genes$genes)
colnames(results_AG)[6] <- "gene"

model_PO <- glmLRT(fit, coef=3)
results_PO <- model_PO$table
FDR <- p.adjust(results_PO$PValue, method="fdr")
results_PO <- cbind(results_PO, FDR, y$genes$genes)
colnames(results_PO)[6] <- "gene"

model_MG <- glmLRT(fit, coef=4)
results_MG <- model_MG$table
FDR <- p.adjust(results_MG$PValue, method="fdr")
results_MG <- cbind(results_MG, FDR, y$genes$genes)
colnames(results_MG)[6] <- "gene"
WEHI, Melbourne, Australia

The concept of "coefficient of determination" is only defined for overall regressions, not for individual predictor variables. Assigning how much variability is explained by each individual covariate is not generally possible for any linear model or generalized linear model because the covariates overlap in terms of the variability that they predict. This is not a limitation of edgeR, but a universal property of linear models.
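The overlap between covariates can be illustrated with an ordinary linear model in base R. This is only a minimal sketch on simulated data: the variables `x1`, `x2` and `expr` and the helper `ss_fraction()` are made up for illustration and have nothing to do with the dataset in the question. The two predictors are deliberately correlated, so the share of the total sum of squares credited to each one changes with the order in which they enter the model, while the residual share stays the same.

```r
set.seed(1)
n    <- 200
x1   <- rnorm(n)
x2   <- 0.7 * x1 + rnorm(n, sd = 0.5)   # deliberately correlated with x1
expr <- x1 + x2 + rnorm(n)              # simulated response

# Hypothetical helper: sequential (Type I) sums of squares as fractions of the total
ss_fraction <- function(formula) {
  tab <- anova(lm(formula))
  setNames(tab[["Sum Sq"]] / sum(tab[["Sum Sq"]]), rownames(tab))
}

frac_12 <- ss_fraction(expr ~ x1 + x2)  # x1 entered first
frac_21 <- ss_fraction(expr ~ x2 + x1)  # x2 entered first

print(round(rbind(x1_first = frac_12[c("x1", "x2", "Residuals")],
                  x2_first = frac_21[c("x1", "x2", "Residuals")]), 3))
```

Whichever predictor enters first is credited with the variability the two share, so neither row gives "the" variance explained by `x1` or `x2`.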

edgeR can return any quantity that would exist for an ordinary univariate glm. For example, one can easily perform a sequential analysis of deviance for each gene, but the proportion of the total deviance attributable to each predictor depends on the order in which the predictors are added to the model, i.e., which predictor is adjusted for which.
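A sequential analysis of deviance of this kind could be sketched roughly as follows. This is an illustration under stated assumptions, not an established edgeR workflow: `seq_deviance()` is a hypothetical helper (not an edgeR function), the counts are simulated stand-ins for the real data, and the dispersions estimated under the full design are reused for every nested fit. The sketch relies on the `deviance` component of the object returned by `glmFit()`, and it reports fractions of the null (intercept-only) deviance, which are not the same thing as an R-squared on the expression scale.

```r
library(edgeR)

# Simulated stand-in for the real counts: 200 hypothetical genes, 8 samples
set.seed(42)
ngenes <- 200
mu <- exp(runif(ngenes, log(20), log(2000)))   # per-gene mean count
counts <- matrix(rnbinom(ngenes * 8, mu = rep(mu, 8), size = 10),
                 nrow = ngenes,
                 dimnames = list(paste0("gene", seq_len(ngenes)), paste0("s", 1:8)))

# Same factors and design as in the question
AG <- factor(c(0, 0, 1, 1, 0, 0, 1, 1))
PO <- factor(c(0, 0, 1, 1, 1, 1, 0, 0))
MG <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))
design <- model.matrix(~AG + PO + MG)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design, robust = TRUE)

# Hypothetical helper: per-gene drop in deviance as each design column is
# added in turn, expressed as a fraction of the intercept-only deviance.
seq_deviance <- function(y, design) {
  # Residual deviance per gene using design columns 1..k
  dev <- sapply(seq_len(ncol(design)), function(k) {
    glmFit(y, design[, seq_len(k), drop = FALSE])$deviance
  })
  drop_dev <- dev[, -ncol(dev), drop = FALSE] - dev[, -1, drop = FALSE]
  frac <- drop_dev / dev[, 1]
  dimnames(frac) <- list(rownames(y), colnames(design)[-1])
  frac
}

frac_a <- seq_deviance(y, design)                   # order: AG, PO, MG
frac_b <- seq_deviance(y, design[, c(1, 4, 3, 2)])  # order: MG, PO, AG

print(head(round(frac_a, 3)))
print(head(round(frac_b, 3)))
```

The row sums agree between the two orderings, because the null and full models are the same, but the split between terms need not. The drop for the term entered last is the same deviance difference that `glmLRT()` reports as `LR` for that coefficient.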


Many thanks for your answer, that makes sense.

I imagine one could then compute a partial or semipartial measure of the variance contributed by each fixed factor after accounting for the other predictors (or the variance contributed irrespective of the other predictors, which I guess would be similar to running the model with only one predictor at a time).

For example, I found this package that can do such an analysis on ordinary glms.

Could you please suggest a way of doing this using edgeR?

Many thanks again!

