Variance explained (coefficient of determination) in glmFit / glmLRT
1
0
Entering edit mode
Gemma • 0
@2509afca
Last seen 4 weeks ago
Austria

I am trying to run a model where I am predicting gene expression as a function of some fixed factors, using glmFit to fit the model and glmLRT to get the significance of each fixed effect. I would like to get the amount of variation explained by each fixed effect (i.e. coefficient of determination or similar). Is there a way to do that with an object of class "DGEGLM"?

More concretely, I would like to get the variance in expression for each gene (20 of them) explained by AG, PO and MG (the three fixed effects I am including in the model).

And here the code:



genenames=c_AxB_A_r1$gene y <- DGEList(counts=cbind(c_AxB_A_r1$counts, c_AxB_A_r2$counts, c_AxB_B_r1$counts, c_AxB_B_r2$counts, c_BxA_A_r1$counts, c_BxA_A_r2$counts, c_BxA_B_r1$counts, c_BxA_B_r2$counts), genes=genenames) isexpr <- rowSums(cpm(y)>1) >= 4 y <- y[isexpr, , keep.lib.sizes=FALSE] y <- calcNormFactors(y) AG <- factor(c(0, 0, 1, 1, 0, 0, 1, 1)) PO <- factor(c(0, 0, 1, 1, 1, 1, 0, 0)) MG <- factor(c(0, 0, 0, 0, 1, 1, 1, 1)) design <- model.matrix(~AG+PO+MG) rownames(design) <- colnames(y) y <- estimateDisp(y, design, robust=TRUE) y$common.dispersion

fit <- glmFit(y, design)

model_AG <- glmLRT(fit, coef=2)
results_AG <- model_AG$table FDR <- p.adjust(results_AG$PValue, method="fdr")
results_AG <- cbind(results_AG, FDR, y$genes$genes)
colnames(results_AG)[6] <- "gene"

model_PO <- glmLRT(fit, coef=3)
results_PO <- model_PO$table FDR <- p.adjust(results_PO$PValue, method="fdr")
results_PO <- cbind(results_PO, FDR, y$genes$genes)
colnames(results_PO)[6] <- "gene"

model_MG <- glmLRT(fit, coef=4)
results_MG <- model_MG$table FDR <- p.adjust(results_MG$PValue, method="fdr")
results_MG <- cbind(results_MG, FDR, y$genes$genes)
colnames(results_MG)[6] <- "gene"

edgeR • 201 views
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

The concept of "coefficient of determination" is only defined for overall regressions, not for individual predictor variables. Assigning how much variability is explained by each individual covariate is not generally possible for any linear model or generalized linear model because the covariates overlap in terms of the variability that they predict. This is not a limitation of edgeR, but a universal property of linear models.

edgeR can return any quantity that would exist for an ordinary univariate glm. For example, one can easily perform a sequential analysis of deviance for each gene, but the proportion of the total deviance attributable to each predictor depends on the order that they are added to the model, i.e., which predictor is adjusted for which.

0
Entering edit mode

I imagine one could then compute a partial or semipartial measures of variance contributed by each fixed factor when considering the other predictors (or variance contributed irrespective of the other predictors - which I guess would be similar to just running the models with only one predictor at a time).

I for example found this package (https://cran.r-project.org/web/packages/partR2/vignettes/Using_partR2.html) that can do such analysis on ordinary glms.

Could you please suggest a way of doing this using edgeR?

Many thanks again!