I would like to compute the variance explained (i.e. the coefficient of determination, R2) by a model in edgeR. Concretely, I am modelling various gene expression phenotypes using glmFit and determining the significance of a few predictors using glmLRT.

Could you please indicate how to compute R2 from the output of these models?

Many thanks in advance!

Many thanks, that seems to work!

However, the estimates tend to be surprisingly high, with R2 close to 1 for many genes (quartiles: 0%: 1.879e-13; 25%: 0.0152; 50%: 0.1009; 75%: 0.5773; 100%: 0.9997). Is this distribution as expected?

I would also like to ask a follow-up question regarding this other post: Variance explained (coefficient of determination) in glmFit / glmLRT

I am actually fitting a model with three predictors and trying to compute the proportion of deviance explained by each of the predictors. I understand that this relies on making a choice on the order in which the predictors are included in the model (which others we are correcting for). A conservative and consistent way of computing `R2_pred1` might be by subtracting R2 computed considering only `pred2` and `pred3` from that of the full model (considering all three predictors). Something like this:

Would this be correct?

Many thanks again!

The R2 values look completely normal. The median R2 is 10%, which seems somewhat low rather than high. With so many genes, you will naturally get some R2 over the whole range from 0 to 1, just by chance variation, which is what you see.

Even if none of the genes are differentially expressed and the data was just random, you would still expect to get R2 values around 3 / (nsamples - 1) on average.
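The 3 / (nsamples - 1) figure can be checked with a quick simulation. The sketch below is only an illustration under stated assumptions: it uses ordinary least squares on Gaussian noise as a stand-in for the negative binomial GLM deviance that edgeR works with, and the function names, sample size (10), gene count (2000) and seed are all hypothetical choices, not anything from the original analysis. For a Gaussian linear model with an intercept and 3 unrelated predictors, the mean R2 under pure noise is 3 / (nsamples - 1); for edgeR fits the same figure should hold only approximately.

```python
import random


def r_squared(y, predictors):
    """R2 of a least-squares fit of y on an intercept plus the given predictors."""
    n = len(y)

    def centre(v):
        m = sum(v) / n
        return [a - m for a in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    yc = centre(y)
    total = dot(yc, yc)  # residual sum of squares of the intercept-only model

    basis = []
    explained = 0.0
    for x in predictors:
        q = centre(x)
        # Modified Gram-Schmidt: remove what earlier predictors already span.
        for b in basis:
            proj = dot(q, b)
            q = [a - proj * c for a, c in zip(q, b)]
        norm = dot(q, q) ** 0.5
        if norm < 1e-12:
            continue  # predictor is redundant given the earlier ones
        q = [a / norm for a in q]
        basis.append(q)
        explained += dot(yc, q) ** 2
    return explained / total


def simulate_null_r2(n_samples=10, n_predictors=3, n_genes=2000, seed=1):
    """R2 per simulated gene when the response is unrelated to the predictors."""
    rng = random.Random(seed)
    predictors = [[rng.gauss(0, 1) for _ in range(n_samples)]
                  for _ in range(n_predictors)]
    values = []
    for _ in range(n_genes):
        y = [rng.gauss(0, 1) for _ in range(n_samples)]  # pure noise "expression"
        values.append(r_squared(y, predictors))
    return values


r2_values = simulate_null_r2()
mean_r2 = sum(r2_values) / len(r2_values)
expected = 3 / (10 - 1)  # 3 predictors, 10 samples
print("mean R2:", round(mean_r2, 3), "expected:", round(expected, 3))
print("largest R2 among null genes:", round(max(r2_values), 3))
```

With few samples, individual null genes can reach large R2 values even though nothing is truly associated, which is the same point made above about seeing values across the whole range.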

Regarding the predictor-specific R2, I don't know what you're trying to do. Your `R2_nox1` is the proportion of the deviance that x1 contributes over and above nox2 and nox3, but I don't know why you are computing `R2_x1` etc. There is no right or wrong here. You're just computing descriptive statistics.

Many thanks for your answer.

To clarify the last point, `R2_nox1` is the proportion of the deviance explained when considering `design.nox1 = model.matrix(~x2+x3)`, i.e. without `x1`. So to compute the deviance contributed by `x1` I am subtracting `R2_nox1` from the deviance explained when considering all the predictors (`R2_full`): `R2_x1 = R2_full - R2_nox1`.

The idea is to compute the deviance contributed only by `x1`, by removing the contributions of `x2` and `x3` from the total explained variance. Is this one possible way of computing the deviance contributed by one variable when accounting for that contributed by the rest? Otherwise, could you please propose an alternative?

Thanks again!

No, you have it the wrong way around.

If you want deviance explained by x1, then you need

You do not need to subtract one R2 from another.

Both strategies actually lead to the same results. Resolved, thanks!
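For anyone reading later, here is one way to see why subtracting R2 values can agree with working directly from deviances. This is a sketch that assumes R2 is defined per gene as one minus the ratio of the residual deviance to the null (intercept-only) deviance. The symbols D_full, D_nox1 and D_null are my own notation for the deviances of the full fit, the fit without `x1`, and the intercept-only fit; they are not names from edgeR.

```latex
R^2_{\mathrm{full}} = 1 - \frac{D_{\mathrm{full}}}{D_{\mathrm{null}}},
\qquad
R^2_{\mathrm{nox1}} = 1 - \frac{D_{\mathrm{nox1}}}{D_{\mathrm{null}}}

R^2_{x1}
  = R^2_{\mathrm{full}} - R^2_{\mathrm{nox1}}
  = \left(1 - \frac{D_{\mathrm{full}}}{D_{\mathrm{null}}}\right)
    - \left(1 - \frac{D_{\mathrm{nox1}}}{D_{\mathrm{null}}}\right)
  = \frac{D_{\mathrm{nox1}} - D_{\mathrm{full}}}{D_{\mathrm{null}}}
```

So under this definition the difference of the two R2 values is the drop in deviance from adding `x1` to a model that already contains `x2` and `x3`, expressed as a fraction of the null deviance. The identity relies on all three fits using the same data and the same dispersion estimates.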