Question: edgeR GLM fitted values
0
3.3 years ago by
Mattia10
Milano
Mattia10 wrote:

Hi,

I'm wondering if it could be possible to use "fitted values" from edgeR GLM model as normalized pseudo counts to be used as expression data matrix for further classification purpose. I implemented GLM because I needed to correct data for several covariates.

Thanks,

Mattia.

rnaseq classification edger glm • 1.1k views
modified 3.3 years ago by Aaron Lun23k • written 3.3 years ago by Mattia10
1
3.3 years ago by
Aaron Lun23k
Cambridge, United Kingdom
Aaron Lun23k wrote:

The fitted values account for library sizes, but they are not "normalized" with respect to library size. Samples with larger library sizes will have larger fitted values, which is the opposite of what one would normally expect from normalization. Similarly, the effect of any nuisance covariates in the GLM will still be included in the fitted values.

In addition, the fitted values won't contain observation-specific errors. This means that they're proportional to the mean of counts rather than the counts themselves. This may not be desirable for downstream applications - groups with more samples will have more precise estimates of the mean, whereas groups with fewer samples will have more variable fitted values. Treating the fitted values as "counts" with similar mean-variance relationships would be inappropriate.

If you want corrected and normalized expression values for each sample, I would suggest using the cpm method. You can then use removeBatchEffect to get rid of any batch effects or problematic covariates. If you want corrected and normalized (log-)expression values for each group, I would use the GLM coefficients, though their interpretation would depend on how you've parametrized your model.