edgeR GLM fitted values
Entering edit mode
Mattia ▴ 10
Last seen 4.2 years ago


I'm wondering if it could be possible to use "fitted values" from edgeR GLM model as normalized pseudo counts to be used as expression data matrix for further classification purpose. I implemented GLM because I needed to correct data for several covariates.



edgeR GLM classification rnaseq • 2.0k views
Entering edit mode
Aaron Lun ★ 28k
Last seen 1 hour ago
The city by the bay

The fitted values account for library sizes, but they are not "normalized" with respect to library size. Samples with larger library sizes will have larger fitted values, which is the opposite of what one would normally expect from normalization. Similarly, the effect of any nuisance covariates in the GLM will still be included in the fitted values.

In addition, the fitted values won't contain observation-specific errors. This means that they're proportional to the mean of counts rather than the counts themselves. This may not be desirable for downstream applications - groups with more samples will have more precise estimates of the mean, whereas groups with fewer samples will have more variable fitted values. Treating the fitted values as "counts" with similar mean-variance relationships would be inappropriate.

If you want corrected and normalized expression values for each sample, I would suggest using the cpm method. You can then use removeBatchEffect to get rid of any batch effects or problematic covariates. If you want corrected and normalized (log-)expression values for each group, I would use the GLM coefficients, though their interpretation would depend on how you've parametrized your model.


Login before adding your answer.

Traffic: 352 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6