Question

edgeR GLM fitted values

Mattia ▴ 10

Hi,

I'm wondering whether the "fitted values" from an edgeR GLM can be used as normalized pseudo-counts, i.e. as an expression matrix for downstream classification. I fitted a GLM because I needed to correct the data for several covariates.

Thanks,

Mattia.

edgeR GLM classification rnaseq • 2.2k views

updated 8.2 years ago by Aaron Lun ★ 28k • written 8.2 years ago by Mattia ▴ 10

score 1 · Answer 1 · 2016-02-22

The fitted values account for library sizes, but they are not "normalized" with respect to library size. They are on the scale of the raw counts, so samples with larger library sizes will have larger fitted values, which is the opposite of what one would normally expect from normalization. Similarly, the effect of any nuisance covariates in the GLM will still be included in the fitted values.
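To make the library-size point concrete, here is a minimal Python sketch of a log-linear model with a library-size offset. It is not edgeR code: the function name `fitted_value`, the abundance of 50 reads per million and the two-fold batch coefficient are all made up for illustration.

```python
import math

def fitted_value(lib_size, log_abundance, batch_coef, in_batch):
    """Expected count under a log-linear model with a library-size offset.

    Hypothetical helper: mu = lib_size * exp(log_abundance + batch_coef * in_batch).
    """
    return lib_size * math.exp(log_abundance + batch_coef * in_batch)

# Assumed values: one gene with the same relative abundance in every sample.
log_abundance = math.log(5e-5)  # 50 reads per million
batch_coef = math.log(2.0)      # nuisance covariate that doubles expression

small = fitted_value(1_000_000, log_abundance, batch_coef, in_batch=0)
large = fitted_value(4_000_000, log_abundance, batch_coef, in_batch=0)
batched = fitted_value(1_000_000, log_abundance, batch_coef, in_batch=1)

# The fitted value grows with library size even though abundance is unchanged,
# and the batch term is still baked into the fitted value.
print(small, large, batched)
```

Under these assumptions the 4-million-read sample gets a fitted value four times that of the 1-million-read sample, and the sample in the affected batch still carries the two-fold batch effect.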

In addition, the fitted values won't contain observation-specific errors. This means that they're proportional to the mean of counts rather than the counts themselves. This may not be desirable for downstream applications: groups with more samples will have more precise estimates of the mean, whereas groups with fewer samples will have more variable fitted values. Treating the fitted values as "counts" with similar mean-variance relationships would be inappropriate.

If you want corrected and normalized expression values for each sample, I would suggest computing log-CPM values with the cpm function (log=TRUE). You can then use removeBatchEffect from limma on those log-values to get rid of any batch effects or problematic covariates. If you want corrected and normalized (log-)expression values for each group, I would use the GLM coefficients, though their interpretation will depend on how you've parametrized your model.
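As a rough illustration of the per-sample route, the Python sketch below computes a simplified log2 counts-per-million and then centers each batch on the overall mean. Both helpers (`log_cpm`, `center_batches`) are hypothetical stand-ins: the real cpm function handles the prior count differently, and removeBatchEffect fits a linear model rather than simply centering batches. The data are invented.

```python
import math

def log_cpm(counts, lib_sizes, prior_count=2.0):
    """Simplified log2 counts-per-million for one gene across samples."""
    return [math.log2((c + prior_count) / n * 1e6)
            for c, n in zip(counts, lib_sizes)]

def center_batches(values, batches):
    """Shift each batch so that its mean equals the overall mean (one gene)."""
    overall = sum(values) / len(values)
    corrected = list(values)
    for b in set(batches):
        idx = [i for i, x in enumerate(batches) if x == b]
        batch_mean = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            corrected[i] = values[i] - batch_mean + overall
    return corrected

# Invented data: four samples, two library sizes, two batches.
counts = [50, 210, 100, 390]
lib_sizes = [1_000_000, 4_000_000, 1_000_000, 4_000_000]
batches = ["b1", "b1", "b2", "b2"]

expr = log_cpm(counts, lib_sizes)          # library-size effect removed
corrected = center_batches(expr, batches)  # batch shift removed

print(expr)
print(corrected)
```

Unlike the fitted values, the corrected values keep the sample-level variation within each batch, which is usually what a classifier needs. Note that plain centering like this would also remove any group difference that is confounded with batch.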