Question

edgeR: fitted and partial expression values

alakatos

Hello Everyone,

I have a very noisy RNA-seq dataset with several covariates, which I am analyzing in edgeR:

design <- model.matrix(~0 + conditions + cov1 + cov2, data=pheno)
d <- calcNormFactors(d, method="TMM")
d <- estimateDisp(d, design, robust=TRUE)
fit <- glmFit(d, design, dispersion=d$trended.dispersion, robust=TRUE)
my.contrasts <- makeContrasts(....)

Issue 1.

I was wondering whether it is correct to use the fitted values for visualization, to get rid of the noise:

norm <- cpm(fit$fitted.values, normalized.lib.sizes = TRUE, log = TRUE, prior.count = 1)

Issue 2.

Is there any way to obtain partial expression values related only to the conditions (i.e., the proportion of variance explained by condition) for downstream network analysis?

Thank you for your help.

Anita


Updated 6.5 years ago by Gordon Smyth • written 6.5 years ago by alakatos

score 0 · Answer 1 · 2018-01-24

Regarding visualization: it seems you want to use the fitted values instead of the original counts for each sample. At best, this is unnecessary, as you could just visualize the GLM coefficients directly to examine the effects of interest (which is what you would effectively be doing anyway when comparing fitted values from different conditions in a plot). At worst, replacing the counts with fitted values would be actively misleading, because people looking at your plot would initially assume that the fitted values were your original counts. You would not be showing the sampling variance inherent to the counts, which is necessary for a faithful visual representation of the data.
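A minimal sketch of what visualizing the coefficients (rather than substituting fitted values for counts) might look like, assuming a recent edgeR/limma installation. The counts, the `pheno` table, the condition labels `A`/`B` and the numeric covariate `cov1` are all simulated and hypothetical. The condition contrast is tested with `glmLRT()` and the covariate-adjusted log2-fold changes are drawn gene by gene with `plotMD()`.

```r
library(edgeR)  # also attaches limma, which provides makeContrasts()

set.seed(1)
# Hypothetical simulated data: 1000 genes x 8 samples, two conditions, one numeric covariate
counts <- matrix(rnbinom(8000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:8)))
pheno <- data.frame(conditions = factor(rep(c("A", "B"), each = 4)),
                    cov1 = rnorm(8))

design <- model.matrix(~0 + conditions + cov1, data = pheno)
d <- DGEList(counts)
d <- calcNormFactors(d, method = "TMM")
d <- estimateDisp(d, design, robust = TRUE)
fit <- glmFit(d, design)

# Condition B vs A, adjusted for cov1
con <- makeContrasts(conditionsB - conditionsA, levels = design)
lrt <- glmLRT(fit, contrast = con)

# Mean-difference plot: one point per gene, log2-fold change vs average log-CPM
plotMD(lrt)
abline(h = 0, col = "grey")
```

The plot summarizes the estimated effect for every gene at once, so nothing about the individual samples is misrepresented.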

Regarding "partial expression values": I don't know what these are. Are you saying you want to regress out particular covariates and use the corrected observations for downstream analyses? If so, I would suggest applying removeBatchEffect() to the log-CPMs. It is also possible to do this on the original counts via quantile-quantile mapping, but it's a lot of effort; see A: Is Limma's removeBatchEffect() and log2() commutative?.
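A sketch of the removeBatchEffect() suggestion, assuming `cov1` and `cov2` are numeric covariates (if they were batch factors, they would go in `batch=`/`batch2=` instead). The data are simulated and hypothetical. The condition term is placed in `design=` so that it is protected while the covariates are regressed out. As the limma documentation notes, the corrected matrix is meant for plotting or exploratory downstream use, not as input to a further linear model; for testing, the covariates belong in the edgeR design.

```r
library(edgeR)  # also attaches limma, which provides removeBatchEffect() and lmFit()

set.seed(1)
# Hypothetical simulated data: 1000 genes x 8 samples, two conditions, two numeric covariates
counts <- matrix(rnbinom(8000, mu = 50, size = 5), nrow = 1000)
pheno <- data.frame(conditions = factor(rep(c("A", "B"), each = 4)),
                    cov1 = rnorm(8), cov2 = rnorm(8))

d <- DGEList(counts)
d <- calcNormFactors(d, method = "TMM")
logcpm <- cpm(d, log = TRUE, prior.count = 2)

# Design containing only the effect to keep; the covariates are regressed out
keep.design <- model.matrix(~conditions, data = pheno)
covs <- cbind(cov1 = pheno$cov1, cov2 = pheno$cov2)
corrected <- removeBatchEffect(logcpm, covariates = covs, design = keep.design)
```

If the correction worked, refitting the full model (condition plus covariates) to `corrected` should give covariate coefficients that are numerically zero.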

score 0 · Answer 2 · 2018-01-25

I am a great believer in doing simple, direct analyses when they do the job. I am not convinced here that you really need anything other than the log-fold changes between the conditions, i.e., what is stored in the estimated coefficients. The logFCs estimate the differences between the conditions, adjusted for the covariates, and that is surely what you need for any downstream network analysis.
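To make "what is stored in the estimated coefficients" concrete: edgeR's GLM coefficients are on the natural-log scale, so with a `~0 + conditions + ...` design the log2-fold change between two conditions is the difference of their coefficients divided by log(2). The sketch below uses simulated, hypothetical data (labels `A`/`B`, covariate `cov1`) and builds a per-gene logFC vector of the kind that could feed a downstream analysis; `glmLRT()` with the equivalent contrast should report the same numbers in `lrt$table$logFC`.

```r
library(edgeR)  # also attaches limma, which provides makeContrasts()

set.seed(1)
# Hypothetical simulated data: 1000 genes x 8 samples, two conditions, one numeric covariate
counts <- matrix(rnbinom(8000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:8)))
pheno <- data.frame(conditions = factor(rep(c("A", "B"), each = 4)),
                    cov1 = rnorm(8))
design <- model.matrix(~0 + conditions + cov1, data = pheno)

d <- DGEList(counts)
d <- calcNormFactors(d, method = "TMM")
d <- estimateDisp(d, design, robust = TRUE)
fit <- glmFit(d, design)

# Natural-log expression level per condition, adjusted for cov1
beta <- fit$coefficients
logFC.BvsA <- (beta[, "conditionsB"] - beta[, "conditionsA"]) / log(2)
head(sort(logFC.BvsA, decreasing = TRUE))

# The same quantity obtained through a formal contrast test
lrt <- glmLRT(fit, contrast = makeContrasts(conditionsB - conditionsA, levels = design))
```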

Similarly, why would you not be visualizing the log-fold changes?

You definitely shouldn't be running cpm() on the fitted values, and there's no need to do that anyway.

Finally, there are some curious aspects to your code. You have set robust=TRUE for glmFit(), but there is no such argument for that function. Have a look at ?glmFit. You've also restricted glmFit to the trended dispersion, even though we advise against that. Why did you do that?
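One quick way to check which arguments a function accepts, besides reading ?glmFit, is to inspect the formals of its methods. The sketch below (assuming a current edgeR installation, with simulated and hypothetical data) checks that the default `glmFit()` method has no `robust` argument, whereas the default `glmQLFit()` method does (there it controls robust empirical Bayes estimation of the quasi-likelihood dispersions). It then calls `glmFit()` without `dispersion=`; for a `DGEList`, edgeR then uses the gene-wise (tagwise) dispersions stored by `estimateDisp()` when they are present.

```r
library(edgeR)

# Formal argument names of the default methods
glmfit.args <- names(formals(getS3method("glmFit", "default")))
qlfit.args  <- names(formals(getS3method("glmQLFit", "default")))
"robust" %in% glmfit.args  # FALSE
"robust" %in% qlfit.args   # TRUE

set.seed(1)
# Hypothetical simulated data: 1000 genes x 8 samples
counts <- matrix(rnbinom(8000, mu = 50, size = 5), nrow = 1000)
pheno <- data.frame(conditions = factor(rep(c("A", "B"), each = 4)), cov1 = rnorm(8))
design <- model.matrix(~0 + conditions + cov1, data = pheno)
d <- estimateDisp(calcNormFactors(DGEList(counts)), design, robust = TRUE)

# No dispersion= argument: glmFit() picks up d$tagwise.dispersion
fit <- glmFit(d, design)
```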