Question: Normalized counts after fitting linear model with batch effects - edgeR
9 months ago, Yahan wrote:


I'm dealing with a set of sequencing data with batch effects. The samples were sequenced at two different times (4 control + 4 treatment1 in the first run, and 1 control + 4 treatment2 in the second run). The batch effects were very obvious when I looked at PCA plots of the raw data. I used RUVSeq and edgeR and fitted a linear model with the batch effect included in the design. The results are OK, but I cannot find a way to look at the counts with the batch effects removed. Even the values in fit$fitted.values still contain the batch effects.

So I'm wondering: is it possible to get counts without batch effects after fitting the linear model? I need them to make a heatmap. An old post says that getting a matrix of batch-corrected counts is not possible. If such counts do not exist, how are logFC and logCPM calculated (they appear to be free of batch effects in my results)?

Or can this be done with other packages such as DESeq?

Or should I use the removeBatchEffect function just for making the heatmap?

Thank you!



This is not an answer to your question, but your design is almost completely confounded, since the second batch only has a single control sample. This means that the entire batch correction hinges on that single sample, and any noise in that sample will be interpreted as a batch effect to be subtracted out of all other samples.
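To make the confounding concrete, here is a small sketch in base R that cross-tabulates the sample layout described in the question. The factor names `group` and `batch` and the labels `b1`/`b2` are placeholders chosen for illustration, not names from the original analysis.

```r
# Hypothetical sample layout reconstructed from the question:
# batch 1 = 4 control + 4 treatment1, batch 2 = 1 control + 4 treatment2
group <- factor(c(rep("control", 4), rep("treatment1", 4),
                  "control", rep("treatment2", 4)),
                levels = c("control", "treatment1", "treatment2"))
batch <- factor(c(rep("b1", 8), rep("b2", 5)))

# Cross-tabulate group against batch to see which cells are populated
xtab <- table(group, batch)
print(xtab)

# Groups that appear in both batches are the only link between the batches
shared <- rownames(xtab)[rowSums(xtab > 0) == 2]
print(shared)
```

Only the control group is present in both batches, and in the second batch it is represented by one sample, so the batch estimate rests on that sample alone.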

Reply written 9 months ago by Ryan C. Thompson

True. I noticed this huge batch effect only after I got the data. I'm also concerned about the design.

Reply written 9 months ago by Yahan
9 months ago, Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote:

Getting a matrix of batch-corrected counts is not really possible, because the resulting matrix would not represent actual counts. However, getting a matrix of batch-corrected logCPM values is certainly possible, using removeBatchEffect. Note that this kind of batch subtraction will likely change the row means.
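A minimal sketch of that approach, assuming 13 samples laid out as in the question. The simulated `counts` matrix and the names `group`, `batch`, `logcpm` and `logcpm_corrected` are placeholders for illustration; substitute your own count matrix and sample annotation.

```r
library(edgeR)  # DGEList(), calcNormFactors(), cpm()
library(limma)  # removeBatchEffect()

set.seed(1)

# Hypothetical sample layout taken from the question
group <- factor(c(rep("control", 4), rep("treatment1", 4),
                  "control", rep("treatment2", 4)),
                levels = c("control", "treatment1", "treatment2"))
batch <- factor(c(rep("b1", 8), rep("b2", 5)))

# Simulated counts standing in for the real data (200 genes x 13 samples)
counts <- matrix(rnbinom(200 * 13, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:13)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Move to the log scale: the result is a matrix of logCPM values, not counts
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# Pass the group design so that group differences are kept
# while the batch term is subtracted
design <- model.matrix(~ group)
logcpm_corrected <- removeBatchEffect(logcpm, batch = batch, design = design)
```

The corrected matrix is intended for visualisation such as heatmaps or PCA plots. The differential expression test itself should still be run on the original counts with the batch term in the design matrix.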

The logFC value in the results is the value of the coefficient or contrast that was tested, while the logCPM value in the results is simply the average logCPM of all samples, which is not affected by batch effects.
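As a rough illustration of where those two columns come from, the sketch below fits a model with batch in the design and tests one group coefficient. The data are simulated and the object names (`counts`, `group`, `batch`, `tab`) are placeholders; the quasi-likelihood pipeline is one common edgeR workflow and is assumed here, since the thread does not say which test was used.

```r
library(edgeR)

set.seed(1)

# Hypothetical sample layout taken from the question
group <- factor(c(rep("control", 4), rep("treatment1", 4),
                  "control", rep("treatment2", 4)),
                levels = c("control", "treatment1", "treatment2"))
batch <- factor(c(rep("b1", 8), rep("b2", 5)))

# Simulated counts standing in for the real data (200 genes x 13 samples)
counts <- matrix(rnbinom(200 * 13, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:13)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Batch enters the model as its own coefficient alongside the group terms
design <- model.matrix(~ batch + group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test the treatment1-vs-control coefficient; batch is adjusted for by the model
res <- glmQLFTest(fit, coef = "grouptreatment1")
tab <- topTags(res, n = Inf, sort.by = "none")$table

# logFC: the tested coefficient on the log2 scale
# logCPM: the average log2 counts per million of each gene over all samples
head(tab[, c("logFC", "logCPM")])
```

Because the batch term is estimated jointly with the group terms, the reported logFC is already adjusted for batch without any corrected count matrix being produced.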

When building a heatmap using batch-corrected values, always be wary of confirmation bias. You can almost always get a good-looking heatmap if you subtract out all the variation that doesn't fit your model and plot the remaining variation, regardless of whether there is genuine differential expression.


Thank you Ryan! I've seen the large changes in library size and CPM values that removeBatchEffect produces. I made a mistake, and you are right about the logCPM.

Reply written 9 months ago by Yahan