Question: Normalized counts after fitting linear model with batch effects - edgeR
13 months ago, Yahan wrote:

Hi,

I'm dealing with a set of sequencing data with batch effects. The samples were sequenced at two different times (4 control + 4 treatment1 in the first run, and 1 control + 4 treatment2 in the second). The batch effects were very obvious when I looked at PCA plots of the raw data. I used RUVSeq and edgeR and fitted the linear model with batch effects included in the design. The results are OK, but I cannot find a way to look at the counts with the batch effects removed. Even the counts in fit$fitted.values still contain batch effects.

So I'm wondering whether it is possible to get counts without batch effects after fitting the linear model. I need these counts to make a heatmap. I found an old post saying that getting a matrix of batch-corrected counts is not possible. If such counts do not exist, how are logFC and logCPM calculated (they appear to be free of batch effects in my results)?

Or can this be done with other packages such as DESeq?

Or should I use the removeBatchEffect function just for making the heatmap?

Thank you!

Yahan

modified 13 months ago by Ryan C. Thompson • written 13 months ago by Yahan

This is not an answer to your question, but your design is almost completely confounded, since the second batch only has a single control sample. This means that the entire batch correction hinges on that single sample, and any noise in that sample will be interpreted as a batch effect to be subtracted out of all other samples.

True. I only noticed this huge batch effect problem after I got the data. I'm also concerned about the design.
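
The imbalance discussed in these comments can be checked by cross-tabulating batch against group. This is a minimal sketch in base R, assuming the sample layout given in the question. The factor names `batch` and `group` and the level labels are made up for illustration.

```r
# Sample layout as described in the question (labels are hypothetical)
batch <- factor(c(rep("b1", 8), rep("b2", 5)))
group <- factor(c(rep("control", 4), rep("trt1", 4),
                  "control", rep("trt2", 4)),
                levels = c("control", "trt1", "trt2"))

# Cross-tabulation: each treatment occurs in only one batch, and batch b2
# has a single control sample linking it to batch b1
tab <- table(batch, group)
print(tab)

# The additive design is still full rank, so the batch term is estimable,
# but only through that one control sample
design <- model.matrix(~ batch + group)
design_rank <- qr(design)$rank
print(c(rank = design_rank, columns = ncol(design)))
```

A full-rank design only means the model can be fitted. It says nothing about how precisely the batch term is estimated.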

Answer: Normalized counts after fitting linear model with batch effects - edgeR
13 months ago, Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote:

Getting a matrix of batch-corrected counts is not really possible, because the resulting matrix would not represent actual counts. However, getting a matrix of batch-corrected logCPM values is certainly possible, using removeBatchEffect. Note that this kind of batch subtraction will likely change the row means.
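
As a rough illustration of the approach described above, the following sketch computes logCPM values with edgeR and passes them to limma's removeBatchEffect. The count matrix is simulated and the object names (`counts`, `batch`, `group`, `logcpm_corrected`) are assumptions for the example, not taken from the original poster's analysis. If the batch variable were instead a set of estimated factors (for example from RUVSeq), the `covariates` argument of removeBatchEffect would presumably be the place for them.

```r
library(edgeR)   # also attaches limma, which provides removeBatchEffect()

# Simulated counts standing in for real data (hypothetical):
# 200 genes, 13 samples laid out as in the question
set.seed(1)
batch <- factor(c(rep("b1", 8), rep("b2", 5)))
group <- factor(c(rep("control", 4), rep("trt1", 4),
                  "control", rep("trt2", 4)),
                levels = c("control", "trt1", "trt2"))
counts <- matrix(rnbinom(200 * 13, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:13)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# log2 counts per million; prior.count damps the variability of low counts
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# `design` holds the effects to preserve (the treatment groups) while the
# fitted batch term is subtracted from the logCPM values
design <- model.matrix(~ group)
logcpm_corrected <- removeBatchEffect(logcpm, batch = batch, design = design)
```

The corrected matrix is meant for visualization such as heatmaps or PCA plots. For differential expression testing, the batch term should stay in the design matrix and the model should be fitted to the original counts.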

The logFC value in the results is the value of the coefficient or contrast that was tested, while the logCPM value in the results is simply the average logCPM of all samples, which is not affected by batch effects.
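
To show where these two columns appear, here is a hedged sketch of a standard edgeR quasi-likelihood workflow on simulated counts. The data, the object names, and the choice of glmQLFit and glmQLFTest are assumptions for illustration. The original poster's actual pipeline, which also involved RUVSeq, is not shown in the thread.

```r
library(edgeR)

# Simulated stand-in data (hypothetical), same layout as in the question
set.seed(1)
batch <- factor(c(rep("b1", 8), rep("b2", 5)))
group <- factor(c(rep("control", 4), rep("trt1", 4),
                  "control", rep("trt2", 4)),
                levels = c("control", "trt1", "trt2"))
counts <- matrix(rnbinom(200 * 13, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:13)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Batch enters the model as a term in the design matrix; the counts
# themselves are never modified
design <- model.matrix(~ batch + group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test one coefficient: treatment1 versus control, adjusted for batch
res <- glmQLFTest(fit, coef = "grouptrt1")
top <- topTags(res, n = 5)$table
print(top[, c("logFC", "logCPM")])

# Per-gene average abundance over all samples, for comparison with logCPM
print(head(aveLogCPM(y)))
```

Here logFC reports the tested coefficient on the log2 scale, so it is adjusted for batch because batch is in the model, not because the counts were corrected.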

When building a heatmap using batch-corrected values, always be wary of confirmation bias. You can almost always get a good-looking heatmap if you subtract out all the variation that doesn't fit your model and plot the remaining variation, regardless of whether there is genuine differential expression.