Question

Normalized counts after fitting linear model with batch effects - edgeR

Yahan • 0

@yahan-14837

United States

Hi,

I'm dealing with a set of sequencing data with batch effects. The samples were sequenced at two different times (4 control + 4 treatment1 the first time, and 1 control + 4 treatment2 the second time). The batch effects were very obvious when I looked at PCA plots of the raw data. I used RUVSeq and edgeR and fitted the linear model with batch effects included in the design. The results are OK, but I cannot find a way to look at the counts without batch effects. Even the counts in fit$fitted.values still contain the batch effects.

So I'm wondering whether it is possible to get counts without batch effects after fitting the linear model. I need these counts to make a heatmap. I found an old post saying that getting a matrix of batch-corrected counts is not possible. If such counts do not exist, how are logFC and logCPM calculated (they appear to be free of batch effects in my results)?

Or can this be done with other packages such as DESeq?

Or should I use the removeBatchEffect function just for making the heatmap?

Thank you!

Yahan

edger batch effect normalized counts • 1.5k views

updated 6.2 years ago by Ryan C. Thompson ★ 7.9k • written 6.2 years ago by Yahan • 0

This is not an answer to your question, but your design is almost completely confounded, since the second batch only has a single control sample. This means that the entire batch correction hinges on that single sample, and any noise in that sample will be interpreted as a batch effect to be subtracted out of all other samples.
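
A minimal base R sketch of this point, assuming the sample layout described in the question. The group and batch labels (control, trt1, trt2, b1, b2) and the simulated values are made up for illustration. It fits a group + batch model to one simulated gene and compares the estimated batch coefficient with a value computed by hand from the controls alone:

```r
# Layout from the question: 4 control + 4 trt1 in batch 1, 1 control + 4 trt2 in batch 2.
# All labels here are hypothetical placeholders.
group <- factor(c(rep("control", 4), rep("trt1", 4), "control", rep("trt2", 4)),
                levels = c("control", "trt1", "trt2"))
batch <- factor(c(rep("b1", 8), rep("b2", 5)))

# The design is full rank, so the batch term can technically be estimated
design <- model.matrix(~ group + batch)
qr(design)$rank == ncol(design)

# Simulated log-expression for one gene: pure noise, no real batch effect
set.seed(1)
y <- rnorm(length(group))

fit <- lm(y ~ group + batch)
batch_est <- unname(coef(fit)["batchb2"])

# The same number computed by hand: the lone batch-2 control (sample 9)
# minus the mean of the four batch-1 controls (samples 1 to 4)
by_hand <- y[9] - mean(y[1:4])
all.equal(batch_est, by_hand)
```

Because only the controls appear in both batches, sample 9 alone determines the batch estimate for each gene, and whatever noise it carries is subtracted from every batch-2 sample.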

6.2 years ago Ryan C. Thompson ★ 7.9k

True. I noticed this huge batch-effect problem only after I got the data. I'm also concerned about the design.

6.2 years ago Yahan • 0

score 1 · Answer 1 · 2018-01-22

Getting a matrix of batch-corrected counts is not really possible, because the resulting matrix would not represent actual counts. However, getting a matrix of batch-corrected logCPM values is certainly possible, using removeBatchEffect. Note that this kind of batch subtraction will likely change the row means.
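
A minimal sketch of how this might look, assuming the sample layout from the question. The counts are simulated stand-ins and the group and batch labels are hypothetical; with real data, counts, group, and batch would come from the experiment. The group differences are passed as the design so that only the batch term is subtracted:

```r
library(edgeR)   # also loads limma, which provides removeBatchEffect()

# Simulated stand-in data: 200 genes x 13 samples (all labels hypothetical)
set.seed(1)
group <- factor(c(rep("control", 4), rep("trt1", 4), "control", rep("trt2", 4)))
batch <- factor(c(rep("b1", 8), rep("b2", 5)))
counts <- matrix(rnbinom(200 * 13, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:13)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Move to the log scale; prior.count damps the variability of low counts
logCPM <- cpm(y, log = TRUE, prior.count = 2)

# 'design' holds the effects to preserve (group); 'batch' is what gets removed
design <- model.matrix(~ group)
logCPM_corrected <- removeBatchEffect(logCPM, batch = batch, design = design)

dim(logCPM_corrected)
```

The corrected matrix is intended for visualization such as heatmaps or PCA plots. The differential expression test itself should still use the original counts with batch in the design. If the unwanted variation was estimated as numeric factors (for example by RUVSeq), removeBatchEffect also has a covariates argument that could take them in place of batch.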

The logFC value in the results is the value of the coefficient or contrast that was tested, while the logCPM value in the results is simply the average logCPM of all samples, which is not affected by batch effects.
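
A minimal sketch of where these two columns come from, again using simulated counts and hypothetical labels in place of the real data. The quasi-likelihood pipeline shown here is an assumption; the question does not say which edgeR test was used. The comparison line checks the logFC column against the fitted coefficient converted from natural log to log2:

```r
library(edgeR)

# Simulated stand-in data: 200 genes x 13 samples (all labels hypothetical)
set.seed(1)
group <- factor(c(rep("control", 4), rep("trt1", 4), "control", rep("trt2", 4)))
batch <- factor(c(rep("b1", 8), rep("b2", 5)))
counts <- matrix(rnbinom(200 * 13, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:13)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Batch enters the model as its own column, so the group coefficients
# are estimated after adjusting for it
design <- model.matrix(~ group + batch)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = "grouptrt1")

# logFC: the trt1-vs-control coefficient on the log2 scale
coef_log2 <- fit$coefficients[, "grouptrt1"] / log(2)
all.equal(unname(coef_log2), res$table$logFC)

# logCPM: one overall abundance value per gene, summarized across all samples
head(res$table[, c("logFC", "logCPM")])
```

No batch-corrected count matrix is created at any step: the batch term is absorbed by its own coefficient during fitting, which is why the reported logFC is already adjusted for batch.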

When building a heatmap using batch-corrected values, always be wary of confirmation bias. You can almost always get a good-looking heatmap if you subtract out all the variation that doesn't fit your model and plot the remaining variation, regardless of whether there is genuine differential expression.