I have a quick question concerning the removal of the batch effect using removeBatchEffect()…
I have RNA-seq data from three batches. I normalize using spike-in RNA and estimate size factors (following the DESeq methodology). At the end I get a matrix, which I log2-transform. When I run PCA on this matrix, I see a pronounced batch effect.
I used the removeBatchEffect function on this log2 matrix, and it corrects the batch effect, which is great. However, when I construct a sample correlation (Pearson) heatmap, I get a totally different value scale compared to my log2 data. Is there a simple explanation for that? The correlation patterns look similar in both cases, but the scales are totally different! Can the function really change the values to such a degree?
I would appreciate your help!
Ellis
Could you please explain in a bit more detail what you mean when you say the following?
I normalize using spike in RNA and i estimate size factors (following the Deseq methodology)
Can you describe in a bit more detail (perhaps with code) how you are doing that, exactly?
Yes, you are right... I follow the DESeq2 protocol. This is the code:
I get the matrix above and then do a PCA. In the PCA I find a batch effect. I construct a correlation heatmap and see the clustering of my samples.
Continuing, I run the removeBatchEffect function on the Log.countsMmus matrix. The batch effect is corrected and my PCA looks great. BUT, when I construct a correlation heatmap, the scale is totally different and the correlation values are really high. I believed that after the removal of the batch effect, the Pearson correlation heatmap would look similar to the one from my initial matrix. I hope that makes it clearer. Thanks!
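One way to check whether the values really changed that much is to compare the two correlation matrices numerically instead of through the color keys. Below is a minimal sketch on simulated data. The matrix `log_counts`, the batch labels, and the effect sizes are all made up for illustration (they only stand in for Log.countsMmus and the real batches); the real calls are `removeBatchEffect()` from limma and base R `cor()`.

```r
library(limma)

set.seed(1)
n_genes <- 500
batch <- factor(rep(c("A", "B", "C"), each = 3))  # hypothetical: 3 batches x 3 samples

# Simulated log2 expression: per-gene baseline + per-gene batch shift + noise
baseline    <- rnorm(n_genes, mean = 8, sd = 2)
batch_shift <- matrix(rnorm(n_genes * 3, sd = 1), nrow = n_genes)
noise       <- matrix(rnorm(n_genes * 9, sd = 0.3), nrow = n_genes)
log_counts  <- baseline + batch_shift[, as.integer(batch)] + noise
colnames(log_counts) <- paste0("S", 1:9)

# Remove the fitted batch term from every gene
corrected <- removeBatchEffect(log_counts, batch = batch)

# Pearson correlation between samples, before and after correction
cor_before <- cor(log_counts, method = "pearson")
cor_after  <- cor(corrected, method = "pearson")

# Compare the off-diagonal ranges directly
range_before <- range(cor_before[upper.tri(cor_before)])
range_after  <- range(cor_after[upper.tri(cor_after)])
print(range_before)
print(range_after)
```

In a simulation like this, the between-batch correlations rise once the batch shifts are removed, so the overall range narrows toward 1. That alone can make the color key look very different even though nothing has gone wrong.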
Thanks for sharing the code. Next step: including figures would be helpful (as well as code to generate them), such as the PCA and heatmaps pre/post batch effect removal.
Anyway: if the PCA result on your data after you call removeBatchEffect is so striking, why should you expect the correlation heatmap to look similar to the original data (do you mean the original correlation heatmap)? What do you mean by the scales being completely different?
Hey Steve,
Thanks for your reply... this is the original heatmap:
https://dl.dropboxusercontent.com/u/14753468/p_correlation_all_data_SE.pdf
And this is the heatmap after removal of the batch effect:
https://dl.dropboxusercontent.com/u/14753468/p_correlation_all_data_SE_corrected.pdf
What do you think? The colors representing the columns are the three batches. As you can see, in the second case the batches mix quite well.
Well, I guess the red batch gets broken up somewhat. The stranger thing is the differences in the color keys; I think you've forced symmetric colors in the second, and you haven't done so in the first. Otherwise there would be no reason that a correlation of zero is white in the second plot.
Other than that, I don't think there's any cause for concern. The absolute values of the correlations seem comparable between the plots, so the changes aren't ridiculous.
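To make the two color keys directly comparable, both heatmaps can be drawn with the same breaks and palette. This is only a sketch under assumptions: the two correlation matrices are simulated stand-ins (not the data from this thread), the names `cor1`, `cor2`, `shared_breaks`, and `key_colors` are made up, and it uses base R `heatmap()` rather than whatever plotting function produced the PDFs above.

```r
set.seed(2)

# Two stand-in correlation matrices (hypothetical data)
x1 <- matrix(rnorm(200 * 6), nrow = 200)
x2 <- x1 + rnorm(200, sd = 2)  # shared per-gene component raises all correlations
cor1 <- cor(x1)
cor2 <- cor(x2)

# One set of breaks spanning both matrices, so a given color
# corresponds to the same correlation value in both plots
shared_range  <- range(cor1, cor2)
shared_breaks <- seq(shared_range[1], shared_range[2], length.out = 51)
key_colors    <- colorRampPalette(c("blue", "white", "red"))(length(shared_breaks) - 1)

# scale = "none" keeps the raw correlations; symm = TRUE treats the matrix as symmetric
heatmap(cor1, symm = TRUE, scale = "none", col = key_colors, breaks = shared_breaks)
heatmap(cor2, symm = TRUE, scale = "none", col = key_colors, breaks = shared_breaks)
```

With shared breaks, white no longer sits at zero in one plot and somewhere else in the other, so any remaining difference between the heatmaps reflects the correlations themselves.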