Correcting for Batch Effects Prior to Differential Gene Expression Analysis with limma
adscheid3 • 0
@adscheid3-12893
Last seen 6.8 years ago

Hello! I have a question about correcting for batch effects prior to differential gene expression analysis with limma. I've read that batch effect correction functions such as ComBat should not be used prior to differential expression analysis in limma, and that batch effects should be accounted for in the linear model instead. However, in my case the batch effect and the disease effect are one and the same, so if I account for the batch effect in the linear model, the differential expression analysis will no longer capture the influence of disease on gene expression. Therefore I'd like to re-run several healthy and disease samples, use those to calculate gene-wise normalization factors for healthy and for disease, and multiply by those factors to eliminate the batch effect while maintaining the disease effect. Is it acceptable to do the normalization on reads-per-million data, back-calculate to raw counts using the library size of each sample, and then do differential expression using limma? Thanks, all the best!

Adam      

limma batch effect rna-seq • 2.2k views
Aaron Lun ★ 28k
@alun
Last seen 1 day ago
The city by the bay

To answer your specific question: no, it is not valid to perform normalization within conditions. TMM normalization is relative, so normalization factors computed for separate sets of counts are not comparable. In any case, normalization doesn't solve the problem of batch effects. Normalization only eliminates global scaling differences between libraries, i.e., differences that apply across all genes. It cannot remove gene-wise differences between batches.
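To illustrate why separately computed factors are not comparable, here is a minimal Python sketch. It is not TMM and not edgeR's implementation: `relative_factors` is a hypothetical helper that takes the median count ratio against the first sample in the set and rescales the factors to a geometric mean of 1, and the counts are made up for the example. The only point is that factors of this kind are defined relative to the other libraries in the same set.

```python
import math
from statistics import median

def relative_factors(counts):
    """Toy scaling factors (a stand-in for TMM, not the real algorithm).

    counts: list of samples, each a list of gene counts in the same gene order.
    Each sample is compared to the first sample in the set, and the factors
    are rescaled so that their geometric mean is 1 within that set.
    """
    ref = counts[0]
    raw = []
    for sample in counts:
        ratios = [s / r for s, r in zip(sample, ref) if s > 0 and r > 0]
        raw.append(median(ratios))
    geo_mean = math.exp(sum(math.log(f) for f in raw) / len(raw))
    return [f / geo_mean for f in raw]

# Made-up counts: the disease libraries are globally 10x larger than the healthy ones.
healthy = [[10, 20, 30, 40], [20, 40, 60, 80]]
disease = [[100, 200, 300, 400], [200, 400, 600, 800]]

# Factors computed within each condition separately.
healthy_factors = relative_factors(healthy)
disease_factors = relative_factors(disease)

# Factors computed on all libraries together.
joint_factors = relative_factors(healthy + disease)

print(healthy_factors, disease_factors)
print(joint_factors)
```

In this toy data the two within-condition sets of factors come out identical, even though the disease libraries are ten times larger, because each set is only scaled against itself. The 10-fold difference between conditions is only visible in the factors computed jointly.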

More generally, you probably can't analyze this data set, because the experimental design is fundamentally broken. When the batch is confounded with your conditions of interest, any difference in expression between conditions may or may not be due to the batch effect - it's mathematically impossible to distinguish between these two possibilities. (The exception is if you have multiple batches nested within each condition, in which case you could use duplicateCorrelation to account for the within-batch correlations.)
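The confounding can be seen directly in the design matrix. The following is a small Python sketch under assumed conditions (four samples, one disease indicator, one batch indicator; `matrix_rank` and `design` are hypothetical helpers written for this example, not limma functions). When batch and disease coincide, the batch column duplicates the disease column, so the matrix loses rank and the two coefficients cannot be estimated separately. The crossed layout is included only as a contrast and is not the design described in the question.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix by Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def design(disease, batch):
    """Design matrix with columns: intercept, disease indicator, batch indicator."""
    return [[1, d, b] for d, b in zip(disease, batch)]

disease = [0, 0, 1, 1]

# Confounded: every healthy sample is in batch 0, every disease sample in batch 1.
confounded = design(disease, batch=[0, 0, 1, 1])

# Crossed (hypothetical contrast): each batch contains both conditions.
crossed = design(disease, batch=[0, 1, 0, 1])

confounded_rank = matrix_rank(confounded)
crossed_rank = matrix_rank(crossed)
print(confounded_rank, crossed_rank)
```

The confounded design has three columns but rank 2, so the disease and batch effects are not separately estimable. The crossed design has full rank 3, which is what allows a batch term to be included in the linear model alongside the condition of interest.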
