How to perform batch correction on scRNA-seq with confounding plate effects?
jashmore • 0

I have scRNA-seq data generated by Smart-seq2 which includes one plate of wild-type cells and a separate plate of knock-out cells. I would like to combine the data from these two plates in order to investigate any effects caused by the experimental intervention. I understand that plate and genotype are confounded in this design, but can fastMNN still be used to integrate such data? My approach would be to analyse the plates separately then merge them with fastMNN and re-do the dimensionality reduction and clustering so I can perform some comparative analyses. Are there any caveats or limitations to this approach that I should keep in mind?

batchelor • 1.4k views
Aaron Lun ★ 28k

I understand that plate and genotype are confounded in this design, but can fastMNN still be used to integrate such data?

Yes, that's fine, but read this.

My approach would be to analyse the plates separately then merge them with fastMNN and re-do the dimensionality reduction and clustering so I can perform some comparative analyses. Are there any caveats or limitations to this approach that I should keep in mind?

That is also fine, and in fact a comparison of the separate and combined analyses is the basis of one of our suggested diagnostics here.
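One way to make a separate-versus-combined comparison concrete is to take the cells from a single plate and compare their cluster labels from that plate's own analysis against their labels in the merged analysis. The sketch below is plain Python for illustration only, not the Bioconductor workflow itself; the function names and the toy label vectors are hypothetical. It computes the adjusted Rand index (ARI) between the two sets of labels: a value near 1 suggests the merge preserved the within-plate structure, while a low value suggests the correction has rearranged cells that were clustered together before merging.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must describe the same cells")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two cells")

    # Contingency table: number of cells for each (label_a, label_b) pair.
    table = Counter(zip(labels_a, labels_b))

    # Pairs of cells that are co-clustered in both, in A only, in B only.
    sum_cells = sum(comb(c, 2) for c in table.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels for six cells from one plate.
within_plate = ["A", "A", "A", "B", "B", "B"]  # clusters from the plate alone
merged_same = [1, 1, 1, 2, 2, 2]               # merged clusters, same grouping
merged_mixed = [1, 1, 2, 1, 2, 2]              # merged clusters, cells shuffled

ari_same = adjusted_rand_index(within_plate, merged_same)
ari_mixed = adjusted_rand_index(within_plate, merged_mixed)
print(ari_same, ari_mixed)
```

The cluster names themselves do not need to match between the two analyses, since the index only looks at which pairs of cells are grouped together.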


I fail to understand why this is OK in scRNA-seq. In bulk RNA-seq this would be a big no-no: if the plates/batches were confounded with treatment, you would more or less have to throw away the data. Why is it still OK in scRNA-seq? I must say that I see this "confounded design" frequently in scRNA-seq, where control and treated samples are processed on different plates/batches/days.

Aaron, which chapter of the book are you referring to?


Thanks for the link. Surely MNN or any other scRNA-seq integration method would "work", but I still don't understand why it is OK to proceed with downstream analysis. Any ComBat or limma batch correction on these data would remove the treatment effect, so in this confounded case we cannot apply any batch correction for downstream analysis. In fact, your book, like most other scRNA-seq integration methods, does not advise using the corrected expression matrix.
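To illustrate the point about linear correction with a toy example: the sketch below uses per-batch mean centering as a simplified stand-in for a linear batch correction (it is not ComBat or limma, and the function names and expression values are made up). When each genotype sits on its own plate, centering the plates removes the genotype difference entirely; when both genotypes are present on every plate, the same centering leaves the difference intact.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean, then add back the grand mean."""
    grand = mean(values)
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]


def ko_minus_wt(values, genotypes):
    """Difference in mean expression between KO and WT cells."""
    ko = mean(v for v, g in zip(values, genotypes) if g == "KO")
    wt = mean(v for v, g in zip(values, genotypes) if g == "WT")
    return ko - wt


# Confounded design: plate P1 holds only WT cells, plate P2 only KO cells.
expr_conf = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
plate_conf = ["P1"] * 3 + ["P2"] * 3
geno_conf = ["WT"] * 3 + ["KO"] * 3

# Balanced design: both genotypes on both plates (plate shift +2, KO effect +4).
expr_bal = [5.0, 6.0, 9.0, 10.0, 7.0, 8.0, 11.0, 12.0]
plate_bal = ["P1"] * 4 + ["P2"] * 4
geno_bal = ["WT", "WT", "KO", "KO"] * 2

conf_before = ko_minus_wt(expr_conf, geno_conf)
conf_after = ko_minus_wt(center_by_batch(expr_conf, plate_conf), geno_conf)
bal_before = ko_minus_wt(expr_bal, geno_bal)
bal_after = ko_minus_wt(center_by_batch(expr_bal, plate_bal), geno_bal)

print("confounded:", conf_before, "->", conf_after)
print("balanced:  ", bal_before, "->", bal_after)
```

In the confounded case the plate label and the genotype label are the same variable, so nothing in the data can tell the correction which part of the shift is technical and which is biological.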

But suppose that in this case we use the uncorrected data. It is confounded, so how can we "trust" the results? Can we be sure that the DE results are due to biological rather than technical effects? Or is the technical batch effect in scRNA-seq much smaller than in bulk RNA-seq?

Or is the confounding of plates/batches perhaps less relevant for DA (differential abundance) than for DE (differential expression)?
