I have scRNA-seq data generated by Smart-seq2 which includes one plate of wild-type cells and a separate plate of knock-out cells. I would like to combine the data from these two plates in order to investigate any effects caused by the experimental intervention. I understand that plate and genotype are confounded in this design, but can fastMNN still be used to integrate such data? My approach would be to analyse the plates separately then merge them with fastMNN and re-do the dimensionality reduction and clustering so I can perform some comparative analyses. Are there any caveats or limitations to this approach that I should keep in mind?
I fail to understand why this is OK in scRNA-seq. In bulk RNA-seq this would be a big no-no: if the plates/batches are confounded with treatment, you would more or less have to throw away the data. Why is it still OK in scRNA-seq? I must say that I see this "confounded design" frequently in scRNA-seq, where control and treated samples are processed on different plates/batches/days.
Aaron, which chapter of the book are you referring to?
The updated link is https://bioconductor.org/books/release/OSCA.multisample/differential-abundance.html#sacrificing-differences.
Thanks for the link. Surely MNN or any other scRNA-seq integration method would "work", but I still don't understand why it is OK to proceed with downstream analysis. Any ComBat or limma batch correction on the data would remove the treatment effect, so in this confounded case we cannot apply any batch correction for downstream analysis. In fact, your book, like most other scRNA-seq integration methods, advises against using the corrected expression matrix.
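To make the concern concrete, here is a minimal sketch with made-up numbers for a single gene. It uses per-plate mean centering as a crude stand-in for a linear batch adjustment; it is not what ComBat, limma or fastMNN actually implement, and the function names are purely illustrative. In the confounded layout the genotype difference vanishes after centering, whereas in a balanced layout (both genotypes on both plates) it survives.

```python
from statistics import mean

# Made-up log-expression values for one gene (illustration only).
# Confounded layout: plate A holds only WT cells, plate B only KO cells.
confounded = [
    {"plate": "A", "genotype": "WT", "expr": 5.0},
    {"plate": "A", "genotype": "WT", "expr": 5.5},
    {"plate": "A", "genotype": "WT", "expr": 4.5},
    {"plate": "B", "genotype": "KO", "expr": 7.0},
    {"plate": "B", "genotype": "KO", "expr": 7.5},
    {"plate": "B", "genotype": "KO", "expr": 6.5},
]

# Balanced layout: both genotypes on both plates; plate B is shifted by +1.
balanced = [
    {"plate": "A", "genotype": "WT", "expr": 5.0},
    {"plate": "A", "genotype": "KO", "expr": 7.0},
    {"plate": "B", "genotype": "WT", "expr": 6.0},
    {"plate": "B", "genotype": "KO", "expr": 8.0},
]


def center_by_plate(cells):
    """Subtract each plate's mean expression (hypothetical, simplified correction)."""
    plates = {c["plate"] for c in cells}
    plate_means = {
        p: mean(c["expr"] for c in cells if c["plate"] == p) for p in plates
    }
    return [{**c, "expr": c["expr"] - plate_means[c["plate"]]} for c in cells]


def genotype_difference(cells):
    """Mean KO expression minus mean WT expression."""
    ko = mean(c["expr"] for c in cells if c["genotype"] == "KO")
    wt = mean(c["expr"] for c in cells if c["genotype"] == "WT")
    return ko - wt


confounded_before = genotype_difference(confounded)
confounded_after = genotype_difference(center_by_plate(confounded))
balanced_before = genotype_difference(balanced)
balanced_after = genotype_difference(center_by_plate(balanced))

print("confounded:", confounded_before, "->", confounded_after)
print("balanced:  ", balanced_before, "->", balanced_after)
```

In the confounded layout the plate mean and the genotype mean are the same quantity, so removing one necessarily removes the other. From the data alone there is no way to tell how much of the original difference was technical.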
But suppose that in this case we use the uncorrected data. The design is confounded, so how can we "trust" the results? Can we be sure that the DE is due to biological rather than technical effects? Or is the technical batch effect in scRNA-seq much smaller than in bulk RNA-seq?
Or is confounding of plates/batches perhaps less relevant for DA (differential abundance) than for DE (differential expression)?