I am a bioinformatics novice trying to get a handle on both the conceptual and technical issues around batch effects in RNA-seq datasets and their analysis. The context is this:
I've done differential expression analyses, using both edgeR and DESeq2, in an experiment comparing responses to a drug treatment (trt, ctrl) in two populations (A, B). Sequences for all four treatment*population combinations were generated simultaneously in a single run on the same machine. I now want to generate sequence data from a third population (C) and compare its response with those of A and B in the same fashion.
I understand that a batch effect would arise with population C, because its sequences would be generated on a different run after A and B, and that this batch effect would need to be included in the linear models for edgeR and DESeq2. But I'm having a difficult time figuring out how batch effects could be distinguished from population effects, since C is the only population sequenced on the second run. Short of resequencing samples from all three populations and both treatments, is there a robust approach to differentiating population effects from batch effects in this context?
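To make the confounding concrete, here is a minimal sketch of the problem as I understand it. It is plain Python with NumPy, not edgeR or DESeq2 code, and the layout is an assumption on my part: three replicates per population and treatment cell, a simple additive model, and population A / ctrl / run 1 as the reference levels. It builds the design matrix with and without a batch column and compares their ranks.

```python
import numpy as np

# Assumed layout (hypothetical): 3 replicates per population x treatment cell.
populations = ["A", "B", "C"]
treatments = ["ctrl", "trt"]
n_reps = 3

samples = [
    (pop, trt)
    for pop in populations
    for trt in treatments
    for _ in range(n_reps)
]


def design_matrix(samples, include_batch):
    """Additive design: intercept + population + treatment (+ batch)."""
    rows = []
    for pop, trt in samples:
        row = [
            1.0,                           # intercept (population A, ctrl, run 1)
            1.0 if pop == "B" else 0.0,    # population B vs A
            1.0 if pop == "C" else 0.0,    # population C vs A
            1.0 if trt == "trt" else 0.0,  # trt vs ctrl
        ]
        if include_batch:
            # Run 2 holds only population C, so this column is an exact
            # copy of the "population C" column above.
            row.append(1.0 if pop == "C" else 0.0)
        rows.append(row)
    return np.array(rows)


without_batch = design_matrix(samples, include_batch=False)
with_batch = design_matrix(samples, include_batch=True)

rank_without = np.linalg.matrix_rank(without_batch)
rank_with = np.linalg.matrix_rank(with_batch)

print("columns / rank without batch:", without_batch.shape[1], rank_without)
print("columns / rank with batch:   ", with_batch.shape[1], rank_with)
```

Adding the batch column increases the number of coefficients but not the rank of the matrix, which is how I understand "batch and population C cannot be separated" in linear-model terms: the two columns are identical, so only their sum is estimable.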