After performing an RNA-seq analysis with 35 samples (10 controls, 25 cases), we sequenced an additional 6 samples of a different nature and would like to combine all 41 samples in a larger analysis. The design of the experiment is basically this:
Older data - 4 separate conditions (control, negative, intermediate, positive), 5 batches
Newer data - 1 completely different condition (diffuse), 1 batch
We know there are batch effects in the older data and would like to correct for them, but we are unsure of the best way to do so once all the data are combined, because condition and batch are completely confounded in the newer data (every diffuse sample is in the one new batch, and that batch contains nothing else). The approach I've tried takes the old data, applies removeBatchEffect from the limma package, sets any negative values produced by the batch removal to zero, combines the old and new data, and then runs voom with just condition in the model. This seems to yield the desired results. However, the comparisons of the conditions to the controls within the older data differ greatly from our previous analysis, which simply included batch in the model. Unfortunately, we can't include batch in the model here because the new data's condition and batch are confounded. Would it be possible to model the older data through voom with just the batch factor, combine the old and new data, and then model the complete set with the condition factor? I'm hoping for suggestions on the best approach to remove the batch effects from the older data while still maintaining the power to compare the newer data's condition to the older data's conditions.
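To make the confounding concrete, here is a minimal sketch in plain Python (not limma, and not our actual pipeline). The sample layout is hypothetical: I assume one sample of each old condition in each of the 5 old batches, plus the 6 diffuse samples in a sixth batch, since the real allocation isn't listed above. The batch labels `b1` to `b6` and the helper names are made up for illustration. It builds the additive condition + batch design matrix and shows that it has one fewer independent column than it has coefficients, which is why the diffuse effect and the new batch effect can't be estimated separately.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column is a linear combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical layout (an assumption, not our real sample sheet).
old_conditions = ["control", "negative", "intermediate", "positive"]
old_batches = ["b1", "b2", "b3", "b4", "b5"]
samples = list(product(old_conditions, old_batches)) + [("diffuse", "b6")] * 6

# Treatment coding: "control" and "b1" are the reference levels.
condition_levels = ["negative", "intermediate", "positive", "diffuse"]
batch_levels = ["b2", "b3", "b4", "b5", "b6"]


def design_row(condition, batch):
    """One row of the additive design: intercept, condition dummies, batch dummies."""
    return (
        [1]
        + [int(condition == level) for level in condition_levels]
        + [int(batch == level) for level in batch_levels]
    )


design = [design_row(condition, batch) for condition, batch in samples]
n_columns = len(design[0])
rank = matrix_rank(design)

# The "diffuse" column and the "b6" column are identical for every sample.
diffuse_col = [row[1 + condition_levels.index("diffuse")] for row in design]
b6_col = [row[1 + len(condition_levels) + batch_levels.index("b6")] for row in design]

print(f"coefficients: {n_columns}, rank: {rank}")
print(f"diffuse column equals b6 column: {diffuse_col == b6_col}")
```

In this toy layout the design has 10 coefficients but rank 9, and the missing dimension is exactly the diffuse/new-batch pair. The old conditions and old batches remain separately estimable, so the problem is limited to any contrast that involves diffuse.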
Thanks in advance!