I'm running DESeq2 to identify genes that are differentially expressed between two general phenotypes: fruits and vegetables. Here's my design:
   general  specific  batch
   ===========================
1  fruit    apple     apple
2  fruit    apple     apple
3  fruit    pear      pear
4  fruit    pear      pear
5  veggie   cucumber  NA
6  veggie   cucumber  NA
7  veggie   cucumber  NA
8  veggie   cucumber  NA
9  veggie   cucumber  NA
I'm a bit worried about the batch effect nested within the fruit phenotype. Is there a way to correct for the differences between apples and pears, and should I even bother? If I correct for "specific", I might accidentally remove the "general" effect. But if I code the cucumber samples as NA, as in "batch", DESeq2 won't let me add "batch" to the model as a covariate. I also considered removing the apple/pear batch effect from the fruit samples only (with limma's removeBatchEffect, for example) and then comparing them to the veggies. But that sounds like I would be artificially amplifying the very differences I want to see between fruits and veggies. Any other ideas?
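To make the nesting concrete, here is a small sketch that builds the treatment-coded design matrices for the nine samples in the table with plain NumPy. It assumes the same coding R's model.matrix would use by default (intercept plus indicator columns, with the alphabetically first level, "fruit" or "apple", as the reference). The helper `indicator` is hypothetical and only for illustration. It shows that "general" carries no information beyond "specific": the veggie indicator is identical to the cucumber indicator, so a design with both terms has more columns than its rank. DESeq2 refuses to fit a design whose model matrix is not full rank.

```python
import numpy as np

# Sample sheet from the table above (9 samples).
general = ["fruit"] * 4 + ["veggie"] * 5
specific = ["apple", "apple", "pear", "pear"] + ["cucumber"] * 5

def indicator(labels, level):
    """Hypothetical helper: 0/1 column marking samples whose label equals `level`."""
    return np.array([1.0 if x == level else 0.0 for x in labels])

intercept = np.ones(len(general))

# ~ general: intercept + veggie indicator (fruit is the assumed reference level)
X_general = np.column_stack([intercept, indicator(general, "veggie")])

# ~ specific: intercept + cucumber + pear indicators (apple is the assumed reference)
X_specific = np.column_stack(
    [intercept, indicator(specific, "cucumber"), indicator(specific, "pear")]
)

# ~ general + specific: the veggie column duplicates the cucumber column
X_both = np.column_stack(
    [
        intercept,
        indicator(general, "veggie"),
        indicator(specific, "cucumber"),
        indicator(specific, "pear"),
    ]
)

for name, X in [
    ("~ general", X_general),
    ("~ specific", X_specific),
    ("~ general + specific", X_both),
]:
    print(f"{name}: {X.shape[1]} columns, rank {np.linalg.matrix_rank(X)}")
```

Running it prints 2 columns/rank 2, 3 columns/rank 3, and 4 columns/rank 3: once "specific" is in the model, there is no remaining degree of freedom for "general" as a separate term. Recoding the NA entries in "batch" as their own level would just reproduce "specific".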