The dataset I am looking at has seven conditions: a control, plus three different concentrations each of exposure A and exposure B. Each exposure/concentration combination has three replicates, one in each batch.
My goal is to compare each of the six experimental conditions with the control. My plan is to use one treatment factor with seven levels and one batch factor with three levels.
The complication, however, is that there is more than one control per batch. Batch 1 has one control, whereas batches 2 and 3 have two controls each. In the batches that have more than one control, these controls are _not_ technical replicates.
Will this be an issue for an edgeR-based differential expression analysis? "Control" is unevenly represented across the batches. Can the kind of analysis I described above handle this, or are there other ways it can be handled?
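For concreteness, here is a minimal sketch of the design I have in mind, using base R only. The condition labels (A1 to A3, B1 to B3) and the sample sheet are hypothetical, reconstructed from the description above rather than taken from the real data. It only checks that the additive batch + treatment design matrix is full rank with the uneven controls; it says nothing about power.

```r
# Hypothetical labels: three concentrations each of exposure A and exposure B.
treat_levels <- c("control", "A1", "A2", "A3", "B1", "B2", "B3")

# One sample per exposure/concentration combination in each of the 3 batches.
exposed <- expand.grid(treatment = treat_levels[-1], batch = 1:3,
                       stringsAsFactors = FALSE)

# Uneven controls: one in batch 1, two each in batches 2 and 3.
controls <- data.frame(treatment = "control", batch = c(1, 2, 2, 3, 3),
                       stringsAsFactors = FALSE)

samples <- rbind(controls, exposed)
samples$treatment <- factor(samples$treatment, levels = treat_levels)
samples$batch <- factor(samples$batch)

# Additive model: intercept + 2 batch terms + 6 treatment-vs-control terms.
design <- model.matrix(~ batch + treatment, data = samples)

table(samples$batch, samples$treatment)  # shows the uneven control counts
design_rank <- qr(design)$rank           # equals ncol(design) if full rank
```

With this parameterization, the six treatment columns of `design` would each represent one condition versus control, adjusted for batch.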
A quick follow-up. Batch 1 only has one control because the other one failed. To compensate, they made another control in batch 4, but that is the only sample in that batch. My instinct is that this sample should be excluded: with only one sample in batch 4, there is no way to know what the effect of being in batch 4 is.
What do you think about this reasoning?
You are correct. The only way to include such information would be to use `duplicateCorrelation` with `voom`, but that has its own issues. To me, it doesn't seem worth the trouble just to include one extra sample.
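To illustrate why the lone batch 4 control adds nothing to a fixed-effects batch model, here is a rough base R sketch. The sample layout and condition labels (A1 to A3, B1 to B3) are assumptions reconstructed from the question, and `make_design` and `resid_df` are hypothetical helpers written for this example. Adding the sample also adds a batch 4 coefficient that only that sample informs, so the residual degrees of freedom do not change.

```r
# Hypothetical helper: build the additive design for a given set of control batches.
make_design <- function(control_batches) {
  treat_levels <- c("control", "A1", "A2", "A3", "B1", "B2", "B3")
  exposed <- expand.grid(treatment = treat_levels[-1], batch = 1:3,
                         stringsAsFactors = FALSE)
  controls <- data.frame(treatment = "control", batch = control_batches,
                         stringsAsFactors = FALSE)
  s <- rbind(controls, exposed)
  s$treatment <- factor(s$treatment, levels = treat_levels)
  s$batch <- factor(s$batch)
  model.matrix(~ batch + treatment, data = s)
}

# Residual degrees of freedom: samples minus estimable coefficients.
resid_df <- function(d) nrow(d) - qr(d)$rank

without_b4 <- make_design(c(1, 2, 2, 3, 3))     # 23 samples, 9 coefficients
with_b4    <- make_design(c(1, 2, 2, 3, 3, 4))  # 24 samples, 10 coefficients

df_without <- resid_df(without_b4)
df_with    <- resid_df(with_b4)

# The batch 4 column is nonzero for exactly one sample, so that sample is
# fitted perfectly and cannot inform the dispersion or the treatment effects.
n_in_batch4 <- sum(with_b4[, "batch4"])
```

Both designs leave 14 residual degrees of freedom, so dropping the batch 4 sample costs nothing in a model of this form.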