I have a couple of questions about using ComBat from the sva package for batch effect normalisation of synthetic genetic array (SGA) data.
Briefly, for anyone who's not familiar, SGAs are a high-throughput method for detecting genetic interactions in yeast. A deletion library is used, which contains thousands of strains, with each strain having a different non-essential gene deleted. By plating the library on agar plates, we can measure the colony size of each mutant, and get fitness estimates for each mutant in the library. By crossing a strain with a mutation of interest against the entire library, we can make a library of double mutants, and again get fitness estimates for each double mutant. We can then compare the two libraries to identify double mutants whose fitness is not what we expected (based on the fitness of the original library mutant) and pick out genetic interactions.
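To make the comparison step concrete, here is a minimal Python sketch of how a double mutant might be flagged. It assumes a multiplicative expectation (expected double-mutant fitness = query fitness × library-mutant fitness), which is a common convention but an assumption on my part. The gene names, fitness values, and the 0.2 cutoff are all made up for illustration.

```python
def interaction_score(single_fitness, double_fitness, query_fitness=1.0):
    """Return observed minus expected double-mutant fitness.

    Assumption: a multiplicative model, expected = query_fitness * single_fitness.
    """
    expected = query_fitness * single_fitness
    return double_fitness - expected


# Hypothetical fitness values (colony size relative to a median of 1).
single = {"geneA": 0.95, "geneB": 0.80, "geneC": 1.02}  # library mutants alone
double = {"geneA": 0.94, "geneB": 0.40, "geneC": 1.00}  # crossed with the query
query_fitness = 0.9  # hypothetical fitness of the query mutation alone

scores = {
    gene: interaction_score(single[gene], double[gene], query_fitness)
    for gene in single
}

# Arbitrary illustrative cutoff; a real screen would use a statistical test.
CUTOFF = 0.2
candidates = sorted(gene for gene, s in scores.items() if abs(s) > CUTOFF)
print(candidates)
```

With these made-up numbers only geneB deviates strongly from expectation, so it is the only candidate interaction.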
My first question: does anyone have any comments on the suitability of ComBat for normalising batch effects in SGA data? The colony sizes are approximately normally distributed, with a median colony size of 1. A library typically contains at least 3000 different mutants.
My second question: as I understand it, ComBat is only really suitable when each batch contains at least 10 samples. This is a bit of an issue for SGAs, since they are much more labour-intensive than microarrays and tend to be done in much smaller batches (e.g. 5 samples would be a typical batch size for a lot of labs). However, each library mutant is plated in quadruplicate, meaning that with 5 SGAs per batch we would actually have 20 individual colony size measurements for each library mutant per batch (I guess this is analogous to having duplicate probes for microarrays). So what I want to do is split each SGA by replicate, treating each of the four replicates as a separate sample; this gives four times as many samples in each batch and should make the data a lot more suitable for batch effect normalisation. Are there any reasons (statistical, technical, etc.) why this would be inappropriate? I can't think of any, since all 20 replicates are part of the same batch and so should share the same non-biological signature. But I'm not an expert in this, so I would appreciate comments and thoughts.
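To show what I mean by splitting, here is a rough Python sketch of the reshaping only. It does not call ComBat, which is an R function. The aim is the layout ComBat takes as input: one row per library mutant, one column per sample, plus one batch label per column. The nested-dictionary input, the helper name `split_replicates`, and all values are hypothetical, and the sketch assumes every SGA contains the same mutants with the same number of replicates.

```python
def split_replicates(batches):
    """Flatten {batch: {sga: {mutant: [rep1, rep2, ...]}}} into per-replicate samples.

    Returns (sample_names, batch_labels, matrix), where matrix[mutant] holds one
    colony size per sample, in the same order as sample_names.
    """
    sample_names, batch_labels, matrix = [], [], {}
    for batch, sgas in batches.items():
        for sga, colonies in sgas.items():
            n_reps = len(next(iter(colonies.values())))
            for rep in range(n_reps):
                # Each replicate of each SGA becomes its own "sample".
                sample_names.append(f"{sga}_rep{rep + 1}")
                batch_labels.append(batch)
                for mutant, sizes in colonies.items():
                    matrix.setdefault(mutant, []).append(sizes[rep])
    return sample_names, batch_labels, matrix


# Hypothetical toy data: 2 batches, 3 SGAs, 2 mutants, 4 replicate colonies each.
batches = {
    "batch1": {
        "sga1": {"geneA": [1.0, 1.1, 0.9, 1.0], "geneB": [0.5, 0.6, 0.5, 0.4]},
        "sga2": {"geneA": [1.2, 1.1, 1.0, 1.1], "geneB": [0.7, 0.6, 0.6, 0.5]},
    },
    "batch2": {
        "sga3": {"geneA": [0.8, 0.9, 0.9, 0.8], "geneB": [0.3, 0.4, 0.3, 0.3]},
    },
}

sample_names, batch_labels, matrix = split_replicates(batches)
print(len(sample_names), batch_labels.count("batch1"), batch_labels.count("batch2"))
```

Here the three SGAs become 12 samples (8 in batch1, 4 in batch2). Replicate columns from the same SGA keep the same batch label, which is the property my question depends on.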