I have a question about how many treatments should be included when fitting a GLM. Say I have four treatments (A, B, C, D) and one control, and each group (including the control) has 3 replicates. All treatments and the control are the same species. I want to compare A against B and A against the control. I am hesitating between two options: the first is to fit ONLY A, B and the control in a generalised linear model (GLM) and compare them; the second is to fit all of A, B, C, D and the control in the GLM, which allows any treatment/control pair to be compared, and then extract the A-B and A-control comparisons.

Are both reasonable? Which is better? What difference can be expected between the two options?


Both approaches are reasonable.

If all treatments A-D were part of the same experiment, done on the same type of cells at the same time, then fitting a model to all the data at once is usually better. This is because it provides more samples from which to estimate variability for each gene.
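To put rough numbers on "more samples", here is a minimal sketch in plain Python that counts the residual degrees of freedom available for estimating variability under each option, assuming one coefficient per group and 3 replicates each as in the question. The helper names (`design_matrix`, `residual_df`, `contrast`) are made up for illustration and are not the API of any particular GLM package.

```python
def design_matrix(groups, levels):
    """Group-means design: one row per sample, one 0/1 column per group."""
    return [[1 if g == lvl else 0 for lvl in levels] for g in groups]


def residual_df(design):
    """Number of samples minus number of fitted coefficients.

    For a 0/1 group-means design every non-empty column is an
    independent coefficient, so counting non-empty columns is enough.
    """
    n_samples = len(design)
    n_coefs = sum(1 for col in zip(*design) if any(col))
    return n_samples - n_coefs


def contrast(levels, plus, minus):
    """Contrast vector for the comparison `plus` minus `minus`."""
    vec = [0] * len(levels)
    vec[levels.index(plus)] = 1
    vec[levels.index(minus)] = -1
    return vec


n_reps = 3  # replicates per group, as stated in the question

# Option 2: fit all five groups together.
all_levels = ["control", "A", "B", "C", "D"]
all_groups = [lvl for lvl in all_levels for _ in range(n_reps)]
df_full = residual_df(design_matrix(all_groups, all_levels))  # 15 - 5 = 10

# Option 1: fit only control, A and B.
sub_levels = ["control", "A", "B"]
sub_groups = [lvl for lvl in sub_levels for _ in range(n_reps)]
df_sub = residual_df(design_matrix(sub_groups, sub_levels))  # 9 - 3 = 6

# With the full model, the two comparisons of interest are still
# extracted as contrasts; C and D simply get zero weight.
a_vs_b = contrast(all_levels, "A", "B")
a_vs_control = contrast(all_levels, "A", "control")

print("residual df, all groups:", df_full)
print("residual df, A/B/control only:", df_sub)
print("A-B contrast:", a_vs_b)
print("A-control contrast:", a_vs_control)
```

So the full model has 10 residual degrees of freedom per gene instead of 6, while the A-B and A-control comparisons themselves are defined in the same way under either option.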

If, however, C and D are very different from A and B for some reason, for example because they are profiles of a different cell type, then analyzing just A, B and the control would usually be preferable. In this case, the variability between replicate samples for treatments C and D might not be representative of what we would expect for A and B.

Good answer. Thanks a lot!


