Two-factor experiment in edgeR: effect of having no replicates in one group and two replicates in the others
I am analyzing differential gene expression in an RNA-Seq experiment in which two factors were applied. When only Factor 1 is considered, every group has multiple replicates, and the samples cluster clearly by group. When Factor 2 is also considered, one level of Factor 1 has no replicates for the levels of Factor 2, while each of the other levels of Factor 1 has two replicates for each level of Factor 2. I have good reason not to discard the unreplicated group, because the annotated set of differentially expressed genes strongly indicates that the effect of Factor 2 is significant.

My question is: do I need to do anything special about the dispersion, or will the dispersion estimated from the replicated samples be applied to the group lacking replicates? Thanks!

Nancy

If your design is ~Factor1 + Factor2, then you have nothing to worry about, since all the coefficients are estimated with some replication. If your design is ~Factor1 * Factor2 (or ~Group, where Group encodes every unique combination of Factor1 and Factor2), then you do indeed have a group with no replication. You can still fit the model and test for differential expression involving this group, because the other groups have replication from which the dispersions can be estimated, and edgeR applies those per-gene dispersion estimates to all samples, including the unreplicated one.

The only caveats are the obvious ones. First, the group with no replication contributes nothing to dispersion estimation: with a single sample, the fitted value for that group equals its observed count, so it leaves no residual variation to learn from. If that group happens to have a higher dispersion than the others, you will underestimate the dispersions for the experiment as a whole and thereby overestimate significance. Second, the expression estimate for that group will be unreliable. In theory the p-values account for this, but you can only ask so much of a statistical method, and with an N of 1 they may not be very reliable.
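The point that an unreplicated group "contributes nothing to the dispersion estimation" can be checked with a little linear algebra. The sketch below builds both model matrices for a hypothetical layout (an assumption, not Nancy's actual data: Factor1 levels A, B, C; Factor2 levels x, y; two replicates per cell for A and B, one sample per cell for C) and reports the residual degrees of freedom and the hat-matrix leverages. A sample with leverage 1 is fitted exactly, so its residual is zero. This uses ordinary least-squares algebra in exact arithmetic, not edgeR's negative binomial GLM, but the degrees-of-freedom bookkeeping and the exact fit of a singleton group carry over.

```python
from fractions import Fraction

# Hypothetical layout (assumption): C has one sample per Factor2 level.
F1_LEVELS = ["A", "B", "C"]
F2_LEVELS = ["x", "y"]
factor1 = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C"]
factor2 = ["x", "x", "y", "y", "x", "x", "y", "y", "x", "y"]


def additive_design(f1, f2):
    """Rows of the model matrix for ~Factor1 + Factor2 (treatment coding)."""
    return [[Fraction(1)]
            + [Fraction(int(a == lev)) for lev in F1_LEVELS[1:]]
            + [Fraction(int(b == lev)) for lev in F2_LEVELS[1:]]
            for a, b in zip(f1, f2)]


def group_design(f1, f2):
    """Rows of the model matrix for ~0 + Group (one column per combination)."""
    groups = [a + "." + b for a, b in zip(f1, f2)]
    levels = sorted(set(groups))
    return [[Fraction(int(g == lev)) for lev in levels] for g in groups]


def rank(rows):
    """Matrix rank by Gaussian elimination in exact arithmetic."""
    m = [row[:] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def inverse(a):
    """Inverse of a square, full-rank matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]


def leverages(X):
    """Diagonal of the hat matrix: h_i = x_i' (X'X)^{-1} x_i."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    inv = inverse(xtx)
    return [sum(x[i] * inv[i][j] * x[j] for i in range(p) for j in range(p)) for x in X]


for name, X in [("~Factor1 + Factor2", additive_design(factor1, factor2)),
                ("~Group", group_design(factor1, factor2))]:
    print(name, "residual df:", len(X) - rank(X))
    print("  leverages:", [str(h) for h in leverages(X)])
```

In the ~Group design the two C samples come out with leverage exactly 1 (every other sample has 1/2), so they are fitted perfectly and add nothing to the residual degrees of freedom used for dispersion estimation. In the additive design no sample has leverage 1, which is why that model has "nothing to worry about".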
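Why an underestimated dispersion inflates significance, and why an N of 1 is fragile, can be seen in a rough Wald-type calculation. This is a swapped-in approximation, not edgeR's exact, likelihood-ratio, or quasi-likelihood test: it uses the negative binomial variance mu + phi*mu^2 and the delta-method variance of a log group mean, (1/mu + phi)/n, assuming equal library sizes. The gene, its means, and both dispersion values are hypothetical.

```python
import math


def approx_wald_z(mu1, n1, phi1, mu2, n2, phi2):
    """Approximate Wald z for log(mu1 / mu2) under a negative binomial model.

    Delta method: a mean of n counts with mean mu and dispersion phi
    (variance mu + phi * mu**2) has Var(log mean) ~= (1/mu + phi) / n.
    Assumes equal library sizes; this is NOT edgeR's test statistic.
    """
    var = (1.0 / mu1 + phi1) / n1 + (1.0 / mu2 + phi2) / n2
    return math.log(mu1 / mu2) / math.sqrt(var)


def two_sided_p(z):
    """Two-sided p-value from a standard normal z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical gene: the unreplicated group (n = 1) has mean 200 and the
# comparison group (n = 2) has mean 100. The dispersion pooled from the
# replicated groups is 0.05; suppose the unreplicated group's true one is 0.2.
z_assumed = approx_wald_z(200, 1, 0.05, 100, 2, 0.05)
z_true = approx_wald_z(200, 1, 0.20, 100, 2, 0.05)
print(f"pooled dispersion: z = {z_assumed:.2f}, p = {two_sided_p(z_assumed):.3f}")
print(f"true dispersion:   z = {z_true:.2f}, p = {two_sided_p(z_true):.3f}")
```

Using the pooled dispersion gives a noticeably larger z (smaller p) than the hypothetical true dispersion would. Note also that the n = 1 term dominates the variance, so the test leans heavily on a single observation.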

