Hi edgeR developers and all,
I'm testing RNA-Seq data with the following factors:
treatmentA, with the categories 0, 1
treatmentB, with the categories 0, 1, 2
batch, with the categories 0, 1, 2, 3
The goal is to detect genes for which the effect of treatmentA, and (separately) the effect of treatmentB, is significant. The batch effects are not of interest.
As you can see in the MDS plot below, the groups contain different numbers of individuals and cover different sets of categories. For example, in the batch_0 group treatmentB only takes the categories 0 and 1, while in the batch_1 group treatmentB takes all of the categories 0, 1, 2, and so on. Still, we of course want to find the global effect of each treatment.
I tried the design below with the GLM framework (currently with no interactions, since I am not sure they are relevant for this design):
design = model.matrix(~0 + treatmentA_ + batch_ + treatmentB_, data=z$samples)
I would be happy to have your advice on whether this is a correct design in this case (given that the design is not balanced, etc.).
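To make the question concrete, here is a minimal sketch of what this design looks like on a made-up, unbalanced sample table. The table is hypothetical (it is not my real data), and the commented edgeR calls at the end assume `z` is a DGEList with dispersions already estimated. It only checks that the additive design matrix is full rank and shows which coefficients or contrasts I would expect to correspond to each treatment.

```r
# Hypothetical, unbalanced sample table: batch 0 never sees treatmentB level 2,
# batch 2 never sees level 1, batch 3 never sees level 0.
samples <- data.frame(
  treatmentA_ = factor(rep(c(0, 1), times = 9)),
  batch_      = factor(rep(0:3, times = c(4, 6, 4, 4))),
  treatmentB_ = factor(c(0, 0, 1, 1,
                         0, 0, 1, 1, 2, 2,
                         0, 0, 2, 2,
                         1, 1, 2, 2))
)

# Show the imbalance between batch and treatmentB.
print(table(batch = samples$batch_, treatmentB = samples$treatmentB_))

# Same additive formula as in the post.
design <- model.matrix(~0 + treatmentA_ + batch_ + treatmentB_, data = samples)
print(colnames(design))

# The design is estimable only if the matrix has full column rank.
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)

# With ~0, treatmentA gets one column per level, so its effect is a contrast.
conA <- setNames(numeric(ncol(design)), colnames(design))
conA["treatmentA_1"] <- 1
conA["treatmentA_0"] <- -1

# treatmentB is coded relative to level 0, so its effect is these coefficients.
coefB <- c("treatmentB_1", "treatmentB_2")

# Assumed follow-up with edgeR (not run here):
# fit  <- edgeR::glmQLFit(z, design)
# resA <- edgeR::glmQLFTest(fit, contrast = conA)  # treatmentA 1 vs 0
# resB <- edgeR::glmQLFTest(fit, coef = coefB)     # any treatmentB effect
```

In this toy layout the matrix is full rank even though not every batch contains every treatmentB category, because the batches overlap enough in the categories they share.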
Thanks a lot,
Assaf
Hi Gordon,
In fact, edgeR reported a relatively large group of genes for which the effect of treatmentB was significant, especially for the treatmentB_0 vs. treatmentB_2 comparison.
I tried to visualize it by grouping the samples by batch and treatmentA (so that the same categories appear within each group), and then, for each group, plotting in a heatmap the CPM fold change of each sample from the mean CPM of its group. From the heatmap it appears that, for these significant genes, treatmentB has a global effect, though not one that is consistent across all samples. Still, such an effect is indeed hardly visible in the MDS figure.
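Roughly, the centering step looks like the sketch below. The data here are simulated placeholders, the object names are made up for illustration, and working on the log2 scale (so that a difference from the group mean is a log fold change) is an assumption of this sketch. With real data the matrix could come from something like `cpm(z, log = TRUE)`.

```r
set.seed(1)

# Hypothetical sample annotation (10 samples, 2 batches).
batch      <- factor(c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1))
treatmentA <- factor(c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1))
treatmentB <- factor(c(0, 1, 0, 1, 0, 1, 2, 0, 1, 2))

# Placeholder log2-CPM matrix, genes in rows and samples in columns.
logcpm <- matrix(
  rnorm(5 * length(batch), mean = 5), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("s", seq_along(batch)))
)

# One group per batch x treatmentA combination, so that only treatmentB
# varies within a group.
group <- interaction(batch, treatmentA, drop = TRUE)

# Per-gene mean within each group, expanded back to one value per sample.
group_means <- t(apply(logcpm, 1, function(x) ave(x, group)))

# Deviation of each sample from its own group mean (log2 fold change).
centered <- logcpm - group_means

# Order samples by treatmentB so that a global effect shows up as a gradient.
ord <- order(treatmentB, group)

# Draw the heatmap only in an interactive session.
if (interactive()) {
  heatmap(centered[, ord], Colv = NA, scale = "none",
          labCol = paste0("B", treatmentB[ord]))
}
```

Centering within batch and treatmentA removes those two factors from the picture, which is why a treatmentB pattern can show up here while staying hidden in an MDS plot dominated by other sources of variation.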
Thanks a lot, and of course thanks to Aaron for the previous help.
Assaf