Question

edgeR GLM when factor categories are not equally represented in different groups

assaf www ▴ 140

@assaf-www-6709

Hi edgeR developers and all,

I'm testing RNA-Seq data with the following factors:

treatmentA, with the categories 0, 1
treatmentB, with the categories 0, 1, 2
batch, with the categories 0, 1, 2, 3

The goal is to detect genes in which the effect of treatmentA, and (separately) the effect of treatmentB, is significant. The batch effects are not of interest.

As the MDS plot below shows, the groups contain different numbers of individuals and different combinations of categories. For example, in the batch_0 group treatmentB takes only the categories 0 and 1, while in the batch_1 group it takes all three categories 0, 1, 2, and so on. Still, we would of course like to find the global effect of each treatment.

I tried the design below with a GLM (currently without interactions, since I'm not sure they are relevant for this design):

design = model.matrix(~0 + treatmentA_ + batch_ + treatmentB_, data=z$samples)

I would be happy to have your advice on whether this is a correct design in this case (given that the design is not balanced, etc.).

Thanks a lot,
Assaf

edger • 1.0k views

ADD COMMENT • link updated 8.8 years ago by Gordon Smyth 52k • written 8.8 years ago by assaf www ▴ 140

score 1 · Accepted Answer · 2016-09-26

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

It will work OK. The very strong effects for treatmentA and for the batches are roughly orthogonal. You will pick up lots of DE for treatmentA but probably little or no DE for treatmentB.
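As a quick sanity check that an additive design like this stays estimable despite the imbalance, here is a minimal sketch in base R. The sample table is made up for illustration (it is not the poster's data) and `make_batch` is a hypothetical helper; only the formula is taken from the question.

```r
# Hypothetical sample table mimicking the imbalance described in the question:
# every batch contains both treatmentA levels, but only some treatmentB levels.
make_batch <- function(batch, b_levels) {
  expand.grid(treatmentA_ = c(0, 1), treatmentB_ = b_levels, batch_ = batch)
}
samples <- rbind(
  make_batch(0, c(0, 1)),     # batch 0: treatmentB levels 0 and 1 only
  make_batch(1, c(0, 1, 2)),  # batch 1: all treatmentB levels
  make_batch(2, c(0, 2)),     # batch 2: assumed for illustration
  make_batch(3, c(1, 2))      # batch 3: assumed for illustration
)
samples[] <- lapply(samples, factor)

# Cross-tabulation makes the imbalance visible
print(table(batch = samples$batch_, treatmentB = samples$treatmentB_))

# Same formula as in the question
design <- model.matrix(~0 + treatmentA_ + batch_ + treatmentB_, data = samples)
print(colnames(design))

# Full column rank means every coefficient is estimable
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)

# With "~0 + treatmentA_" first, the treatmentA effect is not a single
# coefficient but the contrast treatmentA_1 - treatmentA_0
contrast_A <- as.numeric(colnames(design) == "treatmentA_1") -
  as.numeric(colnames(design) == "treatmentA_0")
names(contrast_A) <- colnames(design)

# The treatmentB coefficients are already differences from level 0
coef_B <- grep("^treatmentB_", colnames(design))
```

In edgeR, a vector like `contrast_A` would go to the `contrast` argument of `glmQLFTest` or `glmLRT`, and `coef_B` to the `coef` argument, which tests the treatmentB levels jointly. A comparison of treatmentB_1 against treatmentB_2 would need its own contrast.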

8.8 years ago • Gordon Smyth 52k

Hi Gordon,
In fact, edgeR reported a relatively large set of genes in which the effect of treatmentB was significant, especially for the treatmentB_0 vs. treatmentB_2 comparison.

I tried to visualize this by grouping the samples by batch and treatmentA (so that the same categories appear in each group) and then plotting, in a heatmap, the CPM fold change of each sample relative to the mean CPM of its group. From the heatmap it appears that, for these significant genes, treatmentB has a global effect, though not one that is consistent across all samples. Indeed, such an effect is hardly visible in the MDS plot.
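The centering step described above could be sketched roughly as follows in base R. This assumes the fold changes are taken on the log2-CPM scale. The counts are simulated, and `batch`, `treatmentA` and the matrix sizes are placeholders, not the real data.

```r
set.seed(1)

# Hypothetical data: 50 genes x 12 samples of simulated counts
batch <- factor(rep(c(0, 1, 2), each = 4))
treatmentA <- factor(rep(c(0, 0, 1, 1), times = 3))
counts <- matrix(
  rnbinom(50 * 12, mu = 100, size = 10), nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("s", 1:12))
)

# log2 CPM with a small offset, computed by hand to keep the sketch
# dependency-free; edgeR's cpm(y, log = TRUE) would be the usual choice
lib_size <- colSums(counts)
logcpm <- log2(t(t(counts) / lib_size) * 1e6 + 1)

# One group per batch x treatmentA combination
group <- interaction(batch, treatmentA, drop = TRUE)

# Mean log-CPM of every gene within each group (genes x groups)
group_means <- sapply(levels(group), function(g) {
  rowMeans(logcpm[, group == g, drop = FALSE])
})

# Log fold change of each sample relative to its own group mean
centered <- logcpm - group_means[, as.character(group)]

# Batch and treatmentA differences are removed by the centering, so any
# remaining structure can be inspected for treatmentB, e.g.:
# heatmap(centered, Colv = NA, scale = "none")
```

Ordering the columns of `centered` by treatmentB before plotting would make a consistent treatmentB effect easier to spot.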

Thanks a lot, and of course thanks to Aaron for his previous help.

Assaf

ADD REPLY • link 8.8 years ago assaf www ▴ 140