Question: Complex multifactorial design with random effects in Limma/voom
1
gravatar for joasmile84
3 months ago by
joasmile8410
joasmile8410 wrote:

Hi,

I have performed a RNA-seq experiment that has quite a complex design. I have 40 bee individuals belonging to 4 treatment groups (1 control group and 3 different treatments, each of which I would like to compare to the control). Each treatment has 10 biological replicates (10 individual bees). For each individual the tissue of interest was dissected in three regions, so I have 120 RNA-seq samples. The 40 individuals were collected from ten hives (for each hive I have 4 individuals, one per treatment group), and the samples were always processed in batches by hive (RNA, library prep, sequencing lane), so now I have just one batch effect of these ten batches/hives. Finally, through other methods I have realized that while performing the experiment the individuals were transitioning between two behavioural states (these two behavioural groups are not perfectly balanced across treatments and hives/batches but it is not too bad), and I would expect some differential expression between them (which does seem to show up in exploratory ordination plots). This is how my design matrix looks like

  Sample Sample_ID Treatment Region Hive Behaviour
1        1    H1B1_6        MD     OL   H1     nurse
2        2    H1B1_6        MD     MB   H1     nurse
3        3    H1B1_6        MD     AL   H1     nurse
4        4    H1B3_5        Bi     OL   H1     nurse
5        5    H1B3_5        Bi     MB   H1     nurse
6        6    H1B3_5        Bi     AL   H1     nurse
7        7    H1B5_3       CLA     OL   H1     nurse
8        8    H1B5_3       CLA     MB   H1     nurse
9        9    H1B5_3       CLA     AL   H1     nurse
10      10    H1B6_4       CLH     OL   H1     nurse
11      11    H1B6_4       CLH     MB   H1     nurse
12      12    H1B6_4       CLH     AL   H1     nurse
13      13    H2B1_4        Bi     OL   H2     nurse
14      14    H2B1_4        Bi     MB   H2     nurse
15      15    H2B1_4        Bi     AL   H2     nurse
16      16    H2B3_3       CLH     OL   H2     nurse
17      17    H2B3_3       CLH     MB   H2     nurse
18      18    H2B3_3       CLH     AL   H2     nurse
19      19    H2B4_3       CLA     OL   H2     nurse
20      20    H2B4_3       CLA     MB   H2     nurse
21      21    H2B4_3       CLA     AL   H2     nurse
22      22    H2B5_1        MD     OL   H2   forager
23      23    H2B5_1        MD     MB   H2   forager
24      24    H2B5_1        MD     AL   H2   forager
25      25    H3B1_3       CLH     OL   H3     nurse
26      26    H3B1_3       CLH     MB   H3     nurse
27      27    H3B1_3       CLH     AL   H3     nurse
28      28    H3B3_4        Bi     OL   H3   forager
29      29    H3B3_4        Bi     MB   H3   forager
30      30    H3B3_4        Bi     AL   H3   forager
31      31    H3B5_2       CLA     OL   H3     nurse
32      32    H3B5_2       CLA     MB   H3     nurse
33      33    H3B5_2       CLA     AL   H3     nurse
34      34    H3B6_2        MD     OL   H3   forager
35      35    H3B6_2        MD     MB   H3   forager
36      36    H3B6_2        MD     AL   H3   forager
37      37    H4B3_4       CLA     OL   H4     nurse
38      38    H4B3_4       CLA     MB   H4     nurse
39      39    H4B3_4       CLA     AL   H4     nurse
40      40    H4B4_7        MD     OL   H4     nurse
41      41    H4B4_7        MD     MB   H4     nurse
42      42    H4B4_7        MD     AL   H4     nurse
43      43    H4B5_2        Bi     OL   H4   forager
44      44    H4B5_2        Bi     MB   H4   forager
45      45    H4B5_2        Bi     AL   H4   forager
46      46    H4B6_2       CLH     OL   H4     nurse
47      47    H4B6_2       CLH     MB   H4     nurse
48      48    H4B6_2       CLH     AL   H4     nurse
49      49    H5B1_3       CLA     OL   H5     nurse
50      50    H5B1_3       CLA     MB   H5     nurse
51      51    H5B1_3       CLA     AL   H5     nurse
52      52    H5B2_3        Bi     OL   H5     nurse
53      53    H5B2_3        Bi     MB   H5     nurse
54      54    H5B2_3        Bi     AL   H5     nurse
55      55    H5B4_4       CLH     OL   H5     nurse
56      56    H5B4_4       CLH     MB   H5     nurse
57      57    H5B4_4       CLH     AL   H5     nurse
58      58    H5B5_6        MD     OL   H5   forager
59      59    H5B5_6        MD     MB   H5   forager
60      60    H5B5_6        MD     AL   H5   forager
61      61    H6B2_5       CLA     OL   H6   forager
62      62    H6B2_5       CLA     MB   H6   forager
63      63    H6B2_5       CLA     AL   H6   forager
64      64    H6B4_3       CLH     OL   H6   forager
65      65    H6B4_3       CLH     MB   H6   forager
66      66    H6B4_3       CLH     AL   H6   forager
67      67    H6B5_4        Bi     OL   H6     nurse
68      68    H6B5_4        Bi     MB   H6     nurse
69      69    H6B5_4        Bi     AL   H6     nurse
70      70    H6B6_2        MD     OL   H6     nurse
71      71    H6B6_2        MD     MB   H6     nurse
72      72    H6B6_2        MD     AL   H6     nurse
73      73    H7B1_3       CLA     OL   H7   forager
74      74    H7B1_3       CLA     MB   H7   forager
75      75    H7B1_3       CLA     AL   H7   forager
76      76    H7B2_3       CLH     OL   H7     nurse
77      77    H7B2_3       CLH     MB   H7     nurse
78      78    H7B2_3       CLH     AL   H7     nurse
79      79    H7B3_4        MD     OL   H7   forager
80      80    H7B3_4        MD     MB   H7   forager
81      81    H7B3_4        MD     AL   H7   forager
82      82    H7B4_9        Bi     OL   H7     nurse
83      83    H7B4_9        Bi     MB   H7     nurse
84      84    H7B4_9        Bi     AL   H7     nurse
85      85    H8B5_3        MD     OL   H8     nurse
86      86    H8B5_3        MD     MB   H8     nurse
87      87    H8B5_3        MD     AL   H8     nurse
88      88    H8B1_3        Bi     OL   H8     nurse
89      89    H8B1_3        Bi     MB   H8     nurse
90      90    H8B1_3        Bi     AL   H8     nurse
91      91    H8B2_3       CLH     OL   H8     nurse
92      92    H8B2_3       CLH     MB   H8     nurse
93      93    H8B2_3       CLH     AL   H8     nurse
94      94    H8B3_3       CLA     OL   H8   forager
95      95    H8B3_3       CLA     MB   H8   forager
96      96    H8B3_3       CLA     AL   H8   forager
97      97    H9B1_3       CLA     OL   H9   forager
98      98    H9B1_3       CLA     MB   H9   forager
99      99    H9B1_3       CLA     AL   H9   forager
100    100    H9B3_5       CLH     OL   H9   forager
101    101    H9B3_5       CLH     MB   H9   forager
102    102    H9B3_5       CLH     AL   H9   forager
103    103    H9B4_7        Bi     OL   H9   forager
104    104    H9B4_7        Bi     MB   H9   forager
105    105    H9B4_7        Bi     AL   H9   forager
106    106    H9B5_3        MD     OL   H9   forager
107    107    H9B5_3        MD     MB   H9   forager
108    108    H9B5_3        MD     AL   H9   forager
109    109   H10B1_6       CLH     OL  H10   forager
110    110   H10B1_6       CLH     MB  H10   forager
111    111   H10B1_6       CLH     AL  H10   forager
112    112   H10B2_4        MD     OL  H10     nurse
113    113   H10B2_4        MD     MB  H10     nurse
114    114   H10B2_4        MD     AL  H10     nurse
115    115   H10B5_5        Bi     OL  H10     nurse
116    116   H10B5_5        Bi     MB  H10     nurse
117    117   H10B5_5        Bi     AL  H10     nurse
118    118   H10B6_3       CLA     OL  H10     nurse
119    119   H10B6_3       CLA     MB  H10     nurse
120    120   H10B6_3       CLA     AL  H10     nurse

Because the project is about analysing the effects of some specific treatments on a more peripheral tissue than where the treatment was applied, I do not expect the treatment to have had a very strong effect, but I would like to identify these minor effects in a good detail if they are there.

Ideally I would like to know:

1) What genes are DEGs between the control and each treatment overall (across all three tissue regions). 2) Are there any region specific effects of treatment, so the same pairwise comparisons as before, but by specific region (interaction between treatment and region?). 3) Are there any DEGs dependent on behavioural state (again both overall and region-specific). 4) Is there an interaction between treatment and behavioural group, i.e. specific genes may differ between treatments and control only for one of the two behavioural groups. And possibly also by region (interaction between region, behaviour and treatment?)

I first tried to analyze my data by splitting the dataset by region (so 40 samples each in three separate analyses), but like this I do not seem to get any DEG in any pair-wise comparison between treatments. So I then tried to model the whole dataset of 120 samples in one go, and like this I seem to find a few genes of interest, but I am unsure if what I have been doing is indeed correct and recommended, and how to then include and get results for specific comparisons for interaction terms. This is my code so far:

library(edgeR)
y=DGEList(counts=count.table)

Hive=factor(Design$Hive)
Region=factor(Design$Region)
Behav=factor(Design$Behaviour)
Treatment <- relevel(Design$Treatment, "MD")
Treat=factor(Treatment)

design=model.matrix(~0+Treat+Hive+Region+Behav) 

#Filter low expressed genes
keep <- filterByExpr(y, design, min.count = 10, min.total.count = 20)
y <- y[keep,,keep.lib.sizes=FALSE]
dim(y)

#Now normalize counts
y=calcNormFactors(y, method = "TMM")

colnames(design)<-make.names(colnames(design))
v=voom(y,design,plot=TRUE)
cor=duplicateCorrelation(v,design,block=Design$Sample_ID)

fit=lmFit(v,design,block=Design$Sample_ID, correlation=cor$consensus)
head(coef(fit))

cm=makeContrasts(MDvsCLH= TreatCLH-TreatMD, MDvsCLA= TreatCLA-TreatMD, MDvsBi= TreatBi-TreatMD, ForagervsNurse =Behavnurse, levels=make.names(colnames(coef(fit)))) # To test treatment and behaviour effect

fit2=contrasts.fit(fit,cm)
fit2=eBayes(fit2, robust=TRUE)

summary(decideTests(fit2))

MDvsCLH_table=topTable(fit2,coef="MDvsCLH",n=Inf, sort="p", p=0.05)
MDvsCLA_table=topTable(fit2,coef="MDvsCLA",n=Inf, sort="p", p=0.05)
MDvsBi_table=topTable(fit2,coef="MDvsBi",n=Inf, sort="p", p=0.05)
ForagervsNurse_table=topTable(fit2,coef="ForagervsNurse",n=Inf, sort="p", p=0.05)

MDvsCLH_table
MDvsCLA_table
MDvsBi_table
ForagervsNurse_table

In essence, I have tried to include Treatment, Region, Behaviour and Hive (the technical batches) as Fixed effects and the individuals from which the three regions came from as a Random effect. I would have included also Hive as a random effect instead of a fixed effect but I am not sure if it is possible to include two different random effects?

Like this I do get some DEGs for pairwise comparisons between treatments and also for the behavioural group comparison (more so than treatment, so, as I was expecting, that has a stronger effect) and plotting the lcpm counts of individual DEGs shows that at least a few are consistently and convincingly up-regulated in treatments compared to the control group (MD) across the three regions.

Would this approach be valid and therefore I should trust that I am not getting false positive results? And if I then want to include interactions and get treatment comparisons for all the specific regions and behaviours how could I do that?

Should I approach the different questions with separate models or should I do that all at once somehow?

Thank you very much for your kind help with this.

Best, Joanito

limma edger • 164 views
ADD COMMENTlink modified 3 months ago by Aaron Lun25k • written 3 months ago by joasmile8410
Answer: Complex multifactorial design with random effects in Limma/voom
3
gravatar for Aaron Lun
3 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Seems like your life would be easiest if you just combined the three factors of interest.

# Assuming you loaded the data.frame into 'df'.
group <- paste0(df$Treatment, df$Region, df$Behaviour)
design <- model.matrix(~0 + group + df$Hive)

... and then block on Sample_ID in duplicateCorrelation(). It is fortunate that each pair of hives seems to share at least a few samples from the same group, so all hive effects are estimable and not confounded with group. This means that you can happily form any of the comparisons that you are interested in.

1) What genes are DEGs between the control and each treatment overall (across all three tissue regions).

Say, comparing the average effect of MD with Bi, one might do:

con <- makeContrasts(
    (groupBiALforager + groupBiALnurse + groupBiMBforager
         + groupBiMBnurse + groupBiOLforager + groupBiOLnurse)/6 - 
    (groupMDALforager + groupMDALnurse + groupMDMBforager
         + groupMDMBnurse + groupMDOLforager + groupMDOLnurse)/6,
    levels=design)

2) Are there any region specific effects of treatment, so the same pairwise comparisons as before, but by specific region (interaction between treatment and region?).

Hold on. There's two ways to interpret this contrast. If you want the same pairwise comparisons by region, then you'd get something like the below. Let's say we care about "AL", whatever that is.

con <- makeContrasts(
    (groupBiALforager + groupBiALnurse)/2 - 
    (groupMDALforager + groupMDALnurse)/2,
    levels=design)

However, if you want to find region-specific effects of treatment, this is an interaction contrast. This is asking for differences in the treatment effects between different regions, let's say "AL" and "MB":

# Treatment effect in AL:
con1 <- makeContrasts(
    (groupBiALforager + groupBiALnurse)/2 - 
    (groupMDALforager + groupMDALnurse)/2,
    levels=design)

# Treatment effect in MB:
con2 <- makeContrasts(
    (groupBiMBforager + groupBiMBnurse)/2 - 
    (groupMDMBforager + groupMDMBnurse)/2,
    levels=design)

# Interaction effect. I could have defined this in a single
# makeContrasts() call, but that would have been a pain to write.
con <- con1 - con2

3) Are there any DEGs dependent on behavioural state (again both overall and region-specific).

Rinse and repeat with the above code, testing between forager and nurse instead of between Bi and MD. For example, to detect differential expression associated with behaviour in the AL region, one might do:

con <- makeContrasts(
    (groupBiALforager + groupMDALforager)/2 - 
    (groupBiALnurse + groupMDALnurse)/2,
    levels=design)

The overall one would just average across all treatment/region combinations, I won't repeat that here.

4) Is there an interaction between treatment and behavioural group, i.e. specific genes may differ between treatments and control only for one of the two behavioural groups.

Again, use the interaction code above, modified so that con1 compares between treatments in one behaviour and con2 compares between treatments in another behaviour.

And possibly also by region (interaction between region, behaviour and treatment?)

A real three-way interaction term is a nightmare to interpret, are you sure this is what you want? Well anyway, here it is. For a three-way interaction involving regions "Bi" vs "MD", treatments "AL" vs "MB" and behaviour "forager" vs "nurse", we have:

con <- makeContrasts(
    groupBiALforager - (groupBiALnurse + groupBiMBforager + groupMDALforager)
        + (groupBiMBnurse + groupMDALnurse + groupMDMBforager) 
        - groupMDMBnurse,
    levels=design)
ADD COMMENTlink modified 3 months ago • written 3 months ago by Aaron Lun25k

Dear Aaron,

Thank you so much, this was really really helpful and got me through the main questions I wanted to ask! I have a few remaining doubts that I hope you could clarify.

When I run the analysis by first creating a "group" factor that contains region, treatment and behaviour as you suggested and then test for overall treatment effects, I get less DEGs for pairwise treatment comparisons than when I used the formula:

design=model.matrix(~0+Treat+Behav+Hive+Region)

and then run the following contrasts:

cm=makeContrasts(MDvsCLH= TreatCLH-TreatMD, MDvsCLA= TreatCLA-TreatMD, MDvsBi= TreatBi-TreatMD, levels=make.names(colnames(coef(fit)))) 

So what would be the difference of this previous approach I used with the one you suggested (apart from more easily allowing me to test all contrasts I am interested in)? I implemented your suggestion like this:

group <- paste0(Treat, Region, Behav)
design <- model.matrix(~0 + group + Hive)

con <- makeContrasts(
   MDvsCLH=  (groupCLHALforager + groupCLHALnurse + groupCLHMBforager
         + groupCLHMBnurse + groupCLHOLforager + groupCLHOLnurse)/6 - 
    (groupMDALforager + groupMDALnurse + groupMDMBforager
         + groupMDMBnurse + groupMDOLforager + groupMDOLnurse)/6,
   MDvsCLA=  (groupCLAALforager + groupCLAALnurse + groupCLAMBforager
         + groupCLAMBnurse + groupCLAOLforager + groupCLAOLnurse)/6 - 
    (groupMDALforager + groupMDALnurse + groupMDMBforager
         + groupMDMBnurse + groupMDOLforager + groupMDOLnurse)/6,
   MDvsBi=  (groupBiALforager + groupBiALnurse + groupBiMBforager
         + groupBiMBnurse + groupBiOLforager + groupBiOLnurse)/6 - 
    (groupMDALforager + groupMDALnurse + groupMDMBforager
         + groupMDMBnurse + groupMDOLforager + groupMDOLnurse)/6,
   CLAvsCLH=  (groupCLHALforager + groupCLHALnurse + groupCLHMBforager
         + groupCLHMBnurse + groupCLHOLforager + groupCLHOLnurse)/6 - 
    (groupCLAALforager + groupCLAALnurse + groupCLAMBforager
         + groupCLAMBnurse + groupCLAOLforager + groupCLAOLnurse)/6,
    levels=design)

If you were to spell out the model that you suggested how would that look like? Do I get less DEGs because it also includes interactions and we then end up with less degrees of freedom?

If this is the case, then I assume that having the behaviour factor in the model will also reduce the power to test for treatment effects (but at the same time controlling for possible confounding effects of behaviour)? Is there a trade-off here? I am just wondering what would be the best thing to do because the treatment effect is clearly very small, but the few DEGs I recover seem to be absolutely non random and to make a lot of sense, but a few disappear depending on whether I use your approach and enter the behaviour factor or not. I am just wondering what would be the best way to characterize them in most detail.

Thanks a lot again for your kind help!

ADD REPLYlink written 3 months ago by joasmile8410

So what would be the difference of this previous approach I used with the one you suggested (apart from more easily allowing me to test all contrasts I am interested in)?

More residual d.f. but stronger assumptions.

If you were to spell out the model that you suggested how would that look like?

Don't know what you're asking for here.

Do I get less DEGs because it also includes interactions and we then end up with less degrees of freedom?

Yes, which reduces power compared to the previous model. However, the previous model also assumed that all of the interesting biological effects (behaviour, region and treatment) were additive. If this is not true, one can expect systematic biases in the estimation of the coefficients, possibly manifesting as an increased number of (false positive) DEGs.

If this is the case, then I assume that having the behaviour factor in the model will also reduce the power to test for treatment effects (but at the same time controlling for possible confounding effects of behaviour)?

Yes.

Is there a trade-off here?

Yes.

I am just wondering what would be the best way to characterize them in most detail.

If you want to look at interaction effects - or even if you don't, but interactions are present - you cannot use an additive model.

ADD REPLYlink written 3 months ago by Aaron Lun25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 190 users visited in the last hour