help for designing a multifactorial GLM framework edgeR
Entering edit mode
Last seen 7.7 years ago
United Kingdom


I have a big RNA-seq experiment and an "easy" biological question but not sure how to make the exact design and contrast within the GLM framework in edgeR. Any help is much appreciated.

I have 6 populations, 3 replicate x population (Cages), 2 treatments (cold and benign), 3 time blocks (Season). A dataframe showing the samples and experimental factors is reported below

My basic question is to find shared differentially expressed genes between cold and benign treatments, between all populations and between different seasons. In other words, I a looking for consistent expression changes moving from benign to cold at any given transition. I would also consider not using the "Cage" description to make the design more easy.

Is there a specific design and contrast I should use? Or should I use more contrast and then make a comparison between the outputs? I can't find any valuable example in the edgeR guide that would fit my problem. 

Also, since some populations have not been tested in one season (winter2013) and some treatment are missing (e.g. Groningen/benign/winter.2011), some interactions won't work because the glmFit error "Design matrix not of full rank".

Thanks for the help!

Season Popularion Cage Treatment Sample
Winter.2011 Barcelona 1 Benign BAR1A
Winter.2011 Barcelona 1 Cold BAR1B
Winter.2011 Barcelona 2 Benign BAR2A
Winter.2011 Barcelona 2 Cold BAR2B
Winter.2011 Barcelona 3 Benign BAR3A
Winter.2011 Barcelona 3 Cold BAR3B
Winter.2011 Gif 1 Benign GIF1A
Winter.2011 Gif 1 Cold GIF1B
Winter.2011 Gif 2 Benign GIF2A
Winter.2011 Gif 2 Cold GIF2B
Winter.2011 Gif 3 Benign GIF3A
Winter.2011 Gif 3 Cold GIF3B
Winter.2011 Groningen 1 Cold GRO1A
Winter.2011 Groningen 2 Cold GRO2A
Winter.2011 Groningen 3 Cold GRO3A
Winter.2011 Montpellier 1 Benign MON1A
Winter.2011 Montpellier 1 Cold MON1B
Winter.2011 Montpellier 2 Benign MON2A
Winter.2011 Montpellier 2 Cold MON2B
Winter.2011 Montpellier 3 Benign MON3A
Winter.2011 Montpellier 3 Cold MON3B
Winter.2011 Uppsala 1 Cold UPP1A
Winter.2011 Uppsala 2 Cold UPP2A
Winter.2011 Uppsala 3 Cold UPP3A
Winter.2011 Valencia 1 Benign VAL1A
Winter.2011 Valencia 1 Cold VAL1B
Winter.2011 Valencia 2 Benign VAL2A
Winter.2011 Valencia 2 Cold VAL2B
Winter.2011 Valencia 3 Benign VAL3A
Winter.2011 Valencia 3 Cold VAL3B
Winter.2012 Barcelona 1 Benign BAR1E
Winter.2012 Barcelona 1 Cold BAR1F
Winter.2012 Barcelona 2 Benign BAR2E
Winter.2012 Barcelona 2 Cold BAR2F
Winter.2012 Barcelona 3 Benign BAR3E
Winter.2012 Barcelona 3 Cold BAR3F
Winter.2012 Gif 1 Benign GIF1E
Winter.2012 Gif 1 Cold GIF1F
Winter.2012 Gif 2 Benign GIF2E
Winter.2012 Gif 2 Cold GIF2F
Winter.2012 Gif 3 Benign GIF3E
Winter.2012 Gif 3 Cold GIF3F
Winter.2012 Groningen 1 Benign GRO1E
Winter.2012 Groningen 1 Cold GRO1F
Winter.2012 Groningen 2 Benign GRO2E
Winter.2012 Groningen 2 Cold GRO2F
Winter.2012 Groningen 3 Benign GRO3E
Winter.2012 Groningen 3 Cold GRO3F
Winter.2012 Montpellier 1 Benign MON1E
Winter.2012 Montpellier 1 Cold MON1F
Winter.2012 Montpellier 2 Benign MON2E
Winter.2012 Montpellier 2 Cold MON2F
Winter.2012 Montpellier 3 Benign MON3E
Winter.2012 Montpellier 3 Cold MON3F
Winter.2012 Uppsala 1 Benign UPP_VAL1C
Winter.2012 Uppsala 1 Cold UPP_VAL1D
Winter.2012 Uppsala 2 Benign UPP_VAL2C
Winter.2012 Uppsala 2 Cold UPP_VAL2D
Winter.2012 Uppsala 3 Benign UPP_VAL3C
Winter.2012 Uppsala 3 Cold UPP_VAL3D
Winter.2012 Valencia 1 Benign VAL_VAL1E
Winter.2012 Valencia 1 Cold VAL_VAL1F
Winter.2012 Valencia 2 Benign VAL_VAL2E
Winter.2012 Valencia 2 Cold VAL_VAL2F
Winter.2012 Valencia 3 Benign VAL_VAL3E
Winter.2012 Valencia 3 Cold VAL_VAL3F
Winter.2013 Gif 1 Benign GIF1I
Winter.2013 Gif 1 Cold GIF1L
Winter.2013 Gif 2 Benign GIF2I
Winter.2013 Gif 2 Cold GIF2L
Winter.2013 Gif 3 Benign GIF3I
Winter.2013 Gif 3 Cold GIF3L
Winter.2013 Uppsala 1 Benign UPP_UPP1F
Winter.2013 Uppsala 1 Cold UPP_UPP1G
Winter.2013 Uppsala 2 Benign UPP_UPP2F
Winter.2013 Uppsala 2 Cold UPP_UPP2G
Winter.2013 Uppsala 3 Benign UPP_UPP3F
Winter.2013 Uppsala 3 Cold UPP_UPP3G
Winter.2013 Uppsala 1 Benign UPP_VAL1G
Winter.2013 Uppsala 1 Cold UPP_VAL1H
Winter.2013 Uppsala 2 Benign UPP_VAL2G
Winter.2013 Uppsala 2 Cold UPP_VAL2H
Winter.2013 Uppsala 3 Benign UPP_VAL3G
Winter.2013 Uppsala 3 Cold UPP_VAL3H
Winter.2013 Valencia 1 Benign VAL_UPP1E
Winter.2013 Valencia 1 Cold VAL_UPP1F
Winter.2013 Valencia 2 Benign VAL_UPP2E
Winter.2013 Valencia 2 Cold VAL_UPP2F
Winter.2013 Valencia 3 Benign VAL_UPP3E
Winter.2013 Valencia 3 Cold VAL_UPP3F
Winter.2013 Valencia 1 Benign VAL_VAL1I
Winter.2013 Valencia 1 Cold VAL_VAL1L
Winter.2013 Valencia 2 Benign VAL_VAL2I
Winter.2013 Valencia 2 Cold VAL_VAL2L
Winter.2013 Valencia 3 Benign VAL_VAL3I
Winter.2013 Valencia 3 Cold VAL_VAL3L
edgeR GLM • 884 views
Entering edit mode
Last seen 3 hours ago
United States

It's not entirely clear what you mean by 'shared differentially expressed genes', so here are a couple of possibilities. You could fit a 'conventional' ANOVA where you estimate the difference between cold and benign after controlling for the other factors. Let's say your data.frame is called 'samples':

> design <- model.matrix(~Treatment+Season+Population+Cage, samples)
> colnames(design)
 [1] "(Intercept)"           "TreatmentCold"         "SeasonWinter.2012"    
 [4] "SeasonWinter.2013"     "PopulationGif"         "PopulationGroningen"  
 [7] "PopulationMontpellier" "PopulationUppsala"     "PopulationValencia"   
[10] "Cage2"                 "Cage3"

In this parameterization, TreatmentCold estimates the difference between cold and benign, after accounting for the other parameters in the model. This is the conventional way to fit a model, and depending on your goals, may be the way to go (especially if you care to know if e.g. the Groningen and Montpellier populations are different, after controlling for the other parameters, etc).

An alternative is to fit a cell means model, where each 'cell' is a combination of season/population/treatment:

> combo <- factor(apply(samples[,c(1,2,4)], 1, paste, collapse = "_"))
> design2 <- model.matrix(~0+combo)
> colnames(design2)
 [1] "comboWinter.2011_Barcelona_Benign"   "comboWinter.2011_Barcelona_Cold"    
 [3] "comboWinter.2011_Gif_Benign"         "comboWinter.2011_Gif_Cold"          
 [5] "comboWinter.2011_Groningen_Cold"     "comboWinter.2011_Montpellier_Benign"
 [7] "comboWinter.2011_Montpellier_Cold"   "comboWinter.2011_Uppsala_Cold"      
 [9] "comboWinter.2011_Valencia_Benign"    "comboWinter.2011_Valencia_Cold"     
[11] "comboWinter.2012_Barcelona_Benign"   "comboWinter.2012_Barcelona_Cold"    
[13] "comboWinter.2012_Gif_Benign"         "comboWinter.2012_Gif_Cold"          
[15] "comboWinter.2012_Groningen_Benign"   "comboWinter.2012_Groningen_Cold"    
[17] "comboWinter.2012_Montpellier_Benign" "comboWinter.2012_Montpellier_Cold"  
[19] "comboWinter.2012_Uppsala_Benign"     "comboWinter.2012_Uppsala_Cold"      
[21] "comboWinter.2012_Valencia_Benign"    "comboWinter.2012_Valencia_Cold"     
[23] "comboWinter.2013_Gif_Benign"         "comboWinter.2013_Gif_Cold"          
[25] "comboWinter.2013_Uppsala_Benign"     "comboWinter.2013_Uppsala_Cold"      
[27] "comboWinter.2013_Valencia_Benign"    "comboWinter.2013_Valencia_Cold" 

This parameterization allows you to make any combination you might care about, but you have to construct the contrasts by hand. In this parameterization you would test for differences between the benign and cold using a contrast like

> contrast <- vector(length = ncol(design2))
> contrast[grep("Cold", colnames(design2))] <- 1/15
> contrast[grep("Benign", colnames(design2))] <- -1/13
> contrast
 [1] -0.07692308  0.06666667 -0.07692308  0.06666667  0.06666667 -0.07692308
 [7]  0.06666667  0.06666667 -0.07692308  0.06666667 -0.07692308  0.06666667
[13] -0.07692308  0.06666667 -0.07692308  0.06666667 -0.07692308  0.06666667
[19] -0.07692308  0.06666667 -0.07692308  0.06666667 -0.07692308  0.06666667
[25] -0.07692308  0.06666667 -0.07692308  0.06666667

Note that this testing a slightly different hypothesis. Instead of testing Cold vs Benign after controlling for the other parameters, you are instead testing to see if on average the Cold samples are higher or lower than the Benign samples (essentially ignoring the other parameters). But with this parameterization you can easily ask different questions, like 'Is there an interaction between Benign and Cold with respect to the Valencia and Uppsala populations?'



Entering edit mode
Aaron Lun ★ 27k
Last seen 12 hours ago
The city by the bay

In addition to James' suggestions, another approach is to set up the design matrix like so:

> Combination <- paste0(samples$Season, "_", samples$Population)
> Cage <- factor(samples$Cage)
> Treatment <- samples$Treatment
> design3 <- model.matrix(~0 + Combination + Combination:Cage + Combination:Treatment)
> qrd <- qr(design3)
> design3 <- design3[,qrd$pivot[1:qrd$rank],drop=FALSE]
> colnames(design3)
 [1] "CombinationWinter.2011_Barcelona"
 [2] "CombinationWinter.2011_Gif"
 [3] "CombinationWinter.2011_Groningen"
# ... and so on
[16] "CombinationWinter.2011_Barcelona:Cage2"
[17] "CombinationWinter.2011_Gif:Cage2"
[18] "CombinationWinter.2011_Groningen:Cage2"
# ... and so on
[31] "CombinationWinter.2011_Barcelona:Cage3"
[32] "CombinationWinter.2011_Gif:Cage3"       
[33] "CombinationWinter.2011_Groningen:Cage3"
# ... and so on
[46] "CombinationWinter.2011_Barcelona:TreatmentCold"  
[47] "CombinationWinter.2011_Gif:TreatmentCold"        
[48] "CombinationWinter.2011_Montpellier:TreatmentCold"
# ... and so on

To understand this set up, think of each season/population combination in isolation. For example, let's consider Barcelona in Winter 2011. Here, we have three pairs of cold/benign samples, with one pair from each cage. This can be analyzed with a paired-sample design where we block on the Cage factor. We have one coefficient describing the change between treatments across all cages; dropping this coefficent will test for the effect of the treatment.

Now, design3 is simply an amped-up version of this paired-sample approach. Each season/population combination has its own blocking factors for the cage effect, as well as its own coefficient for the treatment effect. If you drop one of the treatment coefficients, you can test for the difference between benign and cold treatments for any season/population combination. If you want to find DE genes that are shared across seasons or populations, you can test the treatment effect in each season/population and perform an intersection on the DE lists between seasons/populations.

Note that the QR decomposition just removes the treatment coefficients corresponding to the Uppsala and Groningen samples of Winter 2011. This is because they don't have any benign samples, so there's nothing to compare the cold samples to.


Login before adding your answer.

Traffic: 420 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6