The editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: Correct formulation of contrasts of interest regarding an RNA-Seq gene expression analysis with edgeR
gravatar for fotakisg
5 months ago by
fotakisg0 wrote:

Dear Bioconductor Support Site,

I'm currently working with an RNA-Seq dataset, which consists of 84 samples(Collaborative Cross mice genetic panel of normal, diabetic and obese phenotypes).  My code for the analysis used so far is the following:

head(sampleInfo) # the phenotype data information 

SampleName Sex Obesity Diabetes
1      DT_77   M   Obese Diabetic
2      DT_39   F   Obese   Normal
3     DT_145   M  Normal   Normal
4     DT_130   F   Obese   Normal
5     DT_103   M  Normal   Normal
6     DT_124   M   Obese Diabetic

Next, after the creation of DGEList object/filtering/normalization, for the construction of the design matrix, I considered two separate approaches,to find DE genes concerning the obesity phenotype, but taking into account diabetes, while blocking on the sex confounder:

1) First Approach

# Create the design matrix <- as.factor(dge$samples$Obesity) <- as.factor(dge$samples$Diabetes) <- as.factor(dge$samples$Sex)
design <- model.matrix( : +

y3 <- estimateDisp(dge, design, robust=TRUE)

Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05,  :
  Design matrix not of full rank.  The following coefficients not estimable:
  (Intercept) sex.groupM condition1.groupNormal:condition2.groupDiabetic condition1.groupObese:condition2.groupDiabetic
1           1          1                                               0                                              1
2           1          0                                               0                                              0
3           1          1                                               0                                              0
4           1          0                                               0                                              0
5           1          1                                               0                                              0
6           1          1                                               0                                              1
  condition1.groupNormal:condition2.groupNormal condition1.groupObese:condition2.groupNormal
1                                             0                                            0
2                                             0                                            1
3                                             1                                            0
4                                             0                                            1
5                                             1                                            0
6                                             0                                            0

fit <- glmQLFit(y3, design, robust=TRUE)

fit2 <- glmQLFTest(fit, coef=2)

2) Second Approach

mult_con <- factor(paste(,, sep = "."))
sample2 <- cbind(sampleinfo,Group=mult_con)
design <- model.matrix(~0+sample2$Group +

  sample2$GroupNormal.Diabetic sample2$GroupNormal.Normal sample2$GroupObese.Diabetic sample2$GroupObese.Normal sex.groupM
1                            0                          0                           1                         0          1
2                            0                          0                           0                         1          0
3                            0                          1                           0                         0          1
4                            0                          0                           0                         1          0
5                            0                          1                           0                         0          1
6                            0                          0                           1                         0          1

my.contrasts <- makeContrasts(OvsN = sample2$GroupObese.Normal- sample2$GroupNormal.Normal, ONvsOD = sample2$GroupObese.Normal-sample2$GroupObese.Diabetic, levels=design)

# Obese_Normal vs Obese_Diabetic example

qlf.ON.OD <- glmQLFTest(fit, contrast=my.contrasts[,"ONvsOD"])
de.ON.OD <- topTags(qlf.ON.OD,n=nrow(qlf.ON.OD),adjust.method="BH", p.value=0.05)
dat.ON.OD <- de.ON.OD$table

Thus, my questions are the following;

A) 1) For the 1rst approach: regarding the initial error of full Rank, should I remove manually some coefficients that are not of "interest " ? For example, the aforementioned "condition1.groupObese:condition2.groupNormal" ?

2) Moreover, is the formulation and interpretation of the interaction approach correct ? The coefficient condition1.groupNormal:condition2.groupDiabetic , essentially compares the Normal samples from the Obesity group, with the Diabetic samples from the Diabetes group, correct ?

B) Overall, which of the two approaches should I consider ? Should I expect significant differences between the two approaches for the same comparisons ?

Thank you in advance and excuse me for my long post !!


ADD COMMENTlink modified 5 months ago by Aaron Lun22k • written 5 months ago by fotakisg0
Answer: Correct formulation of contrasts of interest regarding an RNA-Seq gene expressio
gravatar for Aaron Lun
5 months ago by
Aaron Lun22k
Cambridge, United Kingdom
Aaron Lun22k wrote:

Your second model parametrization is much easier to use and interpret. The first four coefficients represent the average log-expression within each obesity-disease combination, while the last coefficient represents the log-fold change of males over females. You can then simply test for pairwise differences between combinations:

# Paraphrasing the column names to keep things simple:
con <- makeContrasts(GroupNormal.Diabetic - GroupNormal.Normal, levels=design)

... or if you want to see the average effect of obesity:

con <- makeContrasts((GroupNormal.Diabetic + GroupNormal.Normal)/2
    - (GroupObese.Diabetic + GroupObese.Normal)/2, levels=design)

... or you can test the obesity effect separately between disease and normal patients, and intersect the DE lists.

I won't bother trying to figure out what the coefficients mean in your first approach, as your second approach is much more usable. I will just mention that, if you were to remove the column corresponding to the unestimable coefficient, the first design matrix would be mathematically equivalent to that in your second approach. That is, I could get the second design matrix from linear combinations of the columns in the first design matrix. However, the second design matrix is much easier to interpret, so you might as well start with that.

ADD COMMENTlink modified 5 months ago • written 5 months ago by Aaron Lun22k

Dear Aaron,

Thank you for your reply and your input, especially for the average effect of obesity between patient and normal samples.

Best regards,
Fotakis Georgios

ADD REPLYlink written 5 months ago by fotakisg0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 364 users visited in the last hour