Question: EdgeR - Model matrix for complex model similar to user guide 3.5 example - but more complex
gravatar for D
4 months ago by
D0 wrote:

Hi all,

I'm trying to analyse some RNAseq data which has been generated using an experimental design not dissimilar to the example in the user guide (section 3.5, p41), but with some differences that are making applying that model difficult.

My study design:

> Pheno_mat
   disease Patient Treatment
1  healthy       1      None
2  healthy       1    Drug_A
3  healthy       1    Drug_B
4  healthy       2      None
5  healthy       2    Drug_A
6  healthy       2    Drug_B
7  disease       3      None
8  disease       3    Drug_A
9  disease       3    Drug_B
10 disease       4      None
11 disease       4    Drug_A
12 disease       4    Drug_B
13 disease       5      None
14 disease       5    Drug_A
15 disease       5    Drug_B

I want to find: The impact of drugA/B on healthy and diseased samples. The difference in the impact of DrugA/B in disease vs healthy.

The reason I can't apply the model near exactly as shown in 3.5 is that I do not have a balanced number of disease/healthy samples, so revising the targets names to 1-3 in both disease groups doesn't work, I still have design matrix that is not full rank:

> Pheno_mat
   disease Patient Treatment Patient_fix
1  healthy       1      None           1
2  healthy       1    Drug_A           1
3  healthy       1    Drug_B           1
4  healthy       2      None           2
5  healthy       2    Drug_A           2
6  healthy       2    Drug_B           2
7  disease       3      None           1
8  disease       3    Drug_A           1
9  disease       3    Drug_B           1
10 disease       4      None           2
11 disease       4    Drug_A           2
12 disease       4    Drug_B           2
13 disease       5      None           3
14 disease       5    Drug_A           3
15 disease       5    Drug_B           3

So, what are my options? 1: I can't follow the example as I have a different number of patients in healthy/disease, so my design matrix isn't full rank 2: Disease state is explained by patient ID, but I can't use patient:disease then handle disease state in the contrasts because I get n=1 in each group and can't estimate dispersions 3: If I simply fit a nested Disease+Drug model I ignore that my patient data are paired...

Any suggestions would be gratefully received!



limma edger • 152 views
ADD COMMENTlink modified 4 months ago by James W. MacDonald51k • written 4 months ago by D0

I’ve changed the tags. Please don’t add additional packages to the question because it emails all the developers when you tag a post. If you have an edgeR question it’s appropriate to tag with “edgeR”.

ADD REPLYlink written 4 months ago by Michael Love26k


I figured fitting appropriate multifactor GLM was common to all packages, rather than specific to edgeR.

Apologies nonetheless.

ADD REPLYlink written 4 months ago by D0

This is true, but asking for the attention of multiple package developers when you are planning to use edgeR is additional burden on our time. We have limited time for support and maintenance of the packages, I actually do it mostly out of work hours because it’s not supported effort.

ADD REPLYlink written 4 months ago by Michael Love26k
Answer: EdgeR - Model matrix for complex model similar to user guide 3.5 example - but m
gravatar for James W. MacDonald
4 months ago by
United States
James W. MacDonald51k wrote:

The only reason the matrix isn't full rank is because you have coefficients that don't estimate anything.

> phenomat <- data.frame(disease = factor(rep(c("healthy","disease"), c(6,9))), patient = factor(rep(c(1:2,1:3), each = 3)), treatment = factor(rep(c("Drug_A","Drug_B","None"), 5)))
> design <- model.matrix(~disease + disease:patient + disease:treatment, phenomat)
> which(colSums(design) == 0L)

## Note that there isn't a healthy patient 3!

> design <- design[,-6]
> library(limma)
> is.fullrank(design)
[1] TRUE

ADD COMMENTlink written 4 months ago by James W. MacDonald51k

To follow up a bit, I would favor the much simpler formulation:

# Note the different 'patient' compared to James' post.
phenomat2 <- data.frame(
    disease = factor(rep(c("healthy","disease"), c(6,9))), 
    patient = factor(rep(c(1:5), each = 3)), 
    treatment = factor(rep(c("Drug_A","Drug_B","None"), 5)))

design2 <- model.matrix(~0 + patient + disease:treatment, phenomat2)
design2 <- design2[,!grepl("None", colnames(design2))]

The first 5 coefficients represent the log-expression of each untreated patient. The last 4 coefficients represent the log-fold change upon treatment with each drug in each disease condition. You can then compare them to zero to test the actual effect of the drug, or compare them to each other to test for differences in effects between disease conditions or between drugs.

In terms of the actual linear model, this is equivalent to James' approach; it's just that the columns have been shuffled around and added to each other to go from design to design2, such that the interpretation of the coefficients changes. I find the coefficients in design2 to be easier to interpret, though design is probably more of the classical way of doing things.

ADD REPLYlink modified 4 months ago • written 4 months ago by Aaron Lun25k

Thanks both, very helpful!

Kind regards,


ADD REPLYlink written 4 months ago by D0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 190 users visited in the last hour