Question: EdgeR - Model matrix for complex model similar to user guide 3.5 example - but more complex
0
4 months ago by
D0
UK
D0 wrote:

Hi all,

I'm trying to analyse some RNAseq data which has been generated using an experimental design not dissimilar to the example in the user guide (section 3.5, p41), but with some differences that are making applying that model difficult.

My study design:

> Pheno_mat
disease Patient Treatment
1  healthy       1      None
2  healthy       1    Drug_A
3  healthy       1    Drug_B
4  healthy       2      None
5  healthy       2    Drug_A
6  healthy       2    Drug_B
7  disease       3      None
8  disease       3    Drug_A
9  disease       3    Drug_B
10 disease       4      None
11 disease       4    Drug_A
12 disease       4    Drug_B
13 disease       5      None
14 disease       5    Drug_A
15 disease       5    Drug_B


I want to find: The impact of drugA/B on healthy and diseased samples. The difference in the impact of DrugA/B in disease vs healthy.

The reason I can't apply the model near exactly as shown in 3.5 is that I do not have a balanced number of disease/healthy samples, so revising the targets names to 1-3 in both disease groups doesn't work, I still have design matrix that is not full rank:

> Pheno_mat
disease Patient Treatment Patient_fix
1  healthy       1      None           1
2  healthy       1    Drug_A           1
3  healthy       1    Drug_B           1
4  healthy       2      None           2
5  healthy       2    Drug_A           2
6  healthy       2    Drug_B           2
7  disease       3      None           1
8  disease       3    Drug_A           1
9  disease       3    Drug_B           1
10 disease       4      None           2
11 disease       4    Drug_A           2
12 disease       4    Drug_B           2
13 disease       5      None           3
14 disease       5    Drug_A           3
15 disease       5    Drug_B           3


So, what are my options? 1: I can't follow the example as I have a different number of patients in healthy/disease, so my design matrix isn't full rank 2: Disease state is explained by patient ID, but I can't use patient:disease then handle disease state in the contrasts because I get n=1 in each group and can't estimate dispersions 3: If I simply fit a nested Disease+Drug model I ignore that my patient data are paired...

Any suggestions would be gratefully received!

Thanks,

Dean

limma edger • 152 views
modified 4 months ago by James W. MacDonald51k • written 4 months ago by D0

I’ve changed the tags. Please don’t add additional packages to the question because it emails all the developers when you tag a post. If you have an edgeR question it’s appropriate to tag with “edgeR”.

Hi,

I figured fitting appropriate multifactor GLM was common to all packages, rather than specific to edgeR.

Apologies nonetheless.

This is true, but asking for the attention of multiple package developers when you are planning to use edgeR is additional burden on our time. We have limited time for support and maintenance of the packages, I actually do it mostly out of work hours because it’s not supported effort.

Answer: EdgeR - Model matrix for complex model similar to user guide 3.5 example - but m
3
4 months ago by
United States
James W. MacDonald51k wrote:

The only reason the matrix isn't full rank is because you have coefficients that don't estimate anything.


> phenomat <- data.frame(disease = factor(rep(c("healthy","disease"), c(6,9))), patient = factor(rep(c(1:2,1:3), each = 3)), treatment = factor(rep(c("Drug_A","Drug_B","None"), 5)))
> design <- model.matrix(~disease + disease:patient + disease:treatment, phenomat)
> which(colSums(design) == 0L)
diseasehealthy:patient3
6

## Note that there isn't a healthy patient 3!

> design <- design[,-6]
> library(limma)
> is.fullrank(design)
[1] TRUE


1

To follow up a bit, I would favor the much simpler formulation:

# Note the different 'patient' compared to James' post.
phenomat2 <- data.frame(
disease = factor(rep(c("healthy","disease"), c(6,9))),
patient = factor(rep(c(1:5), each = 3)),
treatment = factor(rep(c("Drug_A","Drug_B","None"), 5)))

design2 <- model.matrix(~0 + patient + disease:treatment, phenomat2)
design2 <- design2[,!grepl("None", colnames(design2))]


The first 5 coefficients represent the log-expression of each untreated patient. The last 4 coefficients represent the log-fold change upon treatment with each drug in each disease condition. You can then compare them to zero to test the actual effect of the drug, or compare them to each other to test for differences in effects between disease conditions or between drugs.

In terms of the actual linear model, this is equivalent to James' approach; it's just that the columns have been shuffled around and added to each other to go from design to design2, such that the interpretation of the coefficients changes. I find the coefficients in design2 to be easier to interpret, though design is probably more of the classical way of doing things.

Kind regards,

Dean