Question: EdgeR - Model matrix for complex model similar to user guide 3.5 example - but more complex
0
gravatar for DonnyB
5 days ago by
DonnyB0
UK
DonnyB0 wrote:

Hi all,

I'm trying to analyse some RNAseq data which has been generated using an experimental design not dissimilar to the example in the user guide (section 3.5, p41), but with some differences that are making applying that model difficult.

My study design:

> Pheno_mat
   disease Patient Treatment
1  healthy       1      None
2  healthy       1    Drug_A
3  healthy       1    Drug_B
4  healthy       2      None
5  healthy       2    Drug_A
6  healthy       2    Drug_B
7  disease       3      None
8  disease       3    Drug_A
9  disease       3    Drug_B
10 disease       4      None
11 disease       4    Drug_A
12 disease       4    Drug_B
13 disease       5      None
14 disease       5    Drug_A
15 disease       5    Drug_B

I want to find: The impact of drugA/B on healthy and diseased samples. The difference in the impact of DrugA/B in disease vs healthy.

The reason I can't apply the model near exactly as shown in 3.5 is that I do not have a balanced number of disease/healthy samples, so revising the targets names to 1-3 in both disease groups doesn't work, I still have design matrix that is not full rank:

> Pheno_mat
   disease Patient Treatment Patient_fix
1  healthy       1      None           1
2  healthy       1    Drug_A           1
3  healthy       1    Drug_B           1
4  healthy       2      None           2
5  healthy       2    Drug_A           2
6  healthy       2    Drug_B           2
7  disease       3      None           1
8  disease       3    Drug_A           1
9  disease       3    Drug_B           1
10 disease       4      None           2
11 disease       4    Drug_A           2
12 disease       4    Drug_B           2
13 disease       5      None           3
14 disease       5    Drug_A           3
15 disease       5    Drug_B           3

So, what are my options? 1: I can't follow the example as I have a different number of patients in healthy/disease, so my design matrix isn't full rank 2: Disease state is explained by patient ID, but I can't use patient:disease then handle disease state in the contrasts because I get n=1 in each group and can't estimate dispersions 3: If I simply fit a nested Disease+Drug model I ignore that my patient data are paired...

Any suggestions would be gratefully received!

Thanks,

Dean

limma edger • 81 views
ADD COMMENTlink modified 4 days ago by James W. MacDonald50k • written 5 days ago by DonnyB0

I’ve changed the tags. Please don’t add additional packages to the question because it emails all the developers when you tag a post. If you have an edgeR question it’s appropriate to tag with “edgeR”.

ADD REPLYlink written 4 days ago by Michael Love24k

Hi,

I figured fitting appropriate multifactor GLM was common to all packages, rather than specific to edgeR.

Apologies nonetheless.

ADD REPLYlink written 4 days ago by DonnyB0

This is true, but asking for the attention of multiple package developers when you are planning to use edgeR is additional burden on our time. We have limited time for support and maintenance of the packages, I actually do it mostly out of work hours because it’s not supported effort.

ADD REPLYlink written 4 days ago by Michael Love24k
Answer: EdgeR - Model matrix for complex model similar to user guide 3.5 example - but m
3
gravatar for James W. MacDonald
4 days ago by
United States
James W. MacDonald50k wrote:

The only reason the matrix isn't full rank is because you have coefficients that don't estimate anything.


> phenomat <- data.frame(disease = factor(rep(c("healthy","disease"), c(6,9))), patient = factor(rep(c(1:2,1:3), each = 3)), treatment = factor(rep(c("Drug_A","Drug_B","None"), 5)))
> design <- model.matrix(~disease + disease:patient + disease:treatment, phenomat)
> which(colSums(design) == 0L)
diseasehealthy:patient3 
                      6 

## Note that there isn't a healthy patient 3!

> design <- design[,-6]
> library(limma)
> is.fullrank(design)
[1] TRUE

ADD COMMENTlink written 4 days ago by James W. MacDonald50k
1

To follow up a bit, I would favor the much simpler formulation:

# Note the different 'patient' compared to James' post.
phenomat2 <- data.frame(
    disease = factor(rep(c("healthy","disease"), c(6,9))), 
    patient = factor(rep(c(1:5), each = 3)), 
    treatment = factor(rep(c("Drug_A","Drug_B","None"), 5)))

design2 <- model.matrix(~0 + patient + disease:treatment, phenomat2)
design2 <- design2[,!grepl("None", colnames(design2))]

The first 5 coefficients represent the log-expression of each untreated patient. The last 4 coefficients represent the log-fold change upon treatment with each drug in each disease condition. You can then compare them to zero to test the actual effect of the drug, or compare them to each other to test for differences in effects between disease conditions or between drugs.

In terms of the actual linear model, this is equivalent to James' approach; it's just that the columns have been shuffled around and added to each other to go from design to design2, such that the interpretation of the coefficients changes. I find the coefficients in design2 to be easier to interpret, though design is probably more of the classical way of doing things.

ADD REPLYlink modified 4 days ago • written 4 days ago by Aaron Lun24k

Thanks both, very helpful!

Kind regards,

Dean

ADD REPLYlink written 2 days ago by DonnyB0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 296 users visited in the last hour