DESeq2 Multiple Variable Nested Design
@zahraaboukhalil-13049

Hello,

I have a question about the design of a DESeq2 analysis of RNA-seq data. I have 3 factors to take into account: condition, patient and cell type. There are 2 conditions: normal and leukaemia. Within each condition I have multiple patients (note that this is an unpaired design: no individual has data in both the normal and the leukaemia condition). Within each patient there are 2 possible cell types: A and B. I am mostly interested in the overall condition effect, controlling for the differences across patients and cell types.

For example:

Condition   Patient   Cell Type
Normal      1         A
Normal      1         B
Normal      2         A
Normal      2         B
Leukaemia   3         A
Leukaemia   3         B
Leukaemia   4         A
Leukaemia   4         B

After reading the vignette, I tried creating a nested factor for the patient, giving the following:

Condition   Patient   Cell Type   Patient.nested
Normal      1         A           1
Normal      1         B           1
Normal      2         A           2
Normal      2         B           2
Leukaemia   3         A           1
Leukaemia   3         B           1
Leukaemia   4         A           2
Leukaemia   4         B           2

As described in the vignette, I also removed the levels that are missing from an interaction of factors. I then tried the following design: ~condition + condition:patient.nested + condition:cell_type. However, I still get the 'model matrix not full rank' error.

I'd appreciate any help with this design, and advice on whether I have used the correct approach. Thank you!

@mikelove

Can you post the exact code, and the output of as.data.frame(colData(dds)), for the call where you get the 'model matrix not full rank' error?

What do you get with:

model.matrix(~condition + condition:patient.nested + condition:cell_type, colData(dds))
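
For readers following along, here is a minimal base-R sketch of what inspecting that model matrix can look like. The sample table is hypothetical (3 normal patients and 2 leukaemia patients, so the nested patient factor is unbalanced), and the object names `coldata`, `m1_reduced` and `all_zero` are made up for illustration. It only shows one way an all-zero column can cause a rank deficiency and how such a column might be dropped; it is not the poster's actual data.

```r
# Hypothetical sample table: 3 normal and 2 leukaemia patients,
# each contributing cell types A and B.
coldata <- data.frame(
  condition = factor(rep(c("normal", "leukaemia"), times = c(6, 4)),
                     levels = c("normal", "leukaemia")),
  patient   = factor(rep(1:5, each = 2)),
  cell_type = factor(rep(c("A", "B"), times = 5))
)

# Renumber patients within each condition (the nested factor).
coldata$patient.nested <- factor(c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2))

# Build the model matrix for the design discussed in this thread.
m1 <- model.matrix(~ condition + condition:patient.nested + condition:cell_type,
                   coldata)

# Compare the numerical rank with the number of columns.
rank_before <- qr(m1)$rank

# Columns for level combinations that never occur are all zero
# (here: a third leukaemia patient that does not exist).
all_zero <- apply(m1, 2, function(x) all(x == 0))

# Drop those columns and check the rank again.
m1_reduced <- m1[, !all_zero]
rank_after <- qr(m1_reduced)$rank

print(colnames(m1_reduced))
```

If `rank_after` still falls short of `ncol(m1_reduced)`, the remaining problem is a linear dependency between columns rather than an empty level combination, and the printed matrix is the place to look for it.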

Hi Michael, thank you for your response.

I have now managed to run this successfully by doing what you suggested. I then pass the model matrix to the 'full' argument of the DESeq function. Can I just confirm that this design formula is the correct approach for identifying genes that are differentially expressed due to the overall condition effect? To identify these genes I use 'conditionleukaemic' as follows:

# Create model matrix
m1 <- model.matrix(~condition + condition:patient.nested + condition:cell_type, colData(dds))

# Identify differentially expressed genes
deg <- DESeq(dds, full = m1, betaPrior = FALSE)
res.05 <- results(deg, alpha = 0.05, name = "conditionleukaemic")

I have also tried a simpler design, ~cell_type + condition, and I am unsure which of the two designs is more appropriate.

Thanks again for all your help!


Take a look at the section on interactions in the DESeq2 vignette. It has a useful diagram of how to interpret the terms, which applies to what I write below:

"conditionleukaemic" is the main effect, which is the condition effect for the reference level of cell type (A) only, and "conditionleukaemic.celltypeB" is the interaction term, which is the additional effect in cell type B beyond the main effect. So the overall effect is the average of the effects in A and B, which you can obtain with a numeric contrast. Look at the order of the coefficients in resultsNames(dds), then give a 1 to the main effect, a 0.5 to the interaction term, and 0 to all other terms. You can provide this numeric vector to the 'contrast' argument of results().
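
As a rough illustration of that weighting, here is a base-R sketch for a plain two-factor interaction design, ~ condition + cell_type + condition:cell_type, which is the case the main-effect/interaction reading above applies to directly. The sample table, the coefficient values in `beta` and the helper `cell_mean` are made up for the check. The coefficient names and their order in a real analysis should be read from resultsNames() rather than assumed from this toy example.

```r
# Hypothetical balanced sample table with two conditions and two cell types.
coldata <- data.frame(
  condition = factor(rep(c("normal", "leukaemic"), each = 4),
                     levels = c("normal", "leukaemic")),
  cell_type = factor(rep(c("A", "B"), times = 4))
)

X <- model.matrix(~ condition + cell_type + condition:cell_type, coldata)
coef_names <- colnames(X)

# Numeric contrast: 1 for the main effect, 0.5 for the interaction, 0 elsewhere.
contrast_vec <- setNames(numeric(length(coef_names)), coef_names)
contrast_vec["conditionleukaemic"] <- 1
contrast_vec["conditionleukaemic:cell_typeB"] <- 0.5

# Sanity check with arbitrary (made-up) coefficients: the contrast should equal
# the condition effect averaged over cell types A and B.
beta <- c(2, 1.5, -0.5, 0.8)
mu <- as.vector(X %*% beta)
cell_mean <- function(cond, ct) {
  mean(mu[coldata$condition == cond & coldata$cell_type == ct])
}
effect_A <- cell_mean("leukaemic", "A") - cell_mean("normal", "A")
effect_B <- cell_mean("leukaemic", "B") - cell_mean("normal", "B")
avg_effect <- mean(c(effect_A, effect_B))

print(contrast_vec)
print(c(contrast = sum(contrast_vec * beta), average = avg_effect))

# With a fitted DESeq2 object, a vector like this (in resultsNames() order)
# would be passed as: results(deg, contrast = unname(contrast_vec))
```

For a design with more terms, such as the nested one in this thread, the vector has to have one entry per coefficient in the fitted model, so it is worth printing the names next to the weights before calling results().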