Question: Complex Design matrix for DE analysis
0
6 weeks ago by
sutturka0
sutturka0 wrote:

I am working with a complex experiment design shown below and I am not able to build the appropriate design matrix. I looked through the DESeq2 manual and this blog and tried different designs and every time I am getting error as "Design matrix not of full rank".

Patients highlighted in orange are unpaired (i.e. have only pre-treatment data). I also introduced ind.n column as suggested by DESeq2 manual but may not be necessary in the design.

I am interested in DE testing for:

1. ResistantvsSensitive considering the patients effect
2. PostvsPre considering the patients effect (All samples)
3. PostvsPre considering the patients effect (within Resistant group)
4. PostvsPre considering the patients effect (within Sensitive group)

Can you please suggest the best possible design matrix?

Also, a very naive question - I have seen design examples starting with "~1", "~0 + ". What is the difference between the two? I found the explanations for - 1. Dependent and independent variables represented on either side of ~ symbol 2. Interaction are represented with : symbol. But I am unsure of "~1" and "~0 +" designs.

limma edger deseq2 • 109 views
modified 6 weeks ago by Michael Love24k • written 6 weeks ago by sutturka0
Answer: Complex Design matrix for DE analysis
0
6 weeks ago by
Michael Love24k
United States
Michael Love24k wrote:

For DESeq2, I would follow the example in the vignette and drop the unpaired samples.

If you use limma, you can include the unpaired samples, and inform the model about correlations by patient. The other package authors may have more advice on that approach.

What he said. For limma, use duplicateCorrelation while blocking on Patient, and use a one-way layout in your design matrix with a combined factor containing Response and Treatment. For edgeR, you would need to do different things depending on your contrast:

1. Subset to only include pre samples and then use a one-way layout on Response to test for differences within pre. Repeat for post.
2. Use a design matrix that looks something like ~Patient + Response:Treatment (probably need to drop a few terms containing Treatmentpre in their names to get to full rank). This will give you two terms at the right of the matrix, representing the post/pre effect in each response type. Test that the average of this response is non-zero.
3. Same design as 2, but just test the coefficient corresponding to the post/pre effect in resistant samples.
4. Same design as 3 but for sensitive samples.

In 2-4, the unpaired samples effectively don't contribute anything to the analysis - their only effect is to contribute to the calculation of the average log-CPM - so you might as well drop them.

Dear Micheal and Aaron,

Thank you for the response. I am starting with DESeq2 and next work with limma.

I removed the unpaired samples and then able to get it running correctly. I am posting the code below as it may help someone.

#Add patients IDs specific to Response
r_vec = rep(1:6, each = 2)
s_vec = rep(1:9, each = 2)
samples$pt.n <- factor(c(r_vec, s_vec)) #build Model Matrix m1 = model.matrix(~ Response + Response:pt.n + Response:Treatment, samples) #Remove All zero columns all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m1 <- m1[,-idx] colnames(m1) dds<- DESeqDataSetFromMatrix(countData = countData, colData = samples, design = m1) dds$Response = relevel(dds$Response, "Sensitive") dds$Treatment = relevel(dds\$Treatment, "pre")

dds <- DESeq(dds)
resultsNames(dds)

[1] "Intercept"                      "ResponseSensitive"
[3] "ResponseResistant.pt.n2"        "ResponseSensitive.pt.n2"
[5] "ResponseResistant.pt.n3"        "ResponseSensitive.pt.n3"
[7] "ResponseResistant.pt.n4"        "ResponseSensitive.pt.n4"
[9] "ResponseResistant.pt.n5"        "ResponseSensitive.pt.n5"
[11] "ResponseResistant.pt.n6"        "ResponseSensitive.pt.n6"
[13] "ResponseSensitive.pt.n7"        "ResponseSensitive.pt.n8"
[15] "ResponseSensitive.pt.n9"        "ResponseResistant.Treatmentpre"
[17] "ResponseSensitive.Treatmentpre"


Can you please confirm that my comparison interpretations below are correct as deduced from the resultsNames?

# Resistant (Treatment) vs Sensitive (Control)
res1 = results(dds, contrast=list("ResponseSensitive"))

# Post (Treatment) vs Pre (Control) within the group "Sensitive"
res1 = results(dds, contrast=list("ResponseSensitive.Treatmentpre"))

# Post (Treatment) vs Pre (Control) within the group "Resistant"
res1 = results(dds, contrast=list("ResponseResistant.Treatmentpre"))


Next, I also want to compare Post vs Pre considering patients effect. I think, I need to build another model matrix with adding new pt.n column numbered as per the Treatment. Is this correct?

m1 = model.matrix(~ Treatment + Treatment:pt.n + Treatment:Response, samples)


Thank you again for the help.

1

Rather than continuing to send alerts to the limma and edgeR maintainers, for additional questions specific to DESeq2 could you make a new post?