Question

Complex Design matrix for DE analysis

0

Entering edit mode

sutturka • 0

@sutturka-14580

Last seen 12 months ago

United States

I am working with a complex experiment design shown below and I am not able to build the appropriate design matrix. I looked through the DESeq2 manual and this blog and tried different designs and every time I am getting error as "Design matrix not of full rank".

Design Link for the image

Patients highlighted in orange are unpaired (i.e. have only pre-treatment data). I also introduced ind.n column as suggested by DESeq2 manual but may not be necessary in the design.

I am interested in DE testing for:

ResistantvsSensitive considering the patients effect
PostvsPre considering the patients effect (All samples)
PostvsPre considering the patients effect (within Resistant group)
PostvsPre considering the patients effect (within Sensitive group)

Can you please suggest the best possible design matrix?

Also, a very naive question - I have seen design examples starting with "~1", "~0 + ". What is the difference between the two? I found the explanations for - 1. Dependent and independent variables represented on either side of ~ symbol 2. Interaction are represented with : symbol. But I am unsure of "~1" and "~0 +" designs.

deseq2 edger limma • 1.4k views

ADD COMMENT • link updated 5.0 years ago by Michael Love 41k • written 5.0 years ago by sutturka • 0

score 0 · Answer 1 · 2019-05-08

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 hour ago

United States

For DESeq2, I would follow the example in the vignette and drop the unpaired samples.

If you use limma, you can include the unpaired samples, and inform the model about correlations by patient. The other package authors may have more advice on that approach.

ADD COMMENT • link 5.0 years ago Michael Love 41k

0

Entering edit mode

What he said. For limma, use duplicateCorrelation while blocking on Patient, and use a one-way layout in your design matrix with a combined factor containing Response and Treatment. For edgeR, you would need to do different things depending on your contrast:

Subset to only include pre samples and then use a one-way layout on Response to test for differences within pre. Repeat for post.
Use a design matrix that looks something like ~Patient + Response:Treatment (probably need to drop a few terms containing Treatmentpre in their names to get to full rank). This will give you two terms at the right of the matrix, representing the post/pre effect in each response type. Test that the average of this response is non-zero.
Same design as 2, but just test the coefficient corresponding to the post/pre effect in resistant samples.
Same design as 3 but for sensitive samples.

In 2-4, the unpaired samples effectively don't contribute anything to the analysis - their only effect is to contribute to the calculation of the average log-CPM - so you might as well drop them.

ADD REPLY • link 5.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Micheal and Aaron,

Thank you for the response. I am starting with DESeq2 and next work with limma.

I removed the unpaired samples and then able to get it running correctly. I am posting the code below as it may help someone.

#Add patients IDs specific to Response
r_vec = rep(1:6, each = 2)
s_vec = rep(1:9, each = 2)
samples$pt.n <- factor(c(r_vec, s_vec))

#build Model Matrix
m1 = model.matrix(~ Response + Response:pt.n + Response:Treatment, samples)

#Remove All zero columns
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
idx <- which(all.zero)
m1 <- m1[,-idx]
colnames(m1)

dds<- DESeqDataSetFromMatrix(countData = countData,
                             colData = samples,
                             design = m1)

dds$Response = relevel(dds$Response, "Sensitive")
dds$Treatment = relevel(dds$Treatment, "pre")

dds <- DESeq(dds)
resultsNames(dds)

 [1] "Intercept"                      "ResponseSensitive"             
 [3] "ResponseResistant.pt.n2"        "ResponseSensitive.pt.n2"       
 [5] "ResponseResistant.pt.n3"        "ResponseSensitive.pt.n3"       
 [7] "ResponseResistant.pt.n4"        "ResponseSensitive.pt.n4"       
 [9] "ResponseResistant.pt.n5"        "ResponseSensitive.pt.n5"       
[11] "ResponseResistant.pt.n6"        "ResponseSensitive.pt.n6"       
[13] "ResponseSensitive.pt.n7"        "ResponseSensitive.pt.n8"       
[15] "ResponseSensitive.pt.n9"        "ResponseResistant.Treatmentpre"
[17] "ResponseSensitive.Treatmentpre"

Can you please confirm that my comparison interpretations below are correct as deduced from the resultsNames?

# Resistant (Treatment) vs Sensitive (Control)
res1 = results(dds, contrast=list("ResponseSensitive"))

# Post (Treatment) vs Pre (Control) within the group "Sensitive"
res1 = results(dds, contrast=list("ResponseSensitive.Treatmentpre"))

# Post (Treatment) vs Pre (Control) within the group "Resistant"
res1 = results(dds, contrast=list("ResponseResistant.Treatmentpre"))

Next, I also want to compare Post vs Pre considering patients effect. I think, I need to build another model matrix with adding new pt.n column numbered as per the Treatment. Is this correct?

m1 = model.matrix(~ Treatment + Treatment:pt.n + Treatment:Response, samples)

Thank you again for the help.

ADD REPLY • link 5.0 years ago sutturka • 0

1

Entering edit mode

Rather than continuing to send alerts to the limma and edgeR maintainers, for additional questions specific to DESeq2 could you make a new post?

ADD REPLY • link 5.0 years ago Michael Love 41k