Question: Complex Design matrix for DE analysis
0
gravatar for sutturka
5 months ago by
sutturka0
sutturka0 wrote:

I am working with a complex experiment design shown below and I am not able to build the appropriate design matrix. I looked through the DESeq2 manual and this blog and tried different designs and every time I am getting error as "Design matrix not of full rank".

Design Link for the image

Patients highlighted in orange are unpaired (i.e. have only pre-treatment data). I also introduced ind.n column as suggested by DESeq2 manual but may not be necessary in the design.

I am interested in DE testing for:

  1. ResistantvsSensitive considering the patients effect
  2. PostvsPre considering the patients effect (All samples)
  3. PostvsPre considering the patients effect (within Resistant group)
  4. PostvsPre considering the patients effect (within Sensitive group)

Can you please suggest the best possible design matrix?

Also, a very naive question - I have seen design examples starting with "~1", "~0 + ". What is the difference between the two? I found the explanations for - 1. Dependent and independent variables represented on either side of ~ symbol 2. Interaction are represented with : symbol. But I am unsure of "~1" and "~0 +" designs.

limma edger deseq2 • 188 views
ADD COMMENTlink modified 5 months ago by Michael Love25k • written 5 months ago by sutturka0
Answer: Complex Design matrix for DE analysis
0
gravatar for Michael Love
5 months ago by
Michael Love25k
United States
Michael Love25k wrote:

For DESeq2, I would follow the example in the vignette and drop the unpaired samples.

If you use limma, you can include the unpaired samples, and inform the model about correlations by patient. The other package authors may have more advice on that approach.

ADD COMMENTlink written 5 months ago by Michael Love25k

What he said. For limma, use duplicateCorrelation while blocking on Patient, and use a one-way layout in your design matrix with a combined factor containing Response and Treatment. For edgeR, you would need to do different things depending on your contrast:

  1. Subset to only include pre samples and then use a one-way layout on Response to test for differences within pre. Repeat for post.
  2. Use a design matrix that looks something like ~Patient + Response:Treatment (probably need to drop a few terms containing Treatmentpre in their names to get to full rank). This will give you two terms at the right of the matrix, representing the post/pre effect in each response type. Test that the average of this response is non-zero.
  3. Same design as 2, but just test the coefficient corresponding to the post/pre effect in resistant samples.
  4. Same design as 3 but for sensitive samples.

In 2-4, the unpaired samples effectively don't contribute anything to the analysis - their only effect is to contribute to the calculation of the average log-CPM - so you might as well drop them.

ADD REPLYlink modified 5 months ago • written 5 months ago by Aaron Lun25k

Dear Micheal and Aaron,

Thank you for the response. I am starting with DESeq2 and next work with limma.

I removed the unpaired samples and then able to get it running correctly. I am posting the code below as it may help someone.

#Add patients IDs specific to Response
r_vec = rep(1:6, each = 2)
s_vec = rep(1:9, each = 2)
samples$pt.n <- factor(c(r_vec, s_vec))

#build Model Matrix
m1 = model.matrix(~ Response + Response:pt.n + Response:Treatment, samples)

#Remove All zero columns
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
idx <- which(all.zero)
m1 <- m1[,-idx]
colnames(m1)

dds<- DESeqDataSetFromMatrix(countData = countData,
                             colData = samples,
                             design = m1)

dds$Response = relevel(dds$Response, "Sensitive")
dds$Treatment = relevel(dds$Treatment, "pre")

dds <- DESeq(dds)
resultsNames(dds)

 [1] "Intercept"                      "ResponseSensitive"             
 [3] "ResponseResistant.pt.n2"        "ResponseSensitive.pt.n2"       
 [5] "ResponseResistant.pt.n3"        "ResponseSensitive.pt.n3"       
 [7] "ResponseResistant.pt.n4"        "ResponseSensitive.pt.n4"       
 [9] "ResponseResistant.pt.n5"        "ResponseSensitive.pt.n5"       
[11] "ResponseResistant.pt.n6"        "ResponseSensitive.pt.n6"       
[13] "ResponseSensitive.pt.n7"        "ResponseSensitive.pt.n8"       
[15] "ResponseSensitive.pt.n9"        "ResponseResistant.Treatmentpre"
[17] "ResponseSensitive.Treatmentpre"

Can you please confirm that my comparison interpretations below are correct as deduced from the resultsNames?

# Resistant (Treatment) vs Sensitive (Control)
res1 = results(dds, contrast=list("ResponseSensitive"))

# Post (Treatment) vs Pre (Control) within the group "Sensitive"
res1 = results(dds, contrast=list("ResponseSensitive.Treatmentpre"))

# Post (Treatment) vs Pre (Control) within the group "Resistant"
res1 = results(dds, contrast=list("ResponseResistant.Treatmentpre"))

Next, I also want to compare Post vs Pre considering patients effect. I think, I need to build another model matrix with adding new pt.n column numbered as per the Treatment. Is this correct?

m1 = model.matrix(~ Treatment + Treatment:pt.n + Treatment:Response, samples)

Thank you again for the help.

ADD REPLYlink written 5 months ago by sutturka0
1

Rather than continuing to send alerts to the limma and edgeR maintainers, for additional questions specific to DESeq2 could you make a new post?

ADD REPLYlink modified 5 months ago • written 5 months ago by Michael Love25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 107 users visited in the last hour