DESeq2 - how to build design for multiple paired samples before/after with 2 conditions? (non-crossover)
1
0
Entering edit mode
jahorst • 0
@jahorst-12291
Last seen 7.8 years ago

Please help me build a design that incorporates between/within-patient variance. I see that vignette('DESeq2'), offers a solution for crossover design (the same patient gets each treatment): to repeat the patient label within each treatment. But this is a randomized placebo-controlled clinical trial without crossover. 

We have 2 samples from each patient at each timepoint, 2 timepoints (before,after), 2 treatments (1 per patient). Total 4 samples per patient. I am blinded as to which treatment group is placebo vs active.

> samples                            treatment    time pairs patient.Repeat patient.n
grA.1002.lesion1.before         A 1Before     a              1         1
grA.1002.lesion2.before         A 1Before     b              1         1
grA.1002.lesion1.after          A  2After     a              1         1
grA.1002.lesion2.after          A  2After     b              1         1
grB.1016.lesion1.before         B 1Before     a              2         2
grB.1016.lesion2.before         B 1Before     b              2         2
grB.1016.lesion1.after          B  2After     a              2         2
grB.1016.lesion2.after          B  2After     b              2         2
grA.2002.lesion1.before         A 1Before     a              3         3
grA.2002.lesion2.before         A 1Before     b              3         3
grA.2002.lesion1.after          A  2After     a              3         3
grA.2002.lesion2.after          A  2After     b              3         3
grB.2011.lesion1.before         B 1Before     a              1         4
grB.2011.lesion2.before         B 1Before     b              1         4
grB.2011.lesion1.after          B  2After     a              1         4
grB.2011.lesion2.after          B  2After     b              1         4
grA.3052.lesion1.before         A 1Before     a              2         5
grA.3052.lesion2.before         A 1Before     b              2         5
grA.3052.lesion1.after          A  2After     a              2         5
grA.3052.lesion2.after          A  2After     b              2         5
grB.3058.lesion1.before         B 1Before     a              3         6
grB.3058.lesion2.before         B 1Before     b              3         6
grB.3058.lesion1.after          B  2After     a              3         6
grB.3058.lesion2.after          B  2After     b              3         6

Certainly the most important comparisons in the model are: design=formula(~time*treatment)
And this works.

However, I need help including within patient variance.

~patient.n + time*treatment does not work:

Error in checkFullRank(modelMatrix) : 
      the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

Again, when making the patient.Repeat column redundant across the treatments, and inputting design=formula(~patient.Repeat + time*treatment), it gives results, but comparisons are spuriously made with patient1 as the reference, without comparing patient2 to patient3, and there is a spurious combination across the treatment groups:

~patient.Repeat + time*treatment
resultsNames(dds)
[1] "Intercept"              "patient.Repeat_2_vs_1"      "patient.Repeat_3_vs_1"     
[4] "time_2After_vs_1Before" "treatment_A_vs_B"     "time2After.treatmentA"  

Of course the software is doing what is told, I'm not blaming the software, just trying to ask how to properly represent the experimental model.

I have tried everything I can think of (input as design=formula(*) into DESeqDataSetFromMatrix, prior to running DESeq). They all either fail, or seem to misrepresent the situation.

The best I have come up with that works:

~pairs:treatment + treatment*time
resultsNames(dds)
[1] "Intercept"              "treatment_A_vs_B"     "time_2After_vs_1Before"
[4] "treatmentB.pairsb"       "treatmentA.pairsb"       "treatmentA.time2After"  

Similarly:

~pairs:treatment:time + treatment*time
resultsNames(dds)
[1] "Intercept"                    "treatment_A_vs_B"          
[3] "time_2After_vs_1Before"       "treatmentA.time2After"        
[5] "treatmentB.time1Before.pairsb" "treatmentA.time1Before.pairsb"
[7] "treatmentB.time2After.pairsb"  "treatmentA.time2After.pairsb" 

Again, this runs, and I believe makes sense. I am challenged by wrapping my head around a lack of "pairsA" being directly represented - seems okay as this would be the reference to analyze "pairsB." I was hoping to have "time1Before" be the reference in this 3-way combination, but haven't figured out how to do that. 

And, of course the "treatment2Treatment.time2After" should come last, but again haven't figured out how to make it so. Yet I can just insert this into contrast:

results(dds,contrast=list(c("treatmentA.time2After")))

#or

results(dds,contrast=list(c("treatmentB.time2After")))


This seems like it might be okay... what do y'all think? Do I need betaPrior=FALSE?

One final idea is to combine the treatment variables, then set the contrasts manually:

samples$Group <- factor(paste0(samples$treatment,samples$time,samples$patient.n))

dds <- DESeqDataSetFromMatrix(countData = CountTable, colData=samples, design=formula(~Group))

dds <- DESeq(dds)
> resultsNames(dds)
 [1] "Intercept"          "GroupB1Before2" "GroupB1Before4"
 [4] "GroupB1Before6" "GroupB2After2"  "GroupB2After4" 
 [7] "GroupB2After6"  "GroupA1Before1" "GroupA1Before3"
[10] "GroupA1Before5" "GroupA2After1"  "GroupA2After3" 
[13] "GroupA2After5"

results(dds,contrast=list(c("GroupA2After1","GroupA2After3","GroupA2After5"),c("GroupA1Before1","GroupA1Before3","GroupA1Before5")))

#or

results(dds,contrast=list(c("GroupB2After2","GroupB2After4","GroupB2After6"),c("GroupB1Before2","GroupB1Before4","GroupB1Before6")))


Help!

 

Further details: 
- this happens to be taxonomic calls (~microbiome data).

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] DESeq2_1.14.1              SummarizedExperiment_1.4.0
[3] Biobase_2.34.0             GenomicRanges_1.26.2      
[5] GenomeInfoDb_1.10.2        IRanges_2.8.1             
[7] S4Vectors_0.12.1           BiocGenerics_0.20.0       
deseq2 multifactorial design paired samples biologicalreplicates • 3.9k views
ADD COMMENT
0
Entering edit mode

(updated to fix some variable names)

ADD REPLY
0
Entering edit mode

Thank you for your time Prof. Love. Here are the resulting comparisons:

​resultsNames(dds)
[1] "Intercept"             "treatment_B_vs_A"      "treatmentA.patient2"  
[4] "treatmentB.patient2"   "treatmentA.patient3"   "treatmentB.patient3"  
[7] "treatmentA.time2After" "treatmentB.time2After"

Terms 1,2,7,8 make perfect sense to me. For the remaining, I wonder if it is possible to make terms txB.pt1, txB.pt2, txB.pt3 instead of what is given? Wouldn't that be a more representative comparison?

Also, if you could, please respond to the option above shown under "One final idea is to combine the treatment variables, then set the contrasts manually"

ADD REPLY
0
Entering edit mode

I'd suggest you meet with a local statistician who can help to explain what the terms in the design mean. The design as you have it now will allow you to compare the After vs Before effect for treatment A or B separately, and to contrast them, and it controls for the patient effects (separately for the patients in each treatment).

ADD REPLY
0
Entering edit mode

Thanks very much. Will do. Case closed.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 1 hour ago
United States

If I got it right, you have a design like this (with two replicates per row):

A before 1
A after  1
A before 2
A after  2
A before 3
A after  3
B before 4
B after  4
B before 5
B after  5
B before 6
B after  6

I would recommend you recode like this:

A before 1
A after  1
A before 2
A after  2
A before 3
A after  3
B before 1
B after  1
B before 2
B after  2
B before 3
B after  3

(with patient as a factor obviously)

And then follow the example in the vignette, using a design of ~treatment + treatment:patient + treatment:time

ADD COMMENT

Login before adding your answer.

Traffic: 634 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6