Hi,
I am trying to use DESeq2 to perform DE analysis on longitudinal data with sample groups as mentioned below:
subject day condition
patient1 T0 baseline
patient2 T0 baseline
patient3 T0 baseline
patient1 T1 trtA
patient1 T1 trtB
patient2 T1 trtA
patient2 T1 trtB
patient3 T1 trtA
patient3 T1 trtB
patient1 T2 trtA
patient1 T2 trtB
patient2 T2 trtA
patient2 T2 trtB
patient3 T2 trtA
patient3 T2 trtB
patient1 T3 trtA
patient1 T3 trtB
patient2 T3 trtA
patient2 T3 trtB
patient3 T3 trtA
patient3 T3 trtB
patient1 T4 trtA
patient1 T4 trtB
patient2 T4 trtA
patient2 T4 trtB
patient3 T4 trtA
patient3 T4 trtB
The comparisons of interest are:
- T1 vs. T0, T2 vs. T0, T3 vs. T0, and T4 vs. T0 for trtA and trtB
- trtA vs. trtB at T1, trtA vs. trtB at T2, trtA vs. trtB at T3, and trtA vs. trtB at T4
I followed the time course experiment section of the workflow linked below, but creating the dataset fails with the error shown: http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#time-course-experiments
ddsTC <- DESeqDataSetFromMatrix(countData = round(PooledCounts), colData = sg,
+ design = ~ condition + day + condition:day)
converting counts to integer mode
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
Levels or combinations of levels without any samples have resulted in
column(s) of zeros in the model matrix.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
In addition: Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
some variables in design formula are characters, converting to factors
Can you please help with this?
Thanks
Thank you for the reply. The baseline (T0) is common for both treatments. How would you suggest defining the design in DESeq2?
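Yes, I understand, but because every T0 sample is also baseline (and no treated sample exists at T0), most combinations of condition and day have no samples. A design with condition, day, and condition:day then has more coefficients than can be estimated, so the software throws that error. You can't do it like that.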
To just compare one subset of samples to another, the far simpler way is to make a new column, Treatment_Time, that combines treatment and time point, use that as your design, and use contrasts to specify which Treatment_Time level to compare to which other.
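A minimal base R sketch of why the combined column avoids the error. The sample table is rebuilt from the question. The level names (for example trtA_T1) and the choice of baseline_T0 as reference level are assumptions made for illustration.

```r
# Rebuild the 27-sample table from the question.
base <- data.frame(subject = paste0("patient", 1:3),
                   condition = "baseline", day = "T0",
                   stringsAsFactors = FALSE)
trt <- expand.grid(subject = paste0("patient", 1:3),
                   condition = c("trtA", "trtB"),
                   day = paste0("T", 1:4),
                   stringsAsFactors = FALSE)
sg <- rbind(base, trt)
sg$condition <- factor(sg$condition)
sg$day <- factor(sg$day)

# Which condition/day combinations actually have samples?
table(sg$condition, sg$day)

# Original design: compare the number of coefficients with the matrix rank.
mm_orig <- model.matrix(~ condition + day + condition:day, data = sg)
c(columns = ncol(mm_orig), rank = qr(mm_orig)$rank)

# Combined factor: one level per observed combination (names are assumptions).
sg$Treatment_Time <- relevel(
  factor(paste(sg$condition, sg$day, sep = "_")),
  ref = "baseline_T0"
)
mm_new <- model.matrix(~ Treatment_Time, data = sg)
c(columns = ncol(mm_new), rank = qr(mm_new)$rank)
```

If the rank is lower than the number of columns, the design cannot be fit. With one level per observed combination the two numbers should agree.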
Thank you for the feedback. For the approach you mentioned, it's not clear how the design controls for baseline conditions when comparing two treatments at a specific time point. Also, how are the matched samples accounted for?
You don't "control for baseline conditions" like that. If you want to compare two subsets to each other, just compare them. Adding the variability and uncertainty from a third set of samples won't improve things.
Just add subject to the design, like you would for batch, or any other factor that you want the software to account for, but don't care about for the question at hand.
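A hedged sketch of how this could look in DESeq2, assuming a combined treatment/time column named Treatment_Time with levels such as trtA_T1 (names chosen for illustration). The counts here are simulated with makeExampleDESeqDataSet only so the example runs; in practice they would be the real count matrix (round(PooledCounts) in the question).

```r
library(DESeq2)

# Sample table from the question: every subject has baseline_T0 plus
# both treatments at T1 to T4 (27 samples). Level names are assumptions.
groups <- c("baseline_T0",
            paste(c("trtA", "trtB"), rep(paste0("T", 1:4), each = 2), sep = "_"))
sg <- data.frame(
  subject = factor(rep(paste0("patient", 1:3), times = length(groups))),
  Treatment_Time = factor(rep(groups, each = 3), levels = groups)
)

# Placeholder counts: 200 simulated genes for 27 samples (not real data).
set.seed(1)
counts <- counts(makeExampleDESeqDataSet(n = 200, m = nrow(sg)))
rownames(sg) <- colnames(counts)

# subject goes first as a blocking factor; the factor of interest goes last.
dds <- DESeqDataSetFromMatrix(countData = counts, colData = sg,
                              design = ~ subject + Treatment_Time)
dds <- DESeq(dds)

# Time point vs. baseline within one treatment.
res_trtA_T1_vs_T0 <- results(dds,
  contrast = c("Treatment_Time", "trtA_T1", "baseline_T0"))

# Treatment vs. treatment at the same time point.
res_trtA_vs_trtB_T1 <- results(dds,
  contrast = c("Treatment_Time", "trtA_T1", "trtB_T1"))
```

The remaining comparisons follow the same pattern by changing the two level names in the contrast.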