Hello,
I am analyzing an RNA-seq dataset generated by a previous lab member and I have a question regarding DESeq design for a time-series analysis. The data was generated from a drug treatment performed in vitro (same cell line for all samples) and I have the following samples to work with:
0h - No treatment, a single set of duplicate samples collected 2 days - duplicates collected for drug treatment and DMSO control 10 days - duplicates collected for drug treatment and DMSO control 42 days - duplicates collected for drug treatment and DMSO control
My goal with the analysis is two-fold. First, I want to identify genes that are differentially expressed between drug treatment vs. control at each time point. For this, I plan to analyze only the 2d, 10d, and 42d time points using a design of ~ condition + time + condition:time
and performing the analysis similar to res <- results(dds, name="condX.time10h", test="Wald")
for each time point (DESeq2 analysis for RNAseq data at different time points and different conditions).
Second, I want to identify genes that exhibit a different expression pattern over time for the two treatment groups (ie. starting at the reference level of 0h, gene A's expression increases over time in the drug treatment condition while it remains constant in the DMSO condition).
My question is with regards to how the second analysis should be performed. It seems it would have been ideal to collect separate duplicates at the 0h time point, one for the DMSO condition and one for the drug treatment condition and then a design of ~ condition + time + condition:time
would be used for the full model and a design of ~ condition + time
would be used for the reduced model and a LRT would be performed. However, I was hoping to still include the 0h time point as a reference level if at all possible. As it stands now, in my sampleTable I have a condition column with three conditions: NoTreatment (assigned only to the 0h duplicates), DMSO, and drug and I have a time column with four time points: 0h, 48h, 240h, 1008h.
When I attempted to construct the DESeqDataSet with the following design:
dds_time_course <- DESeqDataSetFromTximport(txi, colData = sampleTable, design = ~ condition + time + condition:time)
I received the following error:
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
Levels or combinations of levels without any samples have resulted in
column(s) of zeros in the model matrix.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
I read through the vignette but was not sure how to appropriately design the DESeq analysis given that I only have one set of duplicates for the 0h time point. I apologize if a similar question has been answered elsewhere and I greatly appreciate any help!
I just wanted to provide an update that for the analysis towards my first goal, I modified the approach to use a design of
condition + time
only and then generated DE results for each time point in the following way:I believe the original approach I described using
~ condition + time + condition:time
andres <- results(dds, name="condX.time10h", test="Wald")
is helpful to control for differences that existed for treatment vs. control at time 0h, but I only have one set of duplicates at time 0h. I hope this makes sense and if this does not seem like the proper way to achieve my first analysis goal, I really appreciate suggestions!