I have a question about time course analysis with (bulk) RNA-seq data.
Currently, I have wildtype vs. condition samples at Day 0 (undifferentiated), Day 7, Day 14, Day 21, and Day 28.
Following this vignette, I used the code below to find the genes that show a condition-specific effect at any time point:
ddsTxi <- DESeqDataSetFromTximport(txi,
                                   colData = sample_colData,
                                   design = ~ condition + age + condition:age)
# test for the interaction term between condition and age
ddsTxiTC <- DESeq(ddsTxi, test = "LRT", reduced = ~ condition + age)
# list of genes that are significantly different at different time points
resTC <- results(ddsTxiTC)
Following the vignette, am I correct that the result name "conditionProband.ageDIV7" compares wildtype at Day 7 vs. condition at Day 7, or is a different comparison being made?
If those result names do not give that comparison and I want to compare wildtype vs. condition at every time point, do I need to subset the data to each pair of groups, or is there a clever way to set contrasts?
resultsNames(ddsTxiTC)
[1] "Intercept" "condition_Proband_vs_Control" "age_DIV7_vs_Undiff" "age_DIV14_vs_Undiff"
[5] "age_DIV21_vs_Undiff" "age_DIV28_vs_Undiff" "conditionProband.ageDIV7" "conditionProband.ageDIV14"
[9] "conditionProband.ageDIV21" "conditionProband.ageDIV28"
resDIV7 <- results(ddsTxiTC, name="conditionProband.ageDIV7", test="Wald")
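For intuition on what "conditionProband.ageDIV7" encodes, here is a minimal base R sketch using a toy design of the same shape. The coefficient values are made up, and base R names the interaction columns `conditionProband:ageDIV7` etc. rather than using DESeq2's naming. In this parameterization the interaction term is the extra Proband-vs-Control difference at DIV7 relative to Undiff, so the full Proband-vs-Control difference at DIV7 is the main effect plus the interaction.

```r
# Toy design mirroring the question: 2 conditions x 5 ages, one sample per cell
age_levels <- c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28")
condition <- factor(rep(c("Control", "Proband"), each = 5),
                    levels = c("Control", "Proband"))
age <- factor(rep(age_levels, times = 2), levels = age_levels)
X <- model.matrix(~ condition + age + condition:age)

# Hypothetical log2-scale coefficients for a single gene (made-up numbers)
beta <- setNames(numeric(ncol(X)), colnames(X))
beta["(Intercept)"]              <- 8
beta["conditionProband"]         <- 1    # Proband vs Control at Undiff
beta["ageDIV7"]                  <- 0.5  # DIV7 vs Undiff within Control
beta["conditionProband:ageDIV7"] <- 2    # extra Proband effect at DIV7

# Fitted log2 expression for every condition/age group
mu <- drop(X %*% beta)
names(mu) <- paste(condition, age, sep = "_")

# Proband vs Control at DIV7 equals main effect + interaction
effect_div7 <- mu[["Proband_DIV7"]] - mu[["Control_DIV7"]]
main_plus_interaction <- beta[["conditionProband"]] +
  beta[["conditionProband:ageDIV7"]]

# The interaction alone is a difference of differences
diff_of_diffs <- effect_div7 - (mu[["Proband_Undiff"]] - mu[["Control_Undiff"]])

print(c(effect_div7 = effect_div7,
        main_plus_interaction = main_plus_interaction,
        diff_of_diffs = diff_of_diffs))
```

If the same logic carries over to the fitted object, the Day 7 comparison could be requested without subsetting via something like `results(ddsTxiTC, contrast = list(c("condition_Proband_vs_Control", "conditionProband.ageDIV7")), test = "Wald")`, which sums the two coefficients. This follows the pattern shown on the `results` help page and is worth checking against it.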
Also, would the union of the DE gene lists from each time point for the comparison above be equivalent to the list of genes that had a condition-specific effect (from the LRT)?
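The two lists come from different tests, so they need not coincide exactly. One way to see how far they overlap is with plain set operations. A sketch with hypothetical gene IDs standing in for the significant genes from `resTC` and from the per-time-point Wald results (none of these IDs or memberships are real results):

```r
# Hypothetical significant gene IDs, for illustration only
lrt_genes <- c("g1", "g2", "g3", "g4")
wald_genes <- list(
  DIV7  = c("g1", "g2"),
  DIV14 = c("g2", "g5"),
  DIV21 = c("g3"),
  DIV28 = character(0)
)

# Union of the per-time-point lists
wald_union <- Reduce(union, wald_genes)

# Genes found by only one of the two approaches, and by both
only_lrt  <- setdiff(lrt_genes, wald_union)
only_wald <- setdiff(wald_union, lrt_genes)
shared    <- intersect(lrt_genes, wald_union)

print(list(only_lrt = only_lrt, only_wald = only_wald, shared = shared))
```

With real results, the inputs would be the row names passing the chosen adjusted p-value threshold in each results table.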
Lastly, I thought an interesting follow-up might be to categorize the genes with species/condition-specific differences by their behavior (e.g. those that are higher in condition at every time point, those that are higher in the first few time points before becoming similar, etc.).
I was considering clustering the genes by their differences in mean counts (generated from plotCounts). Is there another suggested way to do this within DESeq2?
Thank you so much.
Thanks for the clarification.
With regard to your last comment about clustering by the coefficients/log2 fold changes, I wanted to clarify that the bottom 6 genes (SPAC11D3.01c to SPAC1002.17c) are (per the vignette): those that show a strong expression
To confirm: is the results name minute_X_vs_0 the comparison of (Wildtype + Condition @ Minute X) vs. (Wildtype + Condition @ Minute 0), and the results name strainmut.minuteX the comparison of (Wildtype @ Minute X) vs. (Condition @ Minute X)?
So the statement is that both wildtype and condition show increased expression of these 6 genes, with little difference between wildtype and condition (i.e. expression increases similarly regardless of wildtype or condition). Am I correct in this interpretation?
I want to make sure that my interpretations are correct, before I attempt clustering.
Yes. If you wanted to focus on differences, you could cluster on the subset of coefficients representing differences (interactions).
I see. You mean clustering using only the right 5 columns (in the above example), right?
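Clustering on the interaction coefficients only, as suggested above, could look roughly like the base R sketch below. The coefficient matrix is a hypothetical stand-in with the column names from `resultsNames(ddsTxiTC)`; with a fitted object it would presumably come from `coef(ddsTxiTC)`. The gene names, the values, and the choice of two clusters are all assumptions made for illustration.

```r
# Column names as reported by resultsNames() in the question
coef_names <- c("Intercept", "condition_Proband_vs_Control",
                "age_DIV7_vs_Undiff", "age_DIV14_vs_Undiff",
                "age_DIV21_vs_Undiff", "age_DIV28_vs_Undiff",
                "conditionProband.ageDIV7", "conditionProband.ageDIV14",
                "conditionProband.ageDIV21", "conditionProband.ageDIV28")

# Hypothetical log2 coefficients for four made-up genes
betas <- rbind(
  geneA = c(8, 0.1, 1, 1, 1, 1, 2.0, 2.1, 1.9,  2.0),  # sustained difference
  geneB = c(7, 0.0, 0, 0, 0, 0, 1.8, 2.0, 2.2,  2.1),  # sustained difference
  geneC = c(9, 0.2, 2, 2, 2, 2, 2.0, 0.1, 0.0, -0.1),  # early difference only
  geneD = c(6, 0.1, 0, 1, 0, 1, 1.9, 0.0, 0.1,  0.0)   # early difference only
)
colnames(betas) <- coef_names

# Keep only the interaction columns (the condition-specific differences)
interaction_cols <- grep("^conditionProband\\.age", colnames(betas))
mat <- betas[, interaction_cols, drop = FALSE]

# Hierarchical clustering of genes on their interaction profiles
hc <- hclust(dist(mat))
clusters <- cutree(hc, k = 2)
print(clusters)
```

Because the intercept and main-effect columns are dropped, genes group by how the condition difference changes over time rather than by overall expression level or shared time trend.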