Question

RNASeq Time Course Question

jihoon.kim.1495 ▴ 10

I have a question about time course analysis of (bulk) RNASeq data.

I have wildtype vs. condition samples at Day 0 (undifferentiated), Day 7, Day 14, Day 21, and Day 28. In colData these are coded as Control vs. Proband and Undiff, DIV7, DIV14, DIV21, DIV28.

Following this vignette, I used the code below to find genes that have a condition-specific effect at any time point:


ddsTxi <- DESeqDataSetFromTximport(txi, 
                                   colData = sample_colData,
                                   design = ~ condition + age + condition:age)
ddsTxiTC <- DESeq(ddsTxi, test="LRT", reduced = ~ condition + age) #test for interaction term between condition/age

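# LRT results: p-values test whether the Proband vs. Control difference changes over time
# (the log2FoldChange column shows only the last coefficient, conditionProband.ageDIV28)
resTC <- results(ddsTxiTC)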

Following the vignette, am I correct that the result name "conditionProband.ageDIV7" compares wildtype Day 7 vs. condition Day 7, or is a different comparison being made?

If those result names don't give that comparison, and I want to compare wildtype vs. condition at every time point, do I need to subset the data to compare the two groups at each time point, or is there a way to do this with contrasts?

resultsNames(ddsTxiTC)
 [1] "Intercept"                    "condition_Proband_vs_Control" "age_DIV7_vs_Undiff"           "age_DIV14_vs_Undiff"         
 [5] "age_DIV21_vs_Undiff"          "age_DIV28_vs_Undiff"          "conditionProband.ageDIV7"     "conditionProband.ageDIV14"   
 [9] "conditionProband.ageDIV21"    "conditionProband.ageDIV28"   

resDIV7 <- results(ddsTxiTC, name="conditionProband.ageDIV7", test="Wald")

Also, would the union of the DE gene lists at each time point (for the comparison above) be equivalent to the list of genes that had a condition-specific effect in the LRT?

Lastly, I thought an interesting follow-up might be to categorize the genes with condition-specific differences by their behavior (e.g., those that are higher in condition at every time point, those that are higher only in the first few time points before becoming similar, etc.).

I was considering clustering the genes by their differences in mean counts (generated with plotCounts). Is there another recommended way to do this within DESeq2?

Thank you so much.

DESeq2 RNASeq TimeCourse

jihoon.kim.1495, 15 months ago

score 1 · Answer 1 · 2024-10-08

Michael Love 43k

I am comparing Wildtype Day7 VS Condition Day 7, or is there a different comparison being made?

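Yes, with one caveat.

Remember, from the workflow, we say: "Keep in mind that the interaction terms are the difference between the two groups at a given time after accounting for the difference at time 0." So conditionProband.ageDIV7 is the Proband vs. Control difference at Day 7 minus the Proband vs. Control difference at Day 0 (Undiff).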
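
On the earlier question about contrasts: given that caveat, one way to get Proband vs. Control at every time point without subsetting is to add the condition main effect to each interaction term in a contrast list. This is a minimal sketch that assumes treatment coding with Control and Undiff as reference levels (as the resultsNames output above suggests). Simulated data from makeExampleDESeqDataSet stands in for ddsTxiTC, and res_by_day is a hypothetical name.

```r
library(DESeq2)

# Toy stand-in for ddsTxiTC (simulated counts, no real effects; use your own object)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 20)
dds$condition <- factor(rep(c("Control", "Proband"), each = 10),
                        levels = c("Control", "Proband"))
dds$age <- factor(rep(c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28"), times = 4),
                  levels = c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28"))
design(dds) <- ~ condition + age + condition:age
ddsTxiTC <- DESeq(dds, test = "LRT", reduced = ~ condition + age, quiet = TRUE)

# Undiff is the reference level of age, so the main effect alone is Proband vs. Control at Undiff
res_by_day <- list(
  Undiff = results(ddsTxiTC, name = "condition_Proband_vs_Control", test = "Wald")
)

# At later days, Proband vs. Control = main effect + that day's interaction term
for (d in c("DIV7", "DIV14", "DIV21", "DIV28")) {
  res_by_day[[d]] <- results(ddsTxiTC,
                             contrast = list(c("condition_Proband_vs_Control",
                                               paste0("conditionProband.age", d))),
                             test = "Wald")
}

# Number of genes with padj < 0.1 at each time point
sapply(res_by_day, function(r) sum(r$padj < 0.1, na.rm = TRUE))
```

Each element of res_by_day is an ordinary results table. The DIV7 entry asks whether the groups differ at Day 7, while resDIV7 from the question (the interaction term alone) asks whether that difference has changed since Day 0.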

Also, would the union of the DE gene lists at each time point (for the comparison above) be equivalent to the list of genes that had a condition-specific effect in the LRT?

Roughly, yes, although the tests have slightly different power, so the two gene lists won't be exactly identical. The LRT is preferred.
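
To see how closely the two lists agree on a given dataset, a rough sketch like the following compares the LRT gene set with the union of per-time-point Wald tests on the same interaction terms the LRT drops. It again uses simulated stand-in data and assumes padj < 0.1 (the results() default alpha) as the cutoff. Another consideration: a union of four separately adjusted lists does not control the FDR across time points, whereas the single LRT does. lrt_genes and wald_genes are hypothetical names.

```r
library(DESeq2)

# Toy stand-in for ddsTxiTC (simulated counts, no real effects; use your own object)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 20)
dds$condition <- factor(rep(c("Control", "Proband"), each = 10),
                        levels = c("Control", "Proband"))
dds$age <- factor(rep(c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28"), times = 4),
                  levels = c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28"))
design(dds) <- ~ condition + age + condition:age
ddsTxiTC <- DESeq(dds, test = "LRT", reduced = ~ condition + age, quiet = TRUE)

alpha <- 0.1  # assumed FDR cutoff (the results() default)

# LRT: genes whose Proband vs. Control difference changes over time
resTC <- results(ddsTxiTC, alpha = alpha)
lrt_genes <- rownames(resTC)[which(resTC$padj < alpha)]

# Union of per-time-point Wald tests on the interaction terms
inter <- grep("^conditionProband\\.age", resultsNames(ddsTxiTC), value = TRUE)
wald_genes <- unique(unlist(lapply(inter, function(nm) {
  r <- results(ddsTxiTC, name = nm, test = "Wald", alpha = alpha)
  rownames(r)[which(r$padj < alpha)]
})))

# Overlap between the two gene lists
c(LRT_only  = length(setdiff(lrt_genes, wald_genes)),
  both      = length(intersect(lrt_genes, wald_genes)),
  Wald_only = length(setdiff(wald_genes, lrt_genes)))
```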

Lastly, I thought an interesting follow up might be to categorize the genes with species/condition specific differences by their behavior (e.g. those that are always higher in condition at every time point, those that are higher in the first few time points, before becoming similar, etc.) I was considering clustering the genes' differences in mean counts (generated from plotCounts). Would there be another suggested way within DESeq2?

In the workflow we cluster by the coefficients. We say:

"We can furthermore cluster significant genes by their profiles. We extract a matrix of the log2 fold changes using the coef function. Note that these are the maximum likelihood estimates (MLE). For shrunken LFC, one must obtain them one coefficient at a time using lfcShrink."

Either the MLE or the shrunken LFCs are fine, but the workflow has code for the MLE.
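
A rough sketch of coefficient clustering applied to the design in the question, with simulated data standing in for ddsTxiTC. Along the lines of the workflow's heatmap example, it drops the intercept and condition main-effect columns and caps LFCs at 3 in absolute value. It swaps the heatmap for base R hclust/cutree so cluster labels come back as a vector. The top-50 cutoff and k = 4 are arbitrary assumptions.

```r
library(DESeq2)

# Toy stand-in for ddsTxiTC (simulated counts, no real effects; use your own object)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 20)
dds$condition <- factor(rep(c("Control", "Proband"), each = 10),
                        levels = c("Control", "Proband"))
dds$age <- factor(rep(c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28"), times = 4),
                  levels = c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28"))
design(dds) <- ~ condition + age + condition:age
ddsTxiTC <- DESeq(dds, test = "LRT", reduced = ~ condition + age, quiet = TRUE)

# MLE log2 fold changes: one column per resultsNames() entry
betas <- coef(ddsTxiTC)
resTC <- results(ddsTxiTC)

# Top 50 genes by LRT adjusted p-value, skipping genes with missing values
ok <- which(!is.na(resTC$padj) & rowSums(is.na(betas)) == 0)
top <- ok[head(order(resTC$padj[ok]), 50)]

# Drop the intercept and the condition main effect; keep age and interaction columns
keep <- setdiff(colnames(betas), c("Intercept", "condition_Proband_vs_Control"))
mat <- betas[top, keep]

# Cap extreme LFCs so a few genes don't dominate the distances
thr <- 3
mat[mat < -thr] <- -thr
mat[mat > thr] <- thr

# Hierarchical clustering of the gene profiles, cut into 4 groups
hc <- hclust(dist(mat), method = "complete")
clusters <- cutree(hc, k = 4)
table(clusters)
```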

Michael Love, 15 months ago

Thanks for the clarification.

Regarding your last comment about clustering by the coefficients (log2 fold changes), I want to check my reading of the bottom 6 genes in the workflow's heatmap (SPAC11D3.01c to SPAC1002.17c). Per the workflow, these are genes that show strong expression in the "baseline samples in minutes 15-60 (red boxes in the bottom left corner), but then have slight differences for the mutant strain (shown in the boxes in the bottom right corner)."

To confirm: the result name minute_X_vs_0 is the comparison of (wildtype + condition at minute X) vs. (wildtype + condition at minute 0), and the result name strainmut.minuteX is the comparison of (wildtype at minute X) vs. (condition at minute X).

So the statement is that both wildtype and condition show increased expression of these 6 genes, with little difference between wildtype and condition (i.e., expression increases similarly in both). Is this interpretation correct?

I want to make sure my interpretation is correct before I attempt clustering.

jihoon.kim.1495, 15 months ago

Yes. If you wanted to focus on differences, you could cluster on the subset of coefficients representing differences (interactions).
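
A minimal sketch of that subsetting for the design in the question. It assumes the interaction columns are named conditionProband.ageDIV7 through conditionProband.ageDIV28, as in the resultsNames output, and uses simulated stand-in data again. Option A clusters the interaction terms only, as suggested. Option B adds the condition main effect back in to get the Proband vs. Control log2 fold change at each time point. Option B may map more directly onto categories like "higher in condition at every time point". mat_inter, gap, cl_inter and cl_gap are hypothetical names.

```r
library(DESeq2)

# Toy stand-in for ddsTxiTC (simulated counts, no real effects; use your own object)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 20)
dds$condition <- factor(rep(c("Control", "Proband"), each = 10),
                        levels = c("Control", "Proband"))
dds$age <- factor(rep(c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28"), times = 4),
                  levels = c("Undiff", "DIV7", "DIV14", "DIV21", "DIV28"))
design(dds) <- ~ condition + age + condition:age
ddsTxiTC <- DESeq(dds, test = "LRT", reduced = ~ condition + age, quiet = TRUE)

betas <- coef(ddsTxiTC)
resTC <- results(ddsTxiTC)
ok <- which(!is.na(resTC$padj) & rowSums(is.na(betas)) == 0)
top <- ok[head(order(resTC$padj[ok]), 50)]

# Interaction columns: conditionProband.ageDIV7 ... conditionProband.ageDIV28
inter <- grep("^conditionProband\\.age", colnames(betas), value = TRUE)

# Option A: interaction terms only (change in the Proband vs. Control gap relative to Undiff)
mat_inter <- betas[top, inter]

# Option B: the Proband vs. Control gap itself at each time point (main effect + interaction)
main <- betas[top, "condition_Proband_vs_Control"]
gap <- cbind(main, betas[top, inter] + main)  # vector recycles down each column
colnames(gap) <- c("Undiff", sub("^conditionProband\\.age", "", inter))

# Cluster each version into 4 groups and cross-tabulate the assignments
cl_inter <- cutree(hclust(dist(mat_inter)), k = 4)
cl_gap <- cutree(hclust(dist(gap)), k = 4)
table(cl_inter, cl_gap)
```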

Michael Love, 15 months ago

I see. You mean clustering using only the right 5 columns (the strainmut.minuteX interaction terms in the example above), right?

jihoon.kim.1495, 15 months ago