Question: DESeq2 time series tissue specific differences over time
0
2.2 years ago by
Nicholas Owen0 wrote:

Hi all,

I have an RNA-seq data set of samples that I wish to use for analysis of differential expression with DESeq2.

The data has three time points; 1 day, 5 day and 10 days with two different tissue samples taken from each time point (Tissue A, reference and Tissue B), 7 replicates of each.

Initially to test differences between two timepoints or tissues I created a new factor (group) that was a combination of tissue_days and carried out DESeq2 analysis with the design of ~group. However, what I am ideally after is differential expression between the tissues over time, I am interested in those events at all time points including day 1 so I dont want that to be ignored/modelled out.

I have been following the helpful information at : http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments however I do not want to remove any tissue specific

differences over time, they are important to the experiment.

I have used a design formula of  ~ time + tissue + tissue:time  and ended up with the resultsNames(dds) of:

[1] "Intercept"         "Tissue_B_vs_A"  "time_5_vs_1"     "time_10_vs_1"     "TissueB.time5"
[6] "TissueB.time10"

which I have taken from this that:

Tissue_B_vs_A - will give me the results between the two tissues at the reference time point day 1 - is this correct as being the reference time point as seems not from count plots at time 1?
time_5_vs_1 - results between two time points day 5 and 1 for the reference tissue (A)
time_10_vs_1 - results between two time points day 10 and 1 for the reference tissue (A)
TissueB.time5 -  results of changes seen between Tissue B and A at time between time points day 5 and day 1 which is one of the results I am definately after. To get the results for Tissue B vs A at time point 5 vs 1 I believe I need to get results with contrast=list(c("time_5_vs_1", "tissueB.time5") ?
TissueB.time10 - results as for TissueB.time5 but between time points day 10 and day 1 - also what I am after.

However, I would like to see the results for the changes between tissues and time points day 10 vs day 5. Would I need to relevel the reference day time point which is currently 1 to 5 and run the analysis again or can get to this through a better design formula? Likewise I am interested in the time_10_vs_5 result which is not listed but would this also happen with releveling?

I am not sure I have set the design correctly for this experiment so would like and appreciate your feedback.

Questions I am trying to answer:
- Are there differentially expressed genes between tissues at time point 1, or 5, or 10 (I think this is answered with the combined factor) ie time day 1 Tissue B vs A (does not take into account any other changes or if those genes show changes at other time points).
- Are there tissue specific changes over time? What genes show a change in Tissue B over the time points that arent changed in Tissue A or vice versa. Is this possible with a single design ?

-Should I be using LRT with a reduced formula instead?

Many thanks for your time, I do appreciate any feedback.

modified 2.2 years ago • written 2.2 years ago by Nicholas Owen0

Thank you Gavin for taking the time to reply - I appreciate your insight. (snipped as comment moved to under Gavin's post) sorry.

Take a look at ?results.

You need to specify test="Wald" in order to generate p-values for a specific coefficient. Otherwise, the table is the LRT that you specified by 'full' and 'reduced' arguments when you ran test="LRT".

Answer: DESeq2 time series tissue specific differences over time
2
2.2 years ago by
Gavin Kelly560
United Kingdom / London / Francis Crick Institute
Gavin Kelly560 wrote:

I always like to tabulate group-averages in terms of their component estimates when I'm talking through possible comparisons

 1 5 10 A I I+'5' I + 10 B I + B I+B + 5 + B.5 I+ B + 10 +B.10

(the summands all correspond to things in resultsNames in a hopefully obvious manner). Each cell corresponds to samples that are in the corresponding row and column groups.  So bottom-right is timepoint 10 tissue B.

So you can then subtract two (or more) entries to get algebraic expressions for effects.   So the tissue effect at timepoint 1 is the bottom-left cell, subtract the top-left, which is (I+B)-(B), =B, which means you interpreted the Tissue_B_vs_A correctly as the effect of tissue in the first timepoint.

I think your next two (time_5_vs_1 and time_10_vs_1) are correct interpretations, you can do in a similar manner

TissueB.time5 (which I've called B.5 for brevity), you correctly interpret as the difference in tissue effect between timepoints 5 and 1, or equivalently the difference in time response (between 5 and 1) of the two tissues.   To access it, you only need contrast=list( "tissueB.time5").    You can see this as  [(I+B+5 + B.5) - (I+5)] -  [(I+B) -I] =B.5 , and the two bits in the square brackets are the row differences in the second and first column respectively.

You can work out tissue effects at different timepoints by (e.g for the 5-timepoint), [I+B+5+B.5] - [I+5] = B+B.5, so that contrast=list(c("B", "B.5)), similarly for the difference within a tissue for the 5 vs 10 timepoint comparison

Regarding LRT,  if you used the current design as the full, and a reduced that omitted the interaction term. then what you'd get is something approaching your "Are there tissue specific changes over time" question, which is genes that are showing different time profiles between the two different conditions (more general than changing in one but not the other)

Thank you Gavin for taking the time to reply - I appreciate your insight. Apologies for the delay in response, I have been getting my head around things a little more. I wanted to clarify though, but you mentioned the summands should correspond to things in the resultsNames, could you elaborate a little more please?

As far as the model design, I had a look at the matrix.model of the design to get a better understanding of how the resultsNames() came about which is fine.

The section you wrote about TissueB.time5 - thank you for explaining in detail the calculation for getting to that and how to access that resultname. I was looking at how to get the results in a similar manner to compare the difference in tissue effect between time points 10 and 5, (based upon your calculation [(I+B+5 + B.5) - (I+5)] -  [(I+B) -I] =B.5 but for day 10 vs 5)

would be:  [(I + B+ 10 + B.10) - (I + 10)] - [( I + B + 5 + B.5) - (I + 5)] = ( B + B.10 ) - (B + B.5) = B.10 - B.5 ? However I am not sure how to access contrasts substracted, is it possible or have I made a mistake (more than likely).

Having a look further into the results, one thing I have noticed and maybe have been naive about, is that for each of the resultsNames() for example tissueB.time5 when I filter the results tables by adjusted p value <0.05 for example, I always get the same number of significant genes. Looking at the DESeq2 vignette am I right in thinking this is due to the "LRT p value being the likelihood ratio test of all the variables and all the levels" ? I am just surprised with the dataset that differences at results from tissue effect over time point 5 (tissueB.time5), would have the same number as time point 10 (tissueB.time10) as well as tissueB_vs_A.

I think overall, my best approach may be to have multiple designs to get the most out of the data. I was thinking that with ~tissue+time and reduced being ~tissue I would specifically answer if any genes show a difference over time, and the same full with reduced formula of ~time would show any difference over the tissue type. Does that make any sense ?

On a side note, I was concerned that doing the LRT method would result in differences between tissue expression at the reference time point would be ignored (mentioned in the workflow: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments ), is that the case?

Again thank you for your support, sorry for all the questions!

(apologies I also commented on my own post not yours. edited)

"summands should correspond to things in the resultsNames"  just means I was too lazy to write TissueB.Time5, and just wrote B.5 in the table, that you should be able to work out which I really meant, and that all of the individual subcomponents should correspond to something you see in resultsNames.

For the difference in tissues between 10 and 5, you're pretty much there.  After your algebra, you get B.10 - B.5:  the B.10 has a positive contribution, so will be in the first list-slot of your contrast, and the B.5 has a negative contribution, so will be in the second, giving

results(..., contrast=list(c("tissueB.time10"),c("tissueB.time5"))

because the 'list' formulation of DESeq2::results subtracts the sum of the second component from the sum of the first component (here you've only got one term in each component, but I prefer to put in the 'c' as it's all to easy for me to make a typo - indeed I usually put a line-break between the two parts of the list, to make things extra clear).

Michael has already addressed your use of LRT - any time you're interested in the significance of a specific contrast, then Wald is the way to go.  If you're interested in the difference between two models, then LRT is the approach to take. Roughly speaking, LRT'ing ~ time * tissue against a null of ~time + tissue would give you transcripts that are having a tissue-dependent time response (think of it as a generalisation of combining the three contrasts tissueB.time5, tissueB.time10 and tissueB.time5 - tissueB.time10 tissue effects between timepoint pairs).  If you LRT ~time + tissue against ~ time, you'll get (roughly)  tissue differences consistent across timepoints.

Probably worth plotting the expression profile of a few transcripts that are significant for the different contrasts, just to get a feel for what they look like.