DESeq2: complex time course data without replicates
@novicebioinforesearcher-13572
Last seen 3.7 years ago
Hello All,

I have a question about an experiment for which we just received RNA-seq data. The design looks like this:

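| sample | type | time |
|--------|---------|---------|
| WT | Vehicle | 10 min |
| KO | Vehicle | 10 min |
| WT | Drug | 30 min |
| KO | Drug | 30 min |
| WT | Drug | 1 hour |
| KO | Drug | 1 hour |
| WT | Drug | 2 hour |
| KO | Drug | 2 hour |
| WT | Drug | 3 hour |
| KO | Drug | 3 hour |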

We do not have replicates for any model (I hope we can still get some data from this model)

We would like to compare changes in gene expression and junction/exon expression between each time point and the vehicle,

i.e. (veh vs 30 min, veh vs 1 hour, veh vs 2 hours, veh vs 3 hours).

My questions are: what should the design formula look like for DESeq2 and DEXSeq, what steps do we need to consider before doing this analysis, and would not having replicates be a big issue?

Apologies for asking for help upfront; we do not have a statistician we can reach out to at our facility.

Many thanks.

Abi

deseq2 timecourse rnaseq • 1.4k views

I have been reading some posts and came across this

## Can I use DESeq2 to analyze a dataset without replicates?

If a DESeqDataSet is provided with an experimental design without replicates, a warning is printed, that the samples are treated as replicates for estimation of dispersion. This kind of analysis is only useful for exploring the data, but will not provide the kind of proper statistical inference on differences between groups. Without biological replicates, it is not possible to estimate the biological variability of each gene. More details can be found in the manual page for ?DESeq

Still, I am hoping we can make some sense of the data, as we have experimental validation for it.

Gavin Kelly ▴ 590
@gavin-kelly-6944
Last seen 14 months ago
United Kingdom / London / Francis Crick…

While completely agreeing with Michael's answer on the impossibility of testing pairs of timepoints, I'm wondering if you could take some prior expectations into account. If you're expecting WT and KO to behave similarly at certain timepoints (maybe the 10 min placebo), then you could label those samples as being replicates. Similarly, if you expect one of the conditions to be constant during a certain time interval, you could label those samples as replicates (by which I mean giving them all the same <sample, time, type> value). Not ideal, by any means, as you're then asserting that any variability between these 'virtual' replicates truly reflects the unobserved true replicate variability.

Expanding Michael's point about time profiles, you could set `design$time <- c(10/60, 10/60, 30/60, 30/60, 1, 1, 2, 2, 3, 3)`; having `time` in the DESeq2 model would then look for genes whose log expression looks linear when plotted against time. If you had prior biological reason to believe that the profiles should peak at t = 1.5 hr, you could similarly include a `(time-1.5)^2` term (written as `I((time - 1.5)^2)` in an R formula). But you'd need to choose a model once and for all, rather than exploring different models until you find something interesting. As Michael says, it's definitely a case where consulting a statistician would be advised.
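A minimal sketch of the numeric-time model described above, using simulated counts in place of the real count table. The column name `genotype` (for the WT/KO column), the simulated matrix, and the choice of a likelihood ratio test against `~ genotype` are assumptions made for illustration; none of them is specified in this thread.

```r
library(DESeq2)

set.seed(1)

# Sample table, one row per library, in the order of the design in the question.
# `genotype` is an illustrative name for the WT/KO column.
coldata <- data.frame(
  genotype = factor(rep(c("WT", "KO"), times = 5), levels = c("WT", "KO")),
  time = c(10/60, 10/60, 30/60, 30/60, 1, 1, 2, 2, 3, 3),  # hours, as suggested above
  row.names = paste0("sample", 1:10)
)

# Simulated counts (500 genes x 10 samples) standing in for the real data.
n_genes <- 500
mu <- 2^runif(n_genes, 4, 12)
cts <- matrix(
  rnbinom(n_genes * 10, mu = rep(mu, 10), size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), rownames(coldata))
)

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ genotype + time)

# Likelihood ratio test: does a linear trend in time (on the log scale)
# improve the fit over a model with genotype only?
dds <- DESeq(dds, test = "LRT", reduced = ~ genotype)

res <- results(dds)
head(res[order(res$pvalue), ])
```

Because `time` is numeric, the ten samples share one slope and the model has residual degrees of freedom, which is why this runs without replicates at each time point. To ask instead whether the WT and KO profiles differ, one option is `design = ~ genotype + time + genotype:time` with `reduced = ~ genotype + time`.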


Hello Gavin,

Thank you for your advice. In the past we only looked at the 3 hour timepoint, so it's difficult to assume what would happen at the earlier time points. I am not so familiar with R, but I was able to run the sample data and tutorial provided online for DESeq2. Could you please help with what my design file should look like and what my parameters should be in the `DESeq()` step?

many thanks

@mikelove
Last seen 17 hours ago
United States

While you don't have replicates at each time point and so you can't make comparisons at each time point, you can use R to build models of the log of gene expression over time and compare the profiles overall. Have you seen how examples of genes of interest change over time in terms of normalized counts (on log scale)?
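One way to look at such a profile is sketched below, with simulated counts standing in for the real data. The gene name `gene1`, the column name `genotype`, and the `+ 1` pseudocount are illustrative assumptions.

```r
library(DESeq2)

set.seed(1)

# Sample table matching the design in the question (`genotype` is an assumed name).
coldata <- data.frame(
  genotype = factor(rep(c("WT", "KO"), times = 5), levels = c("WT", "KO")),
  time = c(10/60, 10/60, 30/60, 30/60, 1, 1, 2, 2, 3, 3),  # hours
  row.names = paste0("sample", 1:10)
)

# Simulated raw counts (200 genes x 10 samples) in place of the real count table.
n_genes <- 200
mu <- 2^runif(n_genes, 4, 12)
cts <- matrix(
  rnbinom(n_genes * 10, mu = rep(mu, 10), size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), rownames(coldata))
)

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ genotype + time)

# Normalized counts: raw counts divided by per-sample size factors.
dds <- estimateSizeFactors(dds)
norm_counts <- counts(dds, normalized = TRUE)

# Plot one (hypothetical) gene of interest on the log2 scale against time.
gene <- "gene1"
y <- log2(norm_counts[gene, ] + 1)
plot(coldata$time, y,
     col = ifelse(coldata$genotype == "WT", "black", "red"), pch = 19,
     xlab = "time (hours)", ylab = "log2(normalized count + 1)", main = gene)
legend("topright", legend = c("WT", "KO"), col = c("black", "red"), pch = 19)
```

The input here is a matrix of raw, unnormalized integer counts; the normalization happens inside DESeq2 through the size factors.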


Dr. Michael Love,

Thank you for your reply. I am not sure what you mean by "build models of the log of gene expression over time and compare the profiles overall." Do you mean just taking the gene counts and plotting them without any normalization or statistics? We had this data analyzed by a third-party firm, and they gave us two files: one with HTSeq counts and one with Cufflinks output. I am guessing that when you say counts, you mean the ones from HTSeq (please correct me if I am wrong).

Sincerely,

Abi

You can't compare at each time point because you don't have replicates, so instead you have to do some modeling of counts over time. This involves making decisions about how complex the functions you use for modeling should be. I'd recommend partnering with a statistician to help with this analysis. Most of all, with few time points and no replicates, you're forced to use only simple functions, such as assuming that log gene expression changes linearly over time.
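To make "linear in log expression over time" concrete, here is a small base R sketch for a single simulated gene. The simulated values, the variable names, and the use of `lm()` on log2 values are assumptions for illustration only; DESeq2 itself fits a negative binomial GLM to the counts rather than an ordinary linear model to logged values.

```r
set.seed(1)

# Time in hours and genotype for the ten samples in the question.
time <- c(10/60, 10/60, 30/60, 30/60, 1, 1, 2, 2, 3, 3)
genotype <- factor(rep(c("WT", "KO"), times = 5), levels = c("WT", "KO"))

# Simulated log2 normalized expression for one gene:
# rises by 0.8 log2 units per hour in both genotypes, plus noise.
log2_expr <- 8 + 0.8 * time + rnorm(10, sd = 0.3)

# Shared linear trend over time, with a genotype offset.
fit <- lm(log2_expr ~ genotype + time)
coef(summary(fit))["time", ]  # estimated slope per hour and its standard error

# Does the slope differ between WT and KO? Compare against an interaction model.
fit_int <- lm(log2_expr ~ genotype * time)
anova(fit, fit_int)
```

The residual variation around the fitted line plays the role that replicate-to-replicate variation would normally play, which is why the choice of model shape has to be made in advance.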

Thank you. So I should simply take the HTSeq counts, convert them to log scale, and look for linear changes over time? If possible, could you please point me to some modeling snippets that I can look over? We will look into a possible collaboration with a statistician.