Question: DESeq2 time series: error while measuring gene dispersion.
0
4.5 years ago by
anand m t40
Singapore
anand m t40 wrote:

Hi all,

I have been trying DESeq2 to analyse my RNAseq data. Data is from sequencing a cell line treated with a drug at different time point intervals. My sampleTable looks like this:

DataFrame with 5 rows and 2 columns
treatment     time
<factor> <factor>
cl_0h  untreated        0
cl_2h    treated        2
cl_4h    treated        4
cl_6h    treated        6
cl_12h   treated       12

I want to find expression change in genes with respect to base line time (0 hour). My design and subsequent commands are as follows, but I get an error at estimateDispersions step:

design(dds) <- formula(~ time + treatment + treatment:time)

dds <- estimateSizeFactors(dds)

dds <- estimateDispersions(dds)
gene-wise dispersion estimates
Error in optim(betaRow, objectiveFn, method = "L-BFGS-B", lower = -30,  :
L-BFGS-B needs finite values of 'fn'
Is it because lack of replicates? I tried changing fitType argument but I get the same error. And is my design correct for the above sampleTable ?

Thanks.

timecourse deseq2 • 1.1k views
modified 4.5 years ago by Michael Love24k • written 4.5 years ago by anand m t40
Answer: DESeq2 time series: error while measuring gene dispersion.
2
4.5 years ago by
Michael Love24k
United States
Michael Love24k wrote:

Yes, it is because of a lack of replicates. If you had provided this design at object construction, you would get a more helpful message, and in the current version of DESeq2 (1.6) you would also get a more useful message from estimateDispersions(). If possible, we recommend users updating to the most current release of R and Bioconductor.

There are a few options with an experimental design like this:

add more untreated samples ( treated samples if possible), and use a design of ~time. Then you can contrast each non-zero time point against time = 0 using the 'contrast' argument of ?results.

2) or you can add degrees of freedom by assuming a linear relationship between time and log counts. I'd prefer adding biological replicates in a case like this, because without biological replicates, it's hard to know if the parametrization is correct. The linear assumption has few parameters, so you're less likely to have problems overfitting a curve to 5 points. This would look like:

dds\$time = c(0,2,4,6,12)

design(dds) = ~ time

What you will miss with this design is genes where the expression rises and falls. Another option is a quadratic relationship between time and log counts:

dds = DESeq(dds, test="LRT", full = ~ 1 + time + I(time^2), reduced = ~ 1)

Then you would capture the rising and falling case (or falling and rising).

Hi Mike,

Thank you for a detailed description. I have updated my R and bioc. As you mentioned newer version (1.6.3) does give detailed error desciption. However I have some question. Since we don not have biological replicates (blame expensive technology!) first option is not possible. When you say "add degrees of freedom by assuming a linear relationship between time and log counts" does this mean I add pseudo replicates based on counts from linear relationships? But, the quadratic relationship worked out fine. This also captured overall changes as you mentioned which is nice. Thanks again.

ADD REPLYlink written 4.5 years ago by anand m t40
1

"does this mean I add pseudo replicates based on counts from linear relationships?"

no, I just meant: use a design that assumes a simple functional relationship between log counts and time. If quadratic works for your purposes, then I'd stick with that.

The residual degrees of freedom is (# of samples) - (# of parameters). When you try to fit time as a factor, you have 5 samples and 5 parameters to fit in the model (one for each level of the factor). This leaves no residual degrees of freedom. However, if you assume the functional relationship is linear or quadratic, you have 2 or 3 parameters to fit, respectively. Either of these gives you some degrees of freedom.