Question

DESeq2- developmental time series

0

Entering edit mode

stradermarie • 0

@stradermarie-7212

Last seen 9.3 years ago

United States

I have an RNAseq dataset that is a developmental time series across 13 time points with 4 biological replicates. I'm interested in identifying differentially expressed genes through time and generate a variance stabilized dataset (vsd) for further analysis.

My counts file (countData) contains 43469 transcripts across 48 samples. There are some samples missing from the later time points. I set up this model:

>dds<-DESeqDataSetFromMatrix(countData=countData, colData=colData, design= ~replicate + day )

>ddsTC <- DESeq(dds, test="LRT", reduced = ~day )

>res<-results(ddsTC)

This model runs, however, once I call the 'res' function, 90% of the genes get filtered out, leaving 4369 transcripts. To me this seems like quite a lot.

>attr(res, "filterThreshold")

># 90%

>#37.87345

When I ran this dataset through the original DESeq, considering time points as independent factors which I know is incorrect, I got thousands of differentially expressed genes. I feel like there is a better way to run the model in DESeq2. There is no treatment between time series and running the model above in DESeq2 shows very few genes differentially expressed between biological replicates (<10) which is what I would expect

Thank you for any input.

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] RColorBrewer_1.1-2 ggplot2_1.0.0 gplots_2.15.0
[4] DESeq2_1.4.5 RcppArmadillo_0.4.500.0 Rcpp_0.11.3
[7] GenomicRanges_1.16.4 GenomeInfoDb_1.0.2 IRanges_1.22.10
[10] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1
[13] Biobase_2.24.0 BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] annotate_1.42.1 AnnotationDbi_1.26.1 bitops_1.0-6
[4] caTools_1.17.1 colorspace_1.2-4 DBI_0.3.1
[7] digest_0.6.4 gdata_2.13.3 genefilter_1.46.1
[10] geneplotter_1.42.0 grid_3.1.0 gtable_0.1.2
[13] gtools_3.4.1 KernSmooth_2.23-13 MASS_7.3-35
[16] munsell_0.4.2 plyr_1.8.1 proto_0.3-10
[19] reshape2_1.4.1 RSQLite_1.0.0 scales_0.2.4
[22] splines_3.1.0 stats4_3.1.0 stringr_0.6.2
[25] survival_2.37-7 XML_3.98-1.1 xtable_1.7-4
[28] XVector_0.4.0

deseq2 • 3.0k views

ADD COMMENT • link updated 9.3 years ago by Michael Love 41k • written 9.3 years ago by stradermarie • 0

score 0 · Answer 1 · 2015-01-03

The definition in the man page for ?DESeq for the reduced argument is:

reduced : a reduced formula to compare against, e.g. the full model with a term or terms of interest removed, only used by the likelihood ratio test

The likelihood ratio test compares the difference in explanatory power of a full model to the reduced model. You should remove the variable of interest, which would be 'day', to have a reduced model of ~ replicate, in order to test for any significant differences due to the day variable (differences on any day).

If you update to the latest release version, DESeq2 v1.6, it is also easy after running,

dds = DESeq(dds, test="LRT", reduced=~replicate)
resLRT = results(dds)

to make a specific comparison of any two days:

resDay2 = results(dds, test="Wald", contrast=c("day","2","0"))