DESeq2- developmental time series
1
0
Entering edit mode
@stradermarie-7212
Last seen 9.3 years ago
United States

I have an RNAseq dataset that is a developmental time series across 13 time points with 4 biological replicates. I'm interested in identifying differentially expressed genes through time and generate a variance stabilized dataset (vsd) for further analysis. 

My counts file (countData) contains 43469 transcripts across 48 samples. There are some samples missing from the later time points. I set up this model:

>dds<-DESeqDataSetFromMatrix(countData=countData, colData=colData, design= ~replicate + day )

>ddsTC <- DESeq(dds, test="LRT", reduced = ~day )

>res<-results(ddsTC)

This model runs, however, once I call the 'res' function, 90% of the genes get filtered out, leaving 4369 transcripts. To me this seems like quite a lot.

>attr(res, "filterThreshold") 

>#     90% 

>#37.87345 

When I ran this dataset through the original DESeq, considering time points as independent factors which I know is incorrect, I got thousands of differentially expressed genes. I feel like there is a better way to run the model in DESeq2. There is no treatment between time series and running the model above in DESeq2 shows very few genes differentially expressed between biological replicates (<10) which is what I would expect

Thank you for any input. 

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] RColorBrewer_1.1-2      ggplot2_1.0.0           gplots_2.15.0          
 [4] DESeq2_1.4.5            RcppArmadillo_0.4.500.0 Rcpp_0.11.3            
 [7] GenomicRanges_1.16.4    GenomeInfoDb_1.0.2      IRanges_1.22.10        
[10] DESeq_1.16.0            lattice_0.20-29         locfit_1.5-9.1         
[13] Biobase_2.24.0          BiocGenerics_0.10.0    

loaded via a namespace (and not attached):
 [1] annotate_1.42.1      AnnotationDbi_1.26.1 bitops_1.0-6        
 [4] caTools_1.17.1       colorspace_1.2-4     DBI_0.3.1           
 [7] digest_0.6.4         gdata_2.13.3         genefilter_1.46.1   
[10] geneplotter_1.42.0   grid_3.1.0           gtable_0.1.2        
[13] gtools_3.4.1         KernSmooth_2.23-13   MASS_7.3-35         
[16] munsell_0.4.2        plyr_1.8.1           proto_0.3-10        
[19] reshape2_1.4.1       RSQLite_1.0.0        scales_0.2.4        
[22] splines_3.1.0        stats4_3.1.0         stringr_0.6.2       
[25] survival_2.37-7      XML_3.98-1.1         xtable_1.7-4        
[28] XVector_0.4.0  

deseq2 • 3.0k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

The definition in the man page for ?DESeq for the reduced argument is:

reduced : a reduced formula to compare against, e.g. the full model with a term or terms of interest removed, only used by the likelihood ratio test

The likelihood ratio test compares the difference in explanatory power of a full model to the reduced model. You should remove the variable of interest, which would be 'day', to have a reduced model of ~ replicate, in order to test for any significant differences due to the day variable (differences on any day).

If you update to the latest release version, DESeq2 v1.6, it is also easy after running, 

dds = DESeq(dds, test="LRT", reduced=~replicate)
resLRT = results(dds)

to make a specific comparison of any two days:

resDay2 = results(dds, test="Wald", contrast=c("day","2","0"))
ADD COMMENT

Login before adding your answer.

Traffic: 732 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6