Hi all, I'm running differential expression analysis on Ribosome Profiling Time-series data (ie I would like to study differences in translational efficiencies at different time points).
My design matrix looks something like this:
sample_name time assay
rna_0_r1 t0 rna
rna_1_r1 t1 rna
rna_2_r1 t2 rna
rna_3_r1 t3 rna
rna_4_r1 t4 rna
rna_5_r1 t5 rna
rna_6_r1 t6 rna
rna_0_r2 t0 rna
rna_1_r2 t1 rna
rna_2_r2 t2 rna
rna_3_r2 t3 rna
rna_4_r2 t4 rna
rna_5_r2 t5 rna
rna_6_r2 t6 rna
rpf_0_r1 t0 rpf
rpf_1_r1 t1 rpf
rpf_2_r1 t2 rpf
rpf_3_r1 t3 rpf
rpf_4_r1 t4 rpf
rpf_5_r1 t5 rpf
rpf_6_r1 t6 rpf
rpf_0_r2 t0 rpf
rpf_1_r2 t1 rpf
rpf_2_r2 t2 rpf
rpf_3_r2 t3 rpf
rpf_4_r2 t4 rpf
rpf_5_r2 t5 rpf
rpf_6_r2 t6 rpf
I've followed the methods described in http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#eda for time-series experiment, so what I've done:
design = ~ assay + time + assay:time
dds <- DESeqDataSetFromMatrix(countData = df_te, colData = colData,design = design)
reduced = ~ assay + time dds <- DESeq(dds, test='LRT', reduced = reduced)
resultsNames(dds) gives me:
[1] "Intercept" "assay_rpf_vs_rna" "time_t1_vs_t0" "time_t2_vs_t0" "time_t3_vs_t0"
[6] "time_t4_vs_t0" "time_t5_vs_t0" "time_t6_vs_t0" "assayrpf.timet1" "assayrpf.timet2"
[11] "assayrpf.timet3" "assayrpf.timet4" "assayrpf.timet5" "assayrpf.timet6"
Then, when I want to find out translationally differentially expressed genes (with reference to time point 0), let's say at time point 1,
res_1_te_0 <- results(dds, name = 'assayrpf.timet1', test = "Wald")
I get very few genes (14 DE genes, 6410 low count genes) when I used Wald test (as suggested from the workflow above)
When I removed Wald test, ie
res_1_te_0 <- results(dds, name = 'assayrpf.timet1')
I get much more genes (2517 DE genes). Does anyone know why the two tests give very different results? Thank you.
*the DESeq2 version used is DESeq2_1.22.1
the sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] DESeq2_1.22.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[4] BiocParallel_1.16.2 matrixStats_0.54.0 Biobase_2.42.0
[7] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 IRanges_2.16.0
[10] S4Vectors_0.20.1 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] bit64_0.9-7 splines_3.5.1 Formula_1.2-3
[4] assertthat_0.2.0 latticeExtra_0.6-28 blob_1.1.1
[7] GenomeInfoDbData_1.2.0 yaml_2.2.0 pillar_1.3.1
[10] RSQLite_2.1.1 backports_1.1.3 lattice_0.20-38
[13] glue_1.3.0 digest_0.6.18 RColorBrewer_1.1-2
[16] XVector_0.22.0 checkmate_1.8.5 colorspace_1.3-2
[19] htmltools_0.3.6 Matrix_1.2-15 plyr_1.8.4
[22] XML_3.98-1.16 pkgconfig_2.0.2 genefilter_1.64.0
[25] zlibbioc_1.28.0 purrr_0.2.5 xtable_1.8-3
[28] scales_1.0.0 htmlTable_1.12 tibble_1.4.2
[31] annotate_1.60.0 ggplot2_3.1.0 nnet_7.3-12
[34] lazyeval_0.2.1 survival_2.43-3 magrittr_1.5
[37] crayon_1.3.4 memoise_1.1.0 foreign_0.8-71
[40] tools_3.5.1 data.table_1.11.8 stringr_1.3.1
[43] locfit_1.5-9.1 munsell_0.5.0 cluster_2.0.7-1
[46] AnnotationDbi_1.44.0 bindrcpp_0.2.2 compiler_3.5.1
[49] rlang_0.3.0.1 grid_3.5.1 RCurl_1.95-4.11
[52] rstudioapi_0.8 htmlwidgets_1.3 bitops_1.0-6
[55] base64enc_0.1-3 gtable_0.2.0 DBI_1.0.0
[58] R6_2.3.0 gridExtra_2.3 knitr_1.21
[61] dplyr_0.7.8 bit_1.1-14 bindr_0.1.1
[64] Hmisc_4.1-1 stringi_1.2.4 Rcpp_1.0.0
[67] geneplotter_1.60.0 rpart_4.1-13 acepack_1.4.1
[70] tidyselect_0.2.5 xfun_0.4
Hi Mike, thank you for the reply. I apologize for not including this section from your reply that can be found here Difference in Wald and LRT test for two condition RNAseq data. where you said that
So, I expected that both Wald and LRT test will give similar results. Or perhaps, my problem and the problem found in the link are totally different problem?
And my 2nd question is when I did
res_1_te_0 = results(dds, name = 'assayrpf.timet1'), with 'LRT' test, how does the hypothesis testing look like (ie which coefficient that I tested)? (I'm sorry for the very basic question)
Yes, your problem and the one at that link are very different. You are testing more than one coefficient with the LRT. The results will not be similar to a Wald test of one coefficient.
I’m sorry but your second question is exactly what we’ve answered in the help sections, both in ?results and in the vignette. I don’t know how to phrase it better than it is stated in the documentation.
Hi, Mike, thank you for the help and sorry for the basic question! I missed out the details from ?results.