Question

DESeq2 Ribosome Profiling Time-series: Significantly Different Results for Wald and LRT test

0

Entering edit mode

indrik.wijaya ▴ 20

@indrikwijaya-19180

Last seen 3.9 years ago

Hi all, I'm running differential expression analysis on Ribosome Profiling Time-series data (ie I would like to study differences in translational efficiencies at different time points).

My design matrix looks something like this:

sample_name time assay rna_0_r1 t0 rna rna_1_r1 t1 rna rna_2_r1 t2 rna rna_3_r1 t3 rna rna_4_r1 t4 rna rna_5_r1 t5 rna rna_6_r1 t6 rna rna_0_r2 t0 rna rna_1_r2 t1 rna rna_2_r2 t2 rna rna_3_r2 t3 rna rna_4_r2 t4 rna rna_5_r2 t5 rna rna_6_r2 t6 rna rpf_0_r1 t0 rpf rpf_1_r1 t1 rpf rpf_2_r1 t2 rpf rpf_3_r1 t3 rpf rpf_4_r1 t4 rpf rpf_5_r1 t5 rpf rpf_6_r1 t6 rpf rpf_0_r2 t0 rpf rpf_1_r2 t1 rpf rpf_2_r2 t2 rpf rpf_3_r2 t3 rpf rpf_4_r2 t4 rpf rpf_5_r2 t5 rpf rpf_6_r2 t6 rpf

I've followed the methods described in http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#eda for time-series experiment, so what I've done:

design = ~ assay + time + assay:time 
dds <- DESeqDataSetFromMatrix(countData = df_te, colData = colData,design = design)
reduced = ~ assay + time 
dds <- DESeq(dds, test='LRT', reduced = reduced)

resultsNames(dds) gives me:

[1] "Intercept" "assay_rpf_vs_rna" "time_t1_vs_t0" "time_t2_vs_t0" "time_t3_vs_t0"
[6] "time_t4_vs_t0" "time_t5_vs_t0" "time_t6_vs_t0" "assayrpf.timet1" "assayrpf.timet2"
[11] "assayrpf.timet3" "assayrpf.timet4" "assayrpf.timet5" "assayrpf.timet6"

Then, when I want to find out translationally differentially expressed genes (with reference to time point 0), let's say at time point 1,

res_1_te_0 <- results(dds, name = 'assayrpf.timet1', test = "Wald")

I get very few genes (14 DE genes, 6410 low count genes) when I used Wald test (as suggested from the workflow above)

When I removed Wald test, ie

res_1_te_0 <- results(dds, name = 'assayrpf.timet1')

I get much more genes (2517 DE genes). Does anyone know why the two tests give very different results? Thank you.

*the DESeq2 version used is DESeq2_1.22.1

the sessionInfo()

R version 3.5.1 (2018-07-02)

Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.4

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods [9] base

other attached packages: [1] DESeq2_1.22.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0 [4] BiocParallel_1.16.2 matrixStats_0.54.0 Biobase_2.42.0 [7] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 IRanges_2.16.0 [10] S4Vectors_0.20.1 BiocGenerics_0.28.0

loaded via a namespace (and not attached): [1] bit64_0.9-7 splines_3.5.1 Formula_1.2-3 [4] assertthat_0.2.0 latticeExtra_0.6-28 blob_1.1.1 [7] GenomeInfoDbData_1.2.0 yaml_2.2.0 pillar_1.3.1 [10] RSQLite_2.1.1 backports_1.1.3 lattice_0.20-38 [13] glue_1.3.0 digest_0.6.18 RColorBrewer_1.1-2 [16] XVector_0.22.0 checkmate_1.8.5 colorspace_1.3-2 [19] htmltools_0.3.6 Matrix_1.2-15 plyr_1.8.4 [22] XML_3.98-1.16 pkgconfig_2.0.2 genefilter_1.64.0 [25] zlibbioc_1.28.0 purrr_0.2.5 xtable_1.8-3 [28] scales_1.0.0 htmlTable_1.12 tibble_1.4.2 [31] annotate_1.60.0 ggplot2_3.1.0 nnet_7.3-12 [34] lazyeval_0.2.1 survival_2.43-3 magrittr_1.5 [37] crayon_1.3.4 memoise_1.1.0 foreign_0.8-71 [40] tools_3.5.1 data.table_1.11.8 stringr_1.3.1 [43] locfit_1.5-9.1 munsell_0.5.0 cluster_2.0.7-1 [46] AnnotationDbi_1.44.0 bindrcpp_0.2.2 compiler_3.5.1 [49] rlang_0.3.0.1 grid_3.5.1 RCurl_1.95-4.11 [52] rstudioapi_0.8 htmlwidgets_1.3 bitops_1.0-6 [55] base64enc_0.1-3 gtable_0.2.0 DBI_1.0.0 [58] R6_2.3.0 gridExtra_2.3 knitr_1.21 [61] dplyr_0.7.8 bit_1.1-14 bindr_0.1.1 [64] Hmisc_4.1-1 stringi_1.2.4 Rcpp_1.0.0 [67] geneplotter_1.60.0 rpart_4.1-13 acepack_1.4.1 [70] tidyselect_0.2.5 xfun_0.4

deseq2 ribosome profiling wald lrt • 1.3k views

ADD COMMENT • link updated 5.6 years ago by Michael Love 42k • written 5.6 years ago by indrik.wijaya ▴ 20

score 0 · Answer 1 · 2019-01-03

0

Entering edit mode

Michael Love 42k

@mikelove

Last seen 9 hours ago

United States

This is explained in ?results in the section on the LRT and in the vignette section on the LRT: when you use test="LRT" you are testing the null hypothesis of multiple coefficients being equal to zero. When you use test="Wald" and specify a single coefficient with the 'name' argument, you are testing the null hypothesis of that coefficient being equal to zero. And you can expect that different null hypotheses will generate different p-values.

ADD COMMENT • link 5.6 years ago Michael Love 42k

0

Entering edit mode

Hi Mike, thank you for the reply. I apologize for not including this section from your reply that can be found here Difference in Wald and LRT test for two condition RNAseq data. where you said that

They really give very similar inference. I think the difference in a few genes you are seeing is because of the fold change thresholding.

Running DESeq() with test="LRT" turns off the shrinkage of log fold changes (see vignette, or DESeq2 paper for more on LFC shrinkage).

We recommend that you use results() with 'lfcThreshold' argument rather than the more arbitrary combination of a null hypothesis test of LFC=0 followed by LFC filtering. See the DESeq2 paper for more on this as well.

Either Wald or LRT is appropriate, it's just that the Wald pipeline incorporates LFC shrinkage which we see as a benefit.

So, I expected that both Wald and LRT test will give similar results. Or perhaps, my problem and the problem found in the link are totally different problem?

And my 2nd question is when I did
res_1_te_0 = results(dds, name = 'assayrpf.timet1'), with 'LRT' test, how does the hypothesis testing look like (ie which coefficient that I tested)? (I'm sorry for the very basic question)

ADD REPLY • link 5.6 years ago indrik.wijaya ▴ 20

0

Entering edit mode

Yes, your problem and the one at that link are very different. You are testing more than one coefficient with the LRT. The results will not be similar to a Wald test of one coefficient.

I’m sorry but your second question is exactly what we’ve answered in the help sections, both in ?results and in the vignette. I don’t know how to phrase it better than it is stated in the documentation.

ADD REPLY • link 5.6 years ago Michael Love 42k

0

Entering edit mode

Hi, Mike, thank you for the help and sorry for the basic question! I missed out the details from ?results.

ADD REPLY • link 5.6 years ago indrik.wijaya ▴ 20