DESeq2 Ribosome Profiling Time-series: Significantly Different Results for Wald and LRT test
Entering edit mode
Last seen 3.5 years ago

Hi all, I'm running differential expression analysis on Ribosome Profiling Time-series data (ie I would like to study differences in translational efficiencies at different time points).

My design matrix looks something like this:

sample_name    time    assay
rna_0_r1    t0    rna
rna_1_r1    t1    rna
rna_2_r1    t2    rna
rna_3_r1    t3    rna
rna_4_r1    t4    rna
rna_5_r1    t5    rna
rna_6_r1    t6    rna
rna_0_r2    t0    rna
rna_1_r2    t1    rna
rna_2_r2    t2    rna
rna_3_r2    t3    rna
rna_4_r2    t4    rna
rna_5_r2    t5    rna
rna_6_r2    t6    rna
rpf_0_r1    t0    rpf
rpf_1_r1    t1    rpf
rpf_2_r1    t2    rpf
rpf_3_r1    t3    rpf
rpf_4_r1    t4    rpf
rpf_5_r1    t5    rpf
rpf_6_r1    t6    rpf
rpf_0_r2    t0    rpf
rpf_1_r2    t1    rpf
rpf_2_r2    t2    rpf
rpf_3_r2    t3    rpf
rpf_4_r2    t4    rpf
rpf_5_r2    t5    rpf
rpf_6_r2    t6    rpf

I've followed the methods described in for time-series experiment, so what I've done:

design = ~ assay + time + assay:time 
dds <- DESeqDataSetFromMatrix(countData = df_te, colData = colData,design = design)
reduced = ~ assay + time 
dds <- DESeq(dds, test='LRT', reduced = reduced)

resultsNames(dds) gives me:

[1] "Intercept" "assay_rpf_vs_rna" "time_t1_vs_t0" "time_t2_vs_t0" "time_t3_vs_t0"
[6] "time_t4_vs_t0" "time_t5_vs_t0" "time_t6_vs_t0" "assayrpf.timet1" "assayrpf.timet2"
[11] "assayrpf.timet3" "assayrpf.timet4" "assayrpf.timet5" "assayrpf.timet6"

Then, when I want to find out translationally differentially expressed genes (with reference to time point 0), let's say at time point 1,

res_1_te_0 <- results(dds, name = 'assayrpf.timet1', test = "Wald")

I get very few genes (14 DE genes, 6410 low count genes) when I used Wald test (as suggested from the workflow above)

When I removed Wald test, ie

res_1_te_0 <- results(dds, name = 'assayrpf.timet1')

I get much more genes (2517 DE genes). Does anyone know why the two tests give very different results? Thank you.

*the DESeq2 version used is DESeq2_1.22.1

the sessionInfo()

R version 3.5.1 (2018-07-02)

Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] DESeq2_1.22.1               SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [4] BiocParallel_1.16.2         matrixStats_0.54.0          Biobase_2.42.0             
 [7] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1         IRanges_2.16.0             
[10] S4Vectors_0.20.1            BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.5.1          Formula_1.2-3         
 [4] assertthat_0.2.0       latticeExtra_0.6-28    blob_1.1.1            
 [7] GenomeInfoDbData_1.2.0 yaml_2.2.0             pillar_1.3.1          
[10] RSQLite_2.1.1          backports_1.1.3        lattice_0.20-38       
[13] glue_1.3.0             digest_0.6.18          RColorBrewer_1.1-2    
[16] XVector_0.22.0         checkmate_1.8.5        colorspace_1.3-2      
[19] htmltools_0.3.6        Matrix_1.2-15          plyr_1.8.4            
[22] XML_3.98-1.16          pkgconfig_2.0.2        genefilter_1.64.0     
[25] zlibbioc_1.28.0        purrr_0.2.5            xtable_1.8-3          
[28] scales_1.0.0           htmlTable_1.12         tibble_1.4.2          
[31] annotate_1.60.0        ggplot2_3.1.0          nnet_7.3-12           
[34] lazyeval_0.2.1         survival_2.43-3        magrittr_1.5          
[37] crayon_1.3.4           memoise_1.1.0          foreign_0.8-71        
[40] tools_3.5.1            data.table_1.11.8      stringr_1.3.1         
[43] locfit_1.5-9.1         munsell_0.5.0          cluster_2.0.7-1       
[46] AnnotationDbi_1.44.0   bindrcpp_0.2.2         compiler_3.5.1        
[49] rlang_0.3.0.1          grid_3.5.1             RCurl_1.95-4.11       
[52] rstudioapi_0.8         htmlwidgets_1.3        bitops_1.0-6          
[55] base64enc_0.1-3        gtable_0.2.0           DBI_1.0.0             
[58] R6_2.3.0               gridExtra_2.3          knitr_1.21            
[61] dplyr_0.7.8            bit_1.1-14             bindr_0.1.1           
[64] Hmisc_4.1-1            stringi_1.2.4          Rcpp_1.0.0            
[67] geneplotter_1.60.0     rpart_4.1-13           acepack_1.4.1         
[70] tidyselect_0.2.5       xfun_0.4              

deseq2 ribosome profiling wald lrt • 1.2k views
Entering edit mode
Last seen 2 hours ago
United States

This is explained in ?results in the section on the LRT and in the vignette section on the LRT: when you use test="LRT" you are testing the null hypothesis of multiple coefficients being equal to zero. When you use test="Wald" and specify a single coefficient with the 'name' argument, you are testing the null hypothesis of that coefficient being equal to zero. And you can expect that different null hypotheses will generate different p-values.

Entering edit mode

Hi Mike, thank you for the reply. I apologize for not including this section from your reply that can be found here Difference in Wald and LRT test for two condition RNAseq data. where you said that 

They really give very similar inference. I think the difference in a few genes you are seeing is because of the fold change thresholding.

Running DESeq() with test="LRT" turns off the shrinkage of log fold changes (see vignette, or DESeq2 paper for more on LFC shrinkage).

We recommend that you use results() with 'lfcThreshold' argument rather than the more arbitrary combination of a null hypothesis test of LFC=0 followed by LFC filtering. See the DESeq2 paper for more on this as well.

Either Wald or LRT is appropriate, it's just that the Wald pipeline incorporates LFC shrinkage which we see as a benefit.

So, I expected that both Wald and LRT test will give similar results. Or perhaps, my problem and the problem found in the link are totally different problem?

And my 2nd question is when I did 
res_1_te_0 = results(dds, name = 'assayrpf.timet1'), with 'LRT' test, how does the hypothesis testing look like (ie which coefficient that I tested)? (I'm sorry for the very basic question)

Entering edit mode

Yes, your problem and the one at that link are very different. You are testing more than one coefficient with the LRT. The results will not be similar to a Wald test of one coefficient.

I’m sorry but your second question is exactly what we’ve answered in the help sections, both in ?results and in the vignette. I don’t know how to phrase it better than it is stated in the documentation.

Entering edit mode

Hi, Mike, thank you for the help and sorry for the basic question! I missed out the details from ?results. 


Login before adding your answer.

Traffic: 771 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6