Question: DESeq2 Ribosome Profiling Time-series: Significantly Different Results for Wald and LRT test
gravatar for indrik.wijaya
18 days ago by
indrik.wijaya0 wrote:

Hi all, I'm running differential expression analysis on Ribosome Profiling Time-series data (ie I would like to study differences in translational efficiencies at different time points).

My design matrix looks something like this:

sample_name    time    assay
rna_0_r1    t0    rna
rna_1_r1    t1    rna
rna_2_r1    t2    rna
rna_3_r1    t3    rna
rna_4_r1    t4    rna
rna_5_r1    t5    rna
rna_6_r1    t6    rna
rna_0_r2    t0    rna
rna_1_r2    t1    rna
rna_2_r2    t2    rna
rna_3_r2    t3    rna
rna_4_r2    t4    rna
rna_5_r2    t5    rna
rna_6_r2    t6    rna
rpf_0_r1    t0    rpf
rpf_1_r1    t1    rpf
rpf_2_r1    t2    rpf
rpf_3_r1    t3    rpf
rpf_4_r1    t4    rpf
rpf_5_r1    t5    rpf
rpf_6_r1    t6    rpf
rpf_0_r2    t0    rpf
rpf_1_r2    t1    rpf
rpf_2_r2    t2    rpf
rpf_3_r2    t3    rpf
rpf_4_r2    t4    rpf
rpf_5_r2    t5    rpf
rpf_6_r2    t6    rpf

I've followed the methods described in for time-series experiment, so what I've done:

design = ~ assay + time + assay:time 
dds <- DESeqDataSetFromMatrix(countData = df_te, colData = colData,design = design)
reduced = ~ assay + time 
dds <- DESeq(dds, test='LRT', reduced = reduced)

resultsNames(dds) gives me:

[1] "Intercept" "assay_rpf_vs_rna" "time_t1_vs_t0" "time_t2_vs_t0" "time_t3_vs_t0"
[6] "time_t4_vs_t0" "time_t5_vs_t0" "time_t6_vs_t0" "assayrpf.timet1" "assayrpf.timet2"
[11] "assayrpf.timet3" "assayrpf.timet4" "assayrpf.timet5" "assayrpf.timet6"

Then, when I want to find out translationally differentially expressed genes (with reference to time point 0), let's say at time point 1,

res_1_te_0 <- results(dds, name = 'assayrpf.timet1', test = "Wald")

I get very few genes (14 DE genes, 6410 low count genes) when I used Wald test (as suggested from the workflow above)

When I removed Wald test, ie

res_1_te_0 <- results(dds, name = 'assayrpf.timet1')

I get much more genes (2517 DE genes). Does anyone know why the two tests give very different results? Thank you.

*the DESeq2 version used is DESeq2_1.22.1

the sessionInfo()

R version 3.5.1 (2018-07-02)

Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] DESeq2_1.22.1               SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [4] BiocParallel_1.16.2         matrixStats_0.54.0          Biobase_2.42.0             
 [7] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1         IRanges_2.16.0             
[10] S4Vectors_0.20.1            BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.5.1          Formula_1.2-3         
 [4] assertthat_0.2.0       latticeExtra_0.6-28    blob_1.1.1            
 [7] GenomeInfoDbData_1.2.0 yaml_2.2.0             pillar_1.3.1          
[10] RSQLite_2.1.1          backports_1.1.3        lattice_0.20-38       
[13] glue_1.3.0             digest_0.6.18          RColorBrewer_1.1-2    
[16] XVector_0.22.0         checkmate_1.8.5        colorspace_1.3-2      
[19] htmltools_0.3.6        Matrix_1.2-15          plyr_1.8.4            
[22] XML_3.98-1.16          pkgconfig_2.0.2        genefilter_1.64.0     
[25] zlibbioc_1.28.0        purrr_0.2.5            xtable_1.8-3          
[28] scales_1.0.0           htmlTable_1.12         tibble_1.4.2          
[31] annotate_1.60.0        ggplot2_3.1.0          nnet_7.3-12           
[34] lazyeval_0.2.1         survival_2.43-3        magrittr_1.5          
[37] crayon_1.3.4           memoise_1.1.0          foreign_0.8-71        
[40] tools_3.5.1            data.table_1.11.8      stringr_1.3.1         
[43] locfit_1.5-9.1         munsell_0.5.0          cluster_2.0.7-1       
[46] AnnotationDbi_1.44.0   bindrcpp_0.2.2         compiler_3.5.1        
[49] rlang_0.3.0.1          grid_3.5.1             RCurl_1.95-4.11       
[52] rstudioapi_0.8         htmlwidgets_1.3        bitops_1.0-6          
[55] base64enc_0.1-3        gtable_0.2.0           DBI_1.0.0             
[58] R6_2.3.0               gridExtra_2.3          knitr_1.21            
[61] dplyr_0.7.8            bit_1.1-14             bindr_0.1.1           
[64] Hmisc_4.1-1            stringi_1.2.4          Rcpp_1.0.0            
[67] geneplotter_1.60.0     rpart_4.1-13           acepack_1.4.1         
[70] tidyselect_0.2.5       xfun_0.4              

ADD COMMENTlink modified 18 days ago by Michael Love21k • written 18 days ago by indrik.wijaya0
Answer: DESeq2 Ribosome Profiling Time-series: Significantly Different Results for Wald
gravatar for Michael Love
18 days ago by
Michael Love21k
United States
Michael Love21k wrote:

This is explained in ?results in the section on the LRT and in the vignette section on the LRT: when you use test="LRT" you are testing the null hypothesis of multiple coefficients being equal to zero. When you use test="Wald" and specify a single coefficient with the 'name' argument, you are testing the null hypothesis of that coefficient being equal to zero. And you can expect that different null hypotheses will generate different p-values.

ADD COMMENTlink written 18 days ago by Michael Love21k

Hi Mike, thank you for the reply. I apologize for not including this section from your reply that can be found here Difference in Wald and LRT test for two condition RNAseq data. where you said that 

They really give very similar inference. I think the difference in a few genes you are seeing is because of the fold change thresholding.

Running DESeq() with test="LRT" turns off the shrinkage of log fold changes (see vignette, or DESeq2 paper for more on LFC shrinkage).

We recommend that you use results() with 'lfcThreshold' argument rather than the more arbitrary combination of a null hypothesis test of LFC=0 followed by LFC filtering. See the DESeq2 paper for more on this as well.

Either Wald or LRT is appropriate, it's just that the Wald pipeline incorporates LFC shrinkage which we see as a benefit.

So, I expected that both Wald and LRT test will give similar results. Or perhaps, my problem and the problem found in the link are totally different problem?

And my 2nd question is when I did 
res_1_te_0 = results(dds, name = 'assayrpf.timet1'), with 'LRT' test, how does the hypothesis testing look like (ie which coefficient that I tested)? (I'm sorry for the very basic question)

ADD REPLYlink modified 17 days ago • written 17 days ago by indrik.wijaya0

Yes, your problem and the one at that link are very different. You are testing more than one coefficient with the LRT. The results will not be similar to a Wald test of one coefficient.

I’m sorry but your second question is exactly what we’ve answered in the help sections, both in ?results and in the vignette. I don’t know how to phrase it better than it is stated in the documentation.

ADD REPLYlink written 17 days ago by Michael Love21k

Hi, Mike, thank you for the help and sorry for the basic question! I missed out the details from ?results. 

ADD REPLYlink written 17 days ago by indrik.wijaya0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 336 users visited in the last hour