Search
Question: Deviance decomposition with DESeq2
0
gravatar for Giovanni Bacci
4 months ago by
Italy, Florence, University of Florence
Giovanni Bacci0 wrote:

Hello,

I'm using DESeq2 to analyse a large dataset with basically three predictors:

1 - Subjects = the ID of the subject involved in the experiment (categorical)

2 - Culture medium = the culture medium (categorical)

3 - Time = the number of days (continuous)

I would like to inspect the proportion of variance explained by the three predictors for each gene to categorize them based on the most influencing factor (this is only an exploratory analysis). Following the DESeq2 manual, the paper, and this post, I wrote this little script but I'm not sure if it is correct:

models <- list(
  all = "~ medium + subject + days",
  no.food = "~ subject + days",
  no.subject = "~ days",
  null = "~ 1"
)

deviances <- lapply(models, function(m){
  design(dds) <- as.formula(m)
  dds <- DESeq(dds, test = "Wald", fitType = "parametric")
  mcols(dds)$deviance
})

residual <- deviances[[length(deviances)]]

deviances <- lapply(seq_along(deviances)[-1], function(i){
  n <- i - 1
  abs(deviances[[i]] - deviances[[n]])
})

deviances <- data.frame(do.call(cbind, deviances))
deviances$residual <- residual - rowSums(deviances)
deviances <- data.frame(t(apply(deviances, 1, function(x) x / sum(x))))
deviances <- setNames(deviances, c("medium", "subject", "days", "residual"))

Probably I'm doing something wrong since I ended up with negative residuals for some genes ...

Thanks in advance!

cheers

Giovanni

 

sessionInfo():

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggtern_2.2.1               vegan_2.4-6                lattice_0.20-35           
 [4] permute_0.9-4              ggsci_2.8                  scales_0.5.0.9000         
 [7] bindrcpp_0.2               dplyr_0.7.4                ape_5.0                   
[10] ggthemes_3.4.0             DESeq2_1.18.1              SummarizedExperiment_1.8.1
[13] DelayedArray_0.4.1         matrixStats_0.53.1         GenomicRanges_1.30.2      
[16] GenomeInfoDb_1.14.0        IRanges_2.12.0             S4Vectors_0.16.0          
[19] phyloseq_1.23.1            metagenomeSeq_1.20.1       RColorBrewer_1.1-2        
[22] glmnet_2.0-13              foreach_1.4.4              Matrix_1.2-11             
[25] limma_3.34.9               Biobase_2.38.0             BiocGenerics_0.24.0       
[28] gridExtra_2.3              ggplot2_2.2.1             

loaded via a namespace (and not attached):
 [1] nlme_3.1-131.1         bitops_1.0-6           bit64_0.9-7            latex2exp_0.4.0       
 [5] tensorA_0.36           tools_3.4.4            backports_1.1.2        utf8_1.1.3            
 [9] R6_2.2.2               rpart_4.1-13           KernSmooth_2.23-15     Hmisc_4.1-1           
[13] DBI_0.7                lazyeval_0.2.1         mgcv_1.8-23            colorspace_1.3-2      
[17] ade4_1.7-10            nnet_7.3-12            bayesm_3.1-0.1         bit_1.1-12            
[21] compiler_3.4.4         compositions_1.40-1    cli_1.0.0              htmlTable_1.11.2      
[25] labeling_0.3           caTools_1.17.1         checkmate_1.8.5        DEoptimR_1.0-8        
[29] robustbase_0.92-8      genefilter_1.60.0      stringr_1.3.0          digest_0.6.15         
[33] foreign_0.8-69         XVector_0.18.0         base64enc_0.1-3        pkgconfig_2.0.1       
[37] htmltools_0.3.6        htmlwidgets_1.0        rlang_0.2.0            rstudioapi_0.7        
[41] RSQLite_2.0            bindr_0.1              energy_1.7-2           jsonlite_1.5          
[45] BiocParallel_1.12.0    gtools_3.5.0           acepack_1.4.1          RCurl_1.95-4.10       
[49] magrittr_1.5           GenomeInfoDbData_1.0.0 Formula_1.2-2          biomformat_1.6.0      
[53] Rcpp_0.12.15           munsell_0.4.3          proto_1.0.0            stringi_1.1.6         
[57] yaml_2.1.16            MASS_7.3-49            zlibbioc_1.24.0        rhdf5_2.22.0          
[61] gplots_3.0.1           plyr_1.8.4             blob_1.1.0             gdata_2.18.0          
[65] crayon_1.3.4           Biostrings_2.46.0      splines_3.4.4          multtest_2.34.0       
[69] annotate_1.56.1        locfit_1.5-9.1         knitr_1.20             pillar_1.1.0          
[73] igraph_1.1.2           boot_1.3-20            geneplotter_1.56.0     reshape2_1.4.3        
[77] codetools_0.2-15       glue_1.2.0             XML_3.98-1.10          latticeExtra_0.6-28   
[81] data.table_1.10.4-3    gtable_0.2.0           assertthat_0.2.0       xtable_1.8-2          
[85] survival_2.41-3        tibble_1.4.2           iterators_1.0.9        AnnotationDbi_1.40.0  
[89] memoise_1.1.0          cluster_2.0.6         

 

ADD COMMENTlink modified 4 months ago by Michael Love19k • written 4 months ago by Giovanni Bacci0
0
gravatar for Michael Love
4 months ago by
Michael Love19k
United States
Michael Love19k wrote:

The deviance I am returning in mcols(dds) is -2 * log likelihood.

The LRT statistic I return is 2 * (full log likelihood - reduced log likelihood) = reduced deviance - full deviance, which is assumed to follow a chi square distribution. Since the log likelihood increases when coefficients are added to the model, and deviance as defined above has the opposite sign, it's expected that reduced deviance > full deviance. You should not put an absolute value in the difference calculation above.

ADD COMMENTlink written 4 months ago by Michael Love19k

Sorry if I come back to this after quite a long time, but I'm still struggling to produce reliable results. It seems that the above calculation produced some negative values. I would say that the factor considered is somehow worsening the model by increasing the deviance of the full model in respect with the reduced one but I don't know how to deal with it.  Should I treat those values as zeros or should I leave them as negative values?

Thanks again!

Giovanni

ADD REPLYlink written 9 weeks ago by Giovanni Bacci0

I would suggest using the dispersion from the full model. If you run separate models, you are re-estimating dispersion each time. You can run estimateDispersion(dds) for the full model and then for the reduced models, you can set dispersion with dispersions(dds_reduce) <- dispersions(dds_full). Then you only need to re-run the nbinomWald() function, not the whole procedure.

ADD REPLYlink written 9 weeks ago by Michael Love19k

Dear Michael,
thanks for your help! I changed my code as follows:

# Full model
dds.full <- DESeq(dds.full, test = "Wald", fitType = "parametric", betaPrior = T)

# Deviance estimation of reduced models
deviances <- sapply(models[-1], function(m){
  dds <- dds.full # Copying the full model
  design(dds) <- as.formula(m) # Changing design
  dds <- nbinomWaldTest(dds, betaPrior = T) # Re-run wald test
  return(mcols(dds)$deviance) # Returning deviances
})
deviances <- cbind(mcols(dds.full)$deviance, deviances) # Adding deviances of the full model

This way I need only to change the design of my experiment leaving everything else as already estimated in the full model . Then, to build the full deviance table:

deviances <- data.frame(medium = deviances[,2] - deviances[,1],
                        food = deviances[,3] - deviances[,2],
                        days = deviances[,4] - deviances[,3]) %>%
  mutate(residual = deviances[,4] - (medium + food + days)) %>%
  as.matrix() %>%
  prop.table(margin = 1) %>%
  data.frame(row.names = rownames(dds.full))

This seems to work, now I get only positive values and they seem reasonable. Does it sound good to you?

Thaks again,

ciao,

Giovanni

 

ADD REPLYlink written 9 weeks ago by Giovanni Bacci0

I wouldn't use the beta prior if you're calculating deviances and comparing them across models. The LRT theory is not for biased estimates of coefficients (the beta prior induces a bias).

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Michael Love19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 123 users visited in the last hour