DESeq2 : LRT with factorial and continuous data
1
0
Entering edit mode
gmchaput • 0
@gmchaput-22280
Last seen 19 days ago
United States

I wanted to check with the community if the way I set up my DESeq2 analysis is correct for the question I would like to address.

I want to look at the shifts in taxa abundance across Latitude. I've looked at various discussions relating to time series experiments as well as the DESeq workflow & DESeq vignette. And if I understand correctly, I can use LRT to reduce the full model to ~latitude and get the sig differential taxa. Change in abundance is per unit of latitude increase too, correct?

Details about the data:

  • I am using a count matrix of taxa (bacteria/archaea; obtained via kraken/bracken on Illumina read data). There are 3871 obs.
  • Factorial data is Water Body (2 levels) : North_Atlantic and North_Pacific
  • Continuous data is the latitude of where the sample was collected
  • There is no "control" for this data

The full design is ~WaterBody + Lat +WaterBody:Lat This was decided based on PERMANOVA where all three have a significant effect.

Here is my current script:

# Create DESeq2Dataset object
dds = DESeqDataSetFromMatrix(countData = Simp_BactArch, colData = Sample_Data,  design= ~WaterBody + Lat + WaterBody:Lat, tidy=FALSE)

#normalize with estimateSizeFactors() and poscounts to remove genes with zeros
dds = DESeq2::estimateSizeFactors(dds, type="poscounts") #https://support.bioconductor.org/p/63229/ & https://support.bioconductor.org/p/115090/

# Run DESeq2 differential expression analysis
dds2 = DESeq(dds)

#test whether there are taxa abundance differences across Latitude
ddsLAT = DESeq(dds2, test="LRT", reduced = ~ Lat)
resLAT = results(ddsLAT)
resLAT$symbol = mcols(ddsLAT)$symbol
head(resLAT[order(resLAT$padj),],4)

sessionInfo:

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[2] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[2] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [2] RRPP_1.3.0                  RColorBrewer_1.1-3          pheatmap_1.0.12            
 [4] DESeq2_1.36.0               SummarizedExperiment_1.26.1 Biobase_2.56.0             
 [7] MatrixGenerics_1.8.1        matrixStats_0.62.0          GenomicRanges_1.48.0       
[10] GenomeInfoDb_1.32.4         IRanges_2.30.1              S4Vectors_0.34.0           
[13] BiocGenerics_0.42.0         maps_3.4.0                  reshape2_1.4.4             
[16] vegan_2.6-2                 lattice_0.20-45             permute_0.9-7              
[19] phyloseq_1.40.0             statmod_1.4.37              edgeR_3.38.4               
[22] limma_3.52.2                forcats_0.5.2               stringr_1.4.1              
[25] dplyr_1.0.10                purrr_0.3.4                 readr_2.1.2                
[28] tidyr_1.2.0                 tibble_3.1.8                ggplot2_3.3.6              
[31] tidyverse_1.3.2            

loaded via a namespace (and not attached):
  [2] googledrive_2.0.0      colorspace_2.0-3       ellipsis_0.3.2         XVector_0.36.0        
  [5] fs_1.5.2               rstudioapi_0.14        farver_2.1.1           bit64_4.0.5           
  [9] AnnotationDbi_1.58.0   fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3            
 [13] codetools_0.2-18       splines_4.2.1          cachem_1.0.6           geneplotter_1.74.0    
 [17] knitr_1.40             ade4_1.7-19            jsonlite_1.8.0         broom_1.0.1           
 [21] annotate_1.74.0        cluster_2.1.4          dbplyr_2.2.1           png_0.1-7             
 [25] compiler_4.2.1         httr_1.4.4             backports_1.4.1        assertthat_0.2.1      
 [29] Matrix_1.4-1           fastmap_1.1.0          gargle_1.2.0           cli_3.3.0             
 [33] htmltools_0.5.3        tools_4.2.1            igraph_1.3.4           gtable_0.3.1          
 [37] glue_1.6.2             GenomeInfoDbData_1.2.8 Rcpp_1.0.9             cellranger_1.1.0      
 [41] vctrs_0.4.1            Biostrings_2.64.1      rhdf5filters_1.8.0     multtest_2.52.0       
 [45] ape_5.6-2              nlme_3.1-159           iterators_1.0.14       xfun_0.32             
 [49] rvest_1.0.3            lifecycle_1.0.1        XML_3.99-0.10          googlesheets4_1.0.1   
 [53] zlibbioc_1.42.0        MASS_7.3-58.1          scales_1.2.1           vroom_1.5.7           
 [57] hms_1.1.2              parallel_4.2.1         biomformat_1.24.0      rhdf5_2.40.0          
 [61] yaml_2.3.5             memoise_2.0.1          stringi_1.7.8          RSQLite_2.2.16        
 [65] genefilter_1.78.0      foreach_1.5.2          BiocParallel_1.30.3    rlang_1.0.5           
 [69] pkgconfig_2.0.3        bitops_1.0-7           evaluate_0.16          Rhdf5lib_1.18.2       
 [73] labeling_0.4.2         bit_4.0.4              tidyselect_1.1.2       plyr_1.8.7            
 [77] magrittr_2.0.3         R6_2.5.1               generics_0.1.3         DelayedArray_0.22.0   
 [81] DBI_1.1.3              withr_2.5.0            pillar_1.8.1           haven_2.5.1           
 [85] mgcv_1.8-40            survival_3.4-0         KEGGREST_1.36.3        RCurl_1.98-1.8        
 [89] modelr_0.1.9           crayon_1.5.1           utf8_1.2.2             rmarkdown_2.16        
 [93] tzdb_0.3.0             readxl_1.4.1           locfit_1.5-9.6         grid_4.2.1            
 [97] data.table_1.14.2      blob_1.2.3             reprex_2.0.2           digest_0.6.29         
[101] xtable_1.8-4           munsell_0.5.0    

```

DESeq2 DifferentialExpression • 76 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 7 hours ago
United States

For interpreting linear models, I'd recommend working with a local statistician or someone familiar with linear models in R. I restrict my time to answering software related questions on the support site, but I can't help guide statistical analysis choices.

~WaterBody + Lat + WaterBody:Lat

compared to

~Lat

is testing that there is any effect of water body, allowing for differences in the water body effect across levels of lat.

ADD COMMENT

Login before adding your answer.

Traffic: 337 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6