Hello Mike, please kindly advise on the following question regarding analysis with deseq2. This might be a very general inquiry about running deseq2 with a continuous variable, but the characteristic of this variable is confusing me out.
My dataset is from a human bulk RNA-sequencing, consisting of samples age ranging between 16-76. The age isn't completely continuous, regarding that there are missing ages and some ages have multiple samples of that age. My goal is to get a list of differentially expressed genes across this entire age range. To do so I'm considering my variable as continuous, but I'm not sure if I'm conducting this correctly within DESeq2.
Below is part of my coldata:
> coldata
Age Sex
N1864 16 M
N1854 18 M
N1724 19 M
N1744 20 F
N1216 22 F
N1505 24 M
N1311 25 M
N1749 25 M
and so on.
I ran the below codes to set up the data and run the analysis using LRT.
Data Setup
```{r} cts<-as.matrix(read.csv("counts.protein.ordered.reduced.csv",fileEncoding="UTF-8-BOM", row.names=1)) colnames(cts)<-NULL coldata<-read.csv("sample.info.reduced.csv",fileEncoding="UTF-8-BOM", row.names=1) coldata<-coldata[,c("Age","Sex")] coldata$Age<-as.factor(coldata$Age) dds <- DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~Sex+Age)
# Run Analysis
```{r}
dds <- DESeq(dds, test="LRT", reduced=~Sex)
resultsNames(dds)
res <- results(dds)
res
When I do so, I get the below:
> res
log2 fold change (MLE): Age 76 vs 16
LRT p-value: '~ Sex + Age' vs '~ Sex'
DataFrame with 18015 rows and 6 columns
This made me think that the reported log2FC is simply comparing the oldest vs. youngest sample, and not the change per unit change of age. (Is that true?) So I ran this again without factorizing age, which seemed like it worked. However, then the plotcounts function would not work since intgroup=age is not a factor.
Is this the correct way to approach this kind of variable in DESeq2? Please let me know your opinion on this. Thanks! Jasmine
Session info attached:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggrepel_0.8.1 pcaExplorer_2.10.1 RColorBrewer_1.1-2
[4] vsn_3.52.0 pheatmap_1.0.12 IHW_1.12.0
[7] apeglm_1.6.0 ggplot2_3.2.1 DESeq2_1.24.0
[10] SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1
[13] matrixStats_0.55.0 Biobase_2.44.0 GenomicRanges_1.36.1
[16] GenomeInfoDb_1.20.0 IRanges_2.18.3 S4Vectors_0.22.1
[19] BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] GOstats_2.50.0 backports_1.1.5 Hmisc_4.2-0 NMF_0.21.0
[5] plyr_1.8.4 igraph_1.2.4.1 lazyeval_0.2.2 GSEABase_1.46.0
[9] shinydashboard_0.7.1 splines_3.6.1 crosstalk_1.0.0 gridBase_0.4-7
[13] lpsymphony_1.12.0 digest_0.6.21 foreach_1.4.7 htmltools_0.4.0
[17] GO.db_3.8.2 magrittr_1.5 checkmate_1.9.4 memoise_1.1.0
[21] cluster_2.1.0 doParallel_1.0.15 limma_3.40.6 annotate_1.62.0
[25] prettyunits_1.0.2 colorspace_1.4-1 blob_1.2.0 xfun_0.11
[29] dplyr_0.8.3 crayon_1.3.4 RCurl_1.95-4.12 jsonlite_1.6
[33] graph_1.62.0 genefilter_1.66.0 zeallot_0.1.0 survival_2.44-1.1
[37] iterators_1.0.12 glue_1.3.1 registry_0.5-1 gtable_0.3.0
[41] zlibbioc_1.30.0 XVector_0.24.0 Rgraphviz_2.28.0 SparseM_1.77
[45] scales_1.0.0 DBI_1.0.0 rngtools_1.4 bibtex_0.4.2
[49] Rcpp_1.0.2 xtable_1.8-4 progress_1.2.2 emdbook_1.3.11
[53] htmlTable_1.13.2 foreign_0.8-72 bit_1.1-14 preprocessCore_1.46.0
[57] Formula_1.2-3 DT_0.10 AnnotationForge_1.26.0 htmlwidgets_1.5.1
[61] httr_1.4.1 threejs_0.3.1 shinyAce_0.4.1 acepack_1.4.1
[65] pkgconfig_2.0.3 XML_3.98-1.20 nnet_7.3-12 locfit_1.5-9.1
[69] tidyselect_0.2.5 rlang_0.4.0 reshape2_1.4.4 later_1.0.0
[73] AnnotationDbi_1.46.1 munsell_0.5.0 tools_3.6.1 RSQLite_2.1.2
[77] fdrtool_1.2.15 evaluate_0.14 shinyBS_0.61 stringr_1.4.0
[81] fastmap_1.0.1 knitr_1.26 bit64_0.9-7 purrr_0.3.3
[85] RBGL_1.60.0 mime_0.7 slam_0.1-45 biomaRt_2.40.5
[89] compiler_3.6.1 rstudioapi_0.10 png_0.1-7 affyio_1.54.0
[93] tibble_2.1.3 geneplotter_1.62.0 stringi_1.4.3 lattice_0.20-38
[97] Matrix_1.2-17 vctrs_0.2.0 pillar_1.4.2 lifecycle_0.1.0
[101] BiocManager_1.30.9 d3heatmap_0.6.1.2 data.table_1.12.4 bitops_1.0-6
[105] httpuv_1.5.2 R6_2.4.1 latticeExtra_0.6-28 affy_1.62.0
[109] promises_1.1.0 topGO_2.36.0 gridExtra_2.3 codetools_0.2-16
[113] MASS_7.3-51.4 assertthat_0.2.1 Category_2.50.0 pkgmaker_0.27
[117] withr_2.1.2 GenomeInfoDbData_1.2.1 hms_0.5.2 grid_3.6.1
[121] rpart_4.1-15 tidyr_1.0.0 coda_0.19-3 rmarkdown_1.17
[125] bbmle_1.0.20 numDeriv_2016.8-1.1 shiny_1.4.0 base64enc_0.1-3