I am running DESeq2 to find associations between microbiome abundance and gene expression. The results include accession numbers that look valid but do not appear in the reference that was used when calculating the gene counts and according to the ensembl ID converter were never valid ensembl IDs implying the problem is not that I used an old reference.
This is the code that is running DEseq2 and consolidating the results
##Loop through the microbiome abundance data
##to modify create colData for each microbiome group
for(i in seq(1,nrow(m_microbiome))){
microbiome_group<-m_microbiome[i,]
df_meta$MicrobiomeGroup<-as.vector(t(microbiome_group))
dds<-DESeqDataSetFromMatrix(countData = m_expression,
colData = df_meta,
design= ~Timepoint+MicrobiomeGroup)
dds<-DESeq(dds,
parallel=TRUE,BPPARAM=MulticoreParam(4))
df_results<-results(dds)
df_results$MicrobiomeGroup<-rownames(m_microbiome)[[i]]
l_results[[i]]<-as.data.frame(df_results)
}
df_combined<-do.call("rbind",l_results)
df_combined$FDR<-p.adjust(df_combined$pvalue)
write.csv(df_combined,outfile,row.names = T)
Here is an example of a gene that is not in the input but is in the output:
> print(m_expression[which(rownames(m_expression)=="gene:ENSG000001628921"),])
Early1 Early2 Early4 Late1 Late2 Late3 Late4
> print(df_combined[which(rownames(df_combined)=="gene:ENSG000001628921"),])
baseMean log2FoldChange lfcSE stat pvalue padj MicrobiomeGroup FDR
gene:ENSG000001628921 436.3101 -2.112225 0.3147359 -6.711102 1.931604e-11 2.348638e-07 Firmicutes 9.773919e-07
Here is the session info:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.25 microbiome_1.8.0 ggplot2_3.2.1 phyloseq_1.29.0
[5] DESeq2_1.26.0 SummarizedExperiment_1.15.10 DelayedArray_0.11.8 BiocParallel_1.19.5
[9] matrixStats_0.55.0 Biobase_2.45.1 GenomicRanges_1.37.17 GenomeInfoDb_1.21.2
[13] IRanges_2.19.18 S4Vectors_0.23.25 BiocGenerics_0.31.6
loaded via a namespace (and not attached):
[1] nlme_3.1-141 bitops_1.0-6 bit64_0.9-7 RColorBrewer_1.1-2 tools_3.6.1
[6] backports_1.1.5 R6_2.4.0 vegan_2.5-6 rpart_4.1-15 mgcv_1.8-29
[11] Hmisc_4.2-0 DBI_1.0.0 lazyeval_0.2.2 colorspace_1.4-1 permute_0.9-5
[16] ade4_1.7-13 nnet_7.3-12 withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3
[21] bit_1.1-14 compiler_3.6.1 htmlTable_1.13.2 scales_1.0.0 checkmate_1.9.4
[26] genefilter_1.68.0 stringr_1.4.0 digest_0.6.21 foreign_0.8-72 XVector_0.25.0
[31] base64enc_0.1-3 pkgconfig_2.0.3 htmltools_0.4.0 htmlwidgets_1.5.1 rlang_0.4.1
[36] rstudioapi_0.10 RSQLite_2.1.2 jsonlite_1.6 acepack_1.4.1 dplyr_0.8.3
[41] RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.2 Formula_1.2-3 biomformat_1.13.0
[46] Matrix_1.2-17 Rcpp_1.0.2 munsell_0.5.0 Rhdf5lib_1.7.6 ape_5.3
[51] lifecycle_0.1.0 stringi_1.4.3 MASS_7.3-51.4 zlibbioc_1.31.0 Rtsne_0.15
[56] rhdf5_2.29.6 plyr_1.8.4 grid_3.6.1 blob_1.2.0 crayon_1.3.4
[61] lattice_0.20-38 Biostrings_2.53.2 splines_3.6.1 multtest_2.41.0 annotate_1.64.0
[66] locfit_1.5-9.1 zeallot_0.1.0 pillar_1.4.2 igraph_1.2.4.1 geneplotter_1.64.0
[71] reshape2_1.4.3 codetools_0.2-16 XML_3.98-1.20 glue_1.3.1 latticeExtra_0.6-28
[76] data.table_1.12.6 vctrs_0.2.0 foreach_1.4.7 tidyr_1.0.0 gtable_0.3.0
[81] purrr_0.3.3 assertthat_0.2.1 xfun_0.10 xtable_1.8-4 survival_2.44-1.1
[86] tibble_2.1.3 iterators_1.0.12 AnnotationDbi_1.47.1 memoise_1.1.0 cluster_2.1.0
Any help explaining and fixing this behavior will be much appreciated.
Thank you