I have been trying to use the DESeq2 function DESeqDataSetFromTximport()
when trying to import the results generated from using RSEM, and I continue to get the following error:
using counts and average transcript lengths from tximport
Error in DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata, :
all(lengths > 0) is not TRUE
I have confirmed that the column names of txi.rsem$counts has the same names of the rows in the metadata file. Here is the code I used to generate this error which I used based on the recommendation from the DESEQ2 and tximpor vingettes :
metadata <- read.delim("metaData_2603VR.txt")
rownames(metadata) <- metadata$sampleID
files <- file.path(paste0(metadata$rsem_sample, ".genes.results"))
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
head(txi.rsem$counts)
I confirmed the sample names were the same by running the following:
all(colnames(txi.rsem$counts) == rownames(metadata))
which returned TRUE
I then ran the following:
dds <- DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata, design = ~ sample_group)
and it returned the following error:
using counts and average transcript lengths from tximport
Error in DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata, :
all(lengths > 0) is not TRUE
I have tried to Google and read the Vignettes and docs, and I don't read anything about needing to filter anything out before running the command: DESeqDataSetFromTximport
. Do I need to filter out the RSEM data prior to using the Tximport function for RSEM or did am I using the function incorrectly?
Any advice or suggestions is much appreciated!
Also, here is the sessionInfo information, in case there is a known incompatibility in the version of software pacakages I am using:
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximport_1.14.2 tximportData_1.14.0 edgeR_3.28.1 limma_3.42.2
[5] DESeq2_1.26.0 SummarizedExperiment_1.16.1 DelayedArray_0.12.3 BiocParallel_1.20.1
[9] matrixStats_0.56.0 Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
[13] IRanges_2.20.2 S4Vectors_0.24.4 BiocGenerics_0.32.0 Hmisc_4.4-0
[17] Formula_1.2-3 survival_3.2-3 lattice_0.20-41 data.table_1.12.8
[21] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0 purrr_0.3.4
[25] readr_1.3.1 tidyr_1.1.0 tibble_3.0.3 ggplot2_3.3.2
[29] tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 fs_1.4.2 bit64_0.9-7.1 lubridate_1.7.9 RColorBrewer_1.1-2
[6] httr_1.4.2 tools_3.6.3 backports_1.1.8 R6_2.4.1 rpart_4.1-15
[11] DBI_1.1.0 colorspace_1.4-1 nnet_7.3-14 withr_2.2.0 tidyselect_1.1.0
[16] gridExtra_2.3 bit_1.1-15.2 compiler_3.6.3 cli_2.0.2 rvest_0.3.5
[21] htmlTable_2.0.1 xml2_1.3.2 scales_1.1.1 checkmate_2.0.0 genefilter_1.68.0
[26] digest_0.6.25 foreign_0.8-76 XVector_0.26.0 base64enc_0.1-3 jpeg_0.1-8.1
[31] pkgconfig_2.0.3 htmltools_0.5.0 dbplyr_1.4.4 htmlwidgets_1.5.1 rlang_0.4.7
[36] readxl_1.3.1 RSQLite_2.2.0 rstudioapi_0.11 generics_0.0.2 jsonlite_1.7.0
[41] acepack_1.4.1 RCurl_1.98-1.2 magrittr_1.5 GenomeInfoDbData_1.2.2 Matrix_1.2-18
[46] Rcpp_1.0.5 munsell_0.5.0 fansi_0.4.1 lifecycle_0.2.0 stringi_1.4.6
[51] zlibbioc_1.32.0 grid_3.6.3 blob_1.2.1 crayon_1.3.4 haven_2.3.1
[56] splines_3.6.3 annotate_1.64.0 hms_0.5.3 locfit_1.5-9.4 knitr_1.29
[61] pillar_1.4.6 geneplotter_1.64.0 XML_3.99-0.3 reprex_0.3.0 glue_1.4.1
[66] latticeExtra_0.6-29 modelr_0.1.8 png_0.1-7 vctrs_0.3.2 cellranger_1.1.0
[71] gtable_0.3.0 assertthat_0.2.1 xfun_0.15 xtable_1.8-4 broom_0.7.0
[76] memoise_1.1.0 AnnotationDbi_1.48.0 cluster_2.1.0 ellipsis_0.3.1
Thank you so much for your quick reply. I will try the isoform level outputs from RSEM. If I use the isoform level, can I just collapse it to the gene level after the data is stored in a DESeq object?
Also, can I remove those genes with lengths of zero across all the RSEM data sets and proceed that way, or is there an added benefit of acutally using the isoform level outputs and then collapsing them to gene level outputs later?
I meant to use
txIn=TRUE
,txOut=FALSE
, with a tx2gene table, when running tximport() to make the gene lengths.You could also filter those genes with lengths of zero.
I think, aside from RSEM sometimes outputting a gene length of zero when no isoform is estimated to be expressed in a sample, the average transcript lengths are identical across RSEM and tximport (it's a weighted average of the effective transcript lengths).
Great, thank you for the explanation!