DESeq2 question about use of DESeqDataSetFromTximport() from RSEM data
Entering edit mode
tbrunetti • 0
Last seen 12 months ago

I have been trying to use the DESeq2 function DESeqDataSetFromTximport() when trying to import the results generated from using RSEM, and I continue to get the following error:

using counts and average transcript lengths from tximport
Error in DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata,  : 
  all(lengths > 0) is not TRUE

I have confirmed that the column names of txi.rsem$counts has the same names of the rows in the metadata file. Here is the code I used to generate this error which I used based on the recommendation from the DESEQ2 and tximpor vingettes :

metadata <- read.delim("metaData_2603VR.txt")
rownames(metadata) <- metadata$sampleID
files <- file.path(paste0(metadata$rsem_sample, ".genes.results"))
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

I confirmed the sample names were the same by running the following:

all(colnames(txi.rsem$counts) == rownames(metadata))

which returned TRUE

I then ran the following:

dds <- DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata, design = ~ sample_group)

and it returned the following error:

using counts and average transcript lengths from tximport
Error in DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata,  : 
  all(lengths > 0) is not TRUE

I have tried to Google and read the Vignettes and docs, and I don't read anything about needing to filter anything out before running the command: DESeqDataSetFromTximport. Do I need to filter out the RSEM data prior to using the Tximport function for RSEM or did am I using the function incorrectly?

Any advice or suggestions is much appreciated!

Also, here is the sessionInfo information, in case there is a known incompatibility in the version of software pacakages I am using:

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximport_1.14.2             tximportData_1.14.0         edgeR_3.28.1                limma_3.42.2               
 [5] DESeq2_1.26.0               SummarizedExperiment_1.16.1 DelayedArray_0.12.3         BiocParallel_1.20.1        
 [9] matrixStats_0.56.0          Biobase_2.46.0              GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
[13] IRanges_2.20.2              S4Vectors_0.24.4            BiocGenerics_0.32.0         Hmisc_4.4-0                
[17] Formula_1.2-3               survival_3.2-3              lattice_0.20-41             data.table_1.12.8          
[21] forcats_0.5.0               stringr_1.4.0               dplyr_1.0.0                 purrr_0.3.4                
[25] readr_1.3.1                 tidyr_1.1.0                 tibble_3.0.3                ggplot2_3.3.2              
[29] tidyverse_1.3.0            

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           fs_1.4.2               bit64_0.9-7.1          lubridate_1.7.9        RColorBrewer_1.1-2    
 [6] httr_1.4.2             tools_3.6.3            backports_1.1.8        R6_2.4.1               rpart_4.1-15          
[11] DBI_1.1.0              colorspace_1.4-1       nnet_7.3-14            withr_2.2.0            tidyselect_1.1.0      
[16] gridExtra_2.3          bit_1.1-15.2           compiler_3.6.3         cli_2.0.2              rvest_0.3.5           
[21] htmlTable_2.0.1        xml2_1.3.2             scales_1.1.1           checkmate_2.0.0        genefilter_1.68.0     
[26] digest_0.6.25          foreign_0.8-76         XVector_0.26.0         base64enc_0.1-3        jpeg_0.1-8.1          
[31] pkgconfig_2.0.3        htmltools_0.5.0        dbplyr_1.4.4           htmlwidgets_1.5.1      rlang_0.4.7           
[36] readxl_1.3.1           RSQLite_2.2.0          rstudioapi_0.11        generics_0.0.2         jsonlite_1.7.0        
[41] acepack_1.4.1          RCurl_1.98-1.2         magrittr_1.5           GenomeInfoDbData_1.2.2 Matrix_1.2-18         
[46] Rcpp_1.0.5             munsell_0.5.0          fansi_0.4.1            lifecycle_0.2.0        stringi_1.4.6         
[51] zlibbioc_1.32.0        grid_3.6.3             blob_1.2.1             crayon_1.3.4           haven_2.3.1           
[56] splines_3.6.3          annotate_1.64.0        hms_0.5.3              locfit_1.5-9.4         knitr_1.29            
[61] pillar_1.4.6           geneplotter_1.64.0     XML_3.99-0.3           reprex_0.3.0           glue_1.4.1            
[66] latticeExtra_0.6-29    modelr_0.1.8           png_0.1-7              vctrs_0.3.2            cellranger_1.1.0      
[71] gtable_0.3.0           assertthat_0.2.1       xfun_0.15              xtable_1.8-4           broom_0.7.0           
[76] memoise_1.1.0          AnnotationDbi_1.48.0   cluster_2.1.0          ellipsis_0.3.1     
deseq2 software error • 215 views
Entering edit mode
Last seen 2 days ago
United States

The issue is that RSEM here has estimated a gene length of zero, which is incompatible with our use of log average transcript length as an offset.

I'd recommend letting tximport perform the calculation of gene lengths by importing isoform level data.

Entering edit mode

Thank you so much for your quick reply. I will try the isoform level outputs from RSEM. If I use the isoform level, can I just collapse it to the gene level after the data is stored in a DESeq object?

Also, can I remove those genes with lengths of zero across all the RSEM data sets and proceed that way, or is there an added benefit of acutally using the isoform level outputs and then collapsing them to gene level outputs later?

Entering edit mode

I meant to use txIn=TRUE, txOut=FALSE, with a tx2gene table, when running tximport() to make the gene lengths.

You could also filter those genes with lengths of zero.

I think, aside from RSEM sometimes outputting a gene length of zero when no isoform is estimated to be expressed in a sample, the average transcript lengths are identical across RSEM and tximport (it's a weighted average of the effective transcript lengths).

Entering edit mode

Great, thank you for the explanation!


Login before adding your answer.

Traffic: 392 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6