Question

DESeq2 question about use of DESeqDataSetFromTximport() from RSEM data

1

Entering edit mode

tbrunetti ▴ 10

@tbrunetti-23901

Last seen 5.5 years ago

I have been trying to use the DESeq2 function DESeqDataSetFromTximport() when trying to import the results generated from using RSEM, and I continue to get the following error:

using counts and average transcript lengths from tximport
Error in DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata,  : 
  all(lengths > 0) is not TRUE

I have confirmed that the column names of txi.rsem$counts has the same names of the rows in the metadata file. Here is the code I used to generate this error which I used based on the recommendation from the DESEQ2 and tximpor vingettes :

metadata <- read.delim("metaData_2603VR.txt")
rownames(metadata) <- metadata$sampleID
files <- file.path(paste0(metadata$rsem_sample, ".genes.results"))
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
head(txi.rsem$counts)

I confirmed the sample names were the same by running the following:

all(colnames(txi.rsem$counts) == rownames(metadata))

which returned TRUE

I then ran the following:

dds <- DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata, design = ~ sample_group)

and it returned the following error:

using counts and average transcript lengths from tximport
Error in DESeqDataSetFromTximport(txi = txi.rsem, colData = metadata,  : 
  all(lengths > 0) is not TRUE

I have tried to Google and read the Vignettes and docs, and I don't read anything about needing to filter anything out before running the command: DESeqDataSetFromTximport. Do I need to filter out the RSEM data prior to using the Tximport function for RSEM or did am I using the function incorrectly?

Any advice or suggestions is much appreciated!

Also, here is the sessionInfo information, in case there is a known incompatibility in the version of software pacakages I am using:

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximport_1.14.2             tximportData_1.14.0         edgeR_3.28.1                limma_3.42.2               
 [5] DESeq2_1.26.0               SummarizedExperiment_1.16.1 DelayedArray_0.12.3         BiocParallel_1.20.1        
 [9] matrixStats_0.56.0          Biobase_2.46.0              GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
[13] IRanges_2.20.2              S4Vectors_0.24.4            BiocGenerics_0.32.0         Hmisc_4.4-0                
[17] Formula_1.2-3               survival_3.2-3              lattice_0.20-41             data.table_1.12.8          
[21] forcats_0.5.0               stringr_1.4.0               dplyr_1.0.0                 purrr_0.3.4                
[25] readr_1.3.1                 tidyr_1.1.0                 tibble_3.0.3                ggplot2_3.3.2              
[29] tidyverse_1.3.0            

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           fs_1.4.2               bit64_0.9-7.1          lubridate_1.7.9        RColorBrewer_1.1-2    
 [6] httr_1.4.2             tools_3.6.3            backports_1.1.8        R6_2.4.1               rpart_4.1-15          
[11] DBI_1.1.0              colorspace_1.4-1       nnet_7.3-14            withr_2.2.0            tidyselect_1.1.0      
[16] gridExtra_2.3          bit_1.1-15.2           compiler_3.6.3         cli_2.0.2              rvest_0.3.5           
[21] htmlTable_2.0.1        xml2_1.3.2             scales_1.1.1           checkmate_2.0.0        genefilter_1.68.0     
[26] digest_0.6.25          foreign_0.8-76         XVector_0.26.0         base64enc_0.1-3        jpeg_0.1-8.1          
[31] pkgconfig_2.0.3        htmltools_0.5.0        dbplyr_1.4.4           htmlwidgets_1.5.1      rlang_0.4.7           
[36] readxl_1.3.1           RSQLite_2.2.0          rstudioapi_0.11        generics_0.0.2         jsonlite_1.7.0        
[41] acepack_1.4.1          RCurl_1.98-1.2         magrittr_1.5           GenomeInfoDbData_1.2.2 Matrix_1.2-18         
[46] Rcpp_1.0.5             munsell_0.5.0          fansi_0.4.1            lifecycle_0.2.0        stringi_1.4.6         
[51] zlibbioc_1.32.0        grid_3.6.3             blob_1.2.1             crayon_1.3.4           haven_2.3.1           
[56] splines_3.6.3          annotate_1.64.0        hms_0.5.3              locfit_1.5-9.4         knitr_1.29            
[61] pillar_1.4.6           geneplotter_1.64.0     XML_3.99-0.3           reprex_0.3.0           glue_1.4.1            
[66] latticeExtra_0.6-29    modelr_0.1.8           png_0.1-7              vctrs_0.3.2            cellranger_1.1.0      
[71] gtable_0.3.0           assertthat_0.2.1       xfun_0.15              xtable_1.8-4           broom_0.7.0           
[76] memoise_1.1.0          AnnotationDbi_1.48.0   cluster_2.1.0          ellipsis_0.3.1

deseq2 software error • 3.0k views

ADD COMMENT • link updated 5.5 years ago by Michael Love 43k • written 5.5 years ago by tbrunetti ▴ 10

score 2 · Accepted Answer · 2020-07-23

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

The issue is that RSEM here has estimated a gene length of zero, which is incompatible with our use of log average transcript length as an offset.

I'd recommend letting tximport perform the calculation of gene lengths by importing isoform level data.

ADD COMMENT • link 5.5 years ago Michael Love 43k

0

Entering edit mode

Thank you so much for your quick reply. I will try the isoform level outputs from RSEM. If I use the isoform level, can I just collapse it to the gene level after the data is stored in a DESeq object?

Also, can I remove those genes with lengths of zero across all the RSEM data sets and proceed that way, or is there an added benefit of acutally using the isoform level outputs and then collapsing them to gene level outputs later?

ADD REPLY • link 5.5 years ago tbrunetti ▴ 10

2

Entering edit mode

I meant to use txIn=TRUE, txOut=FALSE, with a tx2gene table, when running tximport() to make the gene lengths.

You could also filter those genes with lengths of zero.

I think, aside from RSEM sometimes outputting a gene length of zero when no isoform is estimated to be expressed in a sample, the average transcript lengths are identical across RSEM and tximport (it's a weighted average of the effective transcript lengths).