tximport fails when RSEM output options produce files with additional columns?
Alexander • 0
Last seen 6 months ago
United States

I downloaded the RSEM-generated gene counts files from ENCODE, and am hoping to produce a normalized matrix. Unfortunately, tximport doesn't seem to be able to read the RSEM files from ENCODE, which include additional output columns.

RSEM output format docs.

ENCODE's RSEM-generated gene count file headers and example row:

gene_id transcript_id(s)    length  effective_length    expected_count  TPM FPKM    posterior_mean_count    posterior_standard_deviation_of_count   pme_TPM pme_FPKM    TPM_ci_lower_bound  TPM_ci_upper_bound  TPM_coefficient_of_quartile_variation   FPKM_ci_lower_bound FPKM_ci_upper_bound FPKM_coefficient_of_quartile_variation
ENSG00000000003.14  ENST00000373020.8,ENST00000494424.1,ENST00000496771.5,ENST00000612152.4,ENST00000614008.4   1745.64 1646.64 8.00    0.12    0.15    8.00    0.00    0.24    0.30    0.0992221   0.38994 0.218542    0.126724    0.498276    0.218431

Error message:

reading in files with read_tsv

Warning message:
“Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.”
Warning message:
“59526 parsing failures.
row col  expected     actual                                                                           file
  1  -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
  2  -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
  3  -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
  4  -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
  5  -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
... ... ......... .......... ..............................................................................
See problems(...) for more details.

The command was:

txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

I'm new to R (coming from Python), so I would benefit most from a detailed answer. Many thanks,


R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin20.4.0 (64-bit)
Running under: macOS Big Sur 11.3

Matrix products: default
BLAS:   /usr/local/Cellar/openblas/0.3.15_1/lib/libopenblasp-r0.3.15.dylib
LAPACK: /usr/local/Cellar/r/4.1.0/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.0   tximportData_1.20.0 readr_1.4.0        
[4] tximport_1.20.0    

loaded via a namespace (and not attached):
 [1] magrittr_2.0.1    hms_1.1.0         uuid_0.1-4        R6_2.5.0         
 [5] rlang_0.4.11      fansi_0.5.0       tools_4.1.0       utf8_1.2.1       
 [9] htmltools_0.5.1.1 ellipsis_0.3.2    digest_0.6.27     tibble_3.1.2     
[13] lifecycle_1.0.0   crayon_1.4.1      IRdisplay_1.0     repr_1.1.3       
[17] base64enc_0.1-3   vctrs_0.3.8       IRkernel_1.2      evaluate_0.14    
[21] pbdZMQ_0.3-5      compiler_4.1.0    pillar_1.6.1      jsonlite_1.7.2   
[25] pkgconfig_2.0.3

swbarnes2 ▴ 970
Last seen 1 hour ago
San Diego

You should always include the error message given. "It doesn't work" isn't specific enough to allow anyone to help you.

Have you tried removing all the columns after FPKM?
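
One way to try this without editing the files by hand is sketched below in base R. It assumes the first seven columns are the standard RSEM gene-level columns (gene_id through FPKM), as in the header shown in the question. The helper name trim_rsem_genes and the paths in the usage comments are hypothetical.

```r
# Hypothetical helper: keep only the first seven columns of an
# ENCODE RSEM gene-level file and write the result to a new file.
# Assumes the first seven columns run from gene_id through FPKM.
trim_rsem_genes <- function(infile, outfile) {
  # check.names = FALSE keeps "transcript_id(s)" from being rewritten;
  # colClasses = "character" carries every value through unchanged
  raw <- read.delim(infile, check.names = FALSE, colClasses = "character")
  # stop early if the file does not look like the expected layout
  stopifnot(ncol(raw) >= 7, identical(names(raw)[7], "FPKM"))
  write.table(raw[, 1:7], outfile, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(outfile)
}

# Usage (paths are placeholders):
# files <- c("ENCFF488ZHV.tsv")
# trimmed <- file.path(tempdir(), basename(files))
# for (i in seq_along(files)) trim_rsem_genes(files[i], trimmed[i])
# txi.rsem <- tximport(trimmed, type = "rsem", txIn = FALSE, txOut = FALSE)
```

Writing the trimmed copies to a separate directory leaves the original downloads untouched.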

I've updated the post with the error message. I _have_ tried chopping off the trailing columns, and that appears to work.

Last seen 1 hour ago
United States

If the files differ from the standard output of the original software, you essentially have a custom format. In that case you can set type="none" and specify these arguments manually:

geneIdCol, txIdCol, abundanceCol, countsCol, lengthCol

See ?tximport
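
For the gene-level ENCODE files in the question, this might look like the sketch below. The column names come from the header shown in the question; picking TPM, expected_count, and effective_length is an assumption about which columns are wanted, so check it against ?tximport. The wrapper name import_encode_rsem is hypothetical, and passing read.delim as the importer is also an assumption, made so that all 17 columns are read without a fixed column specification. txIdCol is left out because the input is already gene-level (txIn = FALSE).

```r
# Hypothetical wrapper around tximport for ENCODE gene-level RSEM files.
# `files` is a character vector of paths, ideally named by sample.
import_encode_rsem <- function(files) {
  tximport::tximport(
    files,
    type = "none",                  # custom format: no preset column layout
    txIn = FALSE,                   # input is already summarized to genes
    txOut = FALSE,
    geneIdCol = "gene_id",
    abundanceCol = "TPM",           # assumption: use TPM, not pme_TPM
    countsCol = "expected_count",   # assumption: not posterior_mean_count
    lengthCol = "effective_length",
    importer = read.delim           # assumption: base R reader, reads all columns
  )
}

# Usage (path is a placeholder):
# files <- c(ENCFF488ZHV = "ENCFF488ZHV.tsv")
# txi.rsem <- import_encode_rsem(files)
# head(txi.rsem$counts)
```

With this approach the downloaded files stay unmodified, and the extra columns are simply ignored.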


