tximport rowsum error
Ezgi ▴ 60
@ezgi-24130
Last seen 2.8 years ago
United States

I'm trying to import some salmon quant.sf files and summarize them to gene-level TPMs with tximport. I have checked that every file exists. The transcript names in these files are long, but the tx2gene data frame I'm using maps those same full names to gene IDs. Still, I get an error from rowsum saying that something that is supposed to be numeric isn't. I'm not sure how to find the problem in my data, so I would appreciate any suggestions.

Here's what I'm running:

txi <- tximport(salmon_paths, 
                type = "salmon",
                tx2gene = tx2gene)

Then I get the error:

reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 
removing duplicated transcript rows from tx2gene
transcripts missing from tx2gene: 92
summarizing abundance
summarizing counts
summarizing length
Error in rowsum.default(x[sub.idx, , drop = FALSE], geneId) : 
  'x' must be numeric

I tried importing the same files with map and read_tsv while binding each file as rows (map_dfr) or as columns (map_dfc) and I don't get any errors:

tpm_dfc <- purrr::map_dfc(salmon_paths, read_tsv)
tpm_dfr <- purrr::map_dfr(salmon_paths, read_tsv)

And I get the expected column specifications for each file. So I imagine it can't be that it's importing columns in the wrong format? What else might be going on here?

── Column specification ──────────────────────────────────────────────────────────────────────────────
cols(
  Name = col_character(),
  Length = col_double(),
  EffectiveLength = col_double(),
  TPM = col_double(),
  NumReads = col_double()
)
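A more systematic version of this check is to test the parsed column classes of every file, rather than reading the printed specifications by eye. The following is a minimal sketch in base R, assuming each file has a `Name` column plus numeric value columns as shown above. The helper `non_numeric_columns` and the two temporary toy files are hypothetical stand-ins for illustration; with real data, `toy_paths` would be replaced by `salmon_paths`. Note that this only rules out problems in the quant.sf files themselves, not in anything else tximport reads.

```r
# Hypothetical helper: return the names of columns (other than Name)
# that were not parsed as numeric in one quant.sf-style file.
non_numeric_columns <- function(path) {
  quant <- read.delim(path, stringsAsFactors = FALSE)
  value_cols <- setdiff(names(quant), "Name")
  is_num <- vapply(quant[value_cols], is.numeric, logical(1))
  value_cols[!is_num]
}

# Toy stand-ins for real quant.sf files (made-up values).
good <- tempfile(fileext = ".sf")
bad  <- tempfile(fileext = ".sf")
writeLines(c("Name\tLength\tEffectiveLength\tTPM\tNumReads",
             "tx1\t1657\t1490.931\t0\t0",
             "tx2\t632\t465.981\t0.5\t3"), good)
writeLines(c("Name\tLength\tEffectiveLength\tTPM\tNumReads",
             "tx1\t1657\t1490.931\t0\t0",
             "tx2\t632\t465.981\tnot_a_number\t3"), bad)

# With real data this would be salmon_paths.
toy_paths <- c(good = good, bad = bad)
problems <- lapply(toy_paths, non_numeric_columns)

# Keep only the files with at least one offending column.
problems <- problems[lengths(problems) > 0]
print(problems)
```

An empty list here would suggest that the quant.sf files are parsed as expected and that the non-numeric input comes from somewhere else.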

Here are the first few rows of one of my quant.sf files:

tibble::tribble(
  ~Name, ~Length, ~EffectiveLength,     ~TPM, ~NumReads,
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|",    1657,         1490.931,        0,         0,
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|",     632,          465.981,        0,         0,
  "ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|",    1351,         1184.931, 0.135787, 14.127527,
  "ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|",      68,            22.69,        0,         0,
  "ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|RP11-34P13.3-001|RP11-34P13.3|712|lincRNA|",     712,          545.971,        0,         0,
  "ENST00000469289.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002841.2|RP11-34P13.3-002|RP11-34P13.3|535|lincRNA|",     535,          369.025,        0,         0
)

And the first few rows of my tx2gene data frame:

tibble::tribble(
  ~Name,     ~ensembl_gene,
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|", "ENSG00000223972",
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|", "ENSG00000223972",
  "ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|", "ENSG00000227232",
  "ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|", "ENSG00000278267",
  "ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|RP11-34P13.3-001|RP11-34P13.3|712|lincRNA|", "ENSG00000243485",
  "ENST00000469289.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002841.2|RP11-34P13.3-002|RP11-34P13.3|535|lincRNA|", "ENSG00000243485"
)
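Since the log reports "transcripts missing from tx2gene: 92", it may also help to list exactly which names fail to match. The following is a minimal sketch in base R, assuming the first column of tx2gene holds the full transcript names exactly as they appear in the `Name` column of quant.sf. The third transcript name and the object `quant_names` are made up for illustration; with real data, `quant_names` would come from the `Name` column of one quant.sf file.

```r
# Toy transcript names; the last one is made up so that it has no mapping.
quant_names <- c(
  "ENST00000456328.2|ENSG00000223972.5|",
  "ENST00000450305.2|ENSG00000223972.5|",
  "ENST_made_up.1|ENSG_made_up.1|"
)

# Toy tx2gene table covering only the first two transcripts.
tx2gene_toy <- data.frame(
  Name = quant_names[1:2],
  ensembl_gene = c("ENSG00000223972", "ENSG00000223972"),
  stringsAsFactors = FALSE
)

# Transcripts present in the quantification but absent from the mapping.
missing_tx <- setdiff(quant_names, tx2gene_toy$Name)
print(missing_tx)

# Sanity checks on the mapping itself: both columns should be character
# and no gene ID should be missing.
mapping_ok <- is.character(tx2gene_toy$Name) &&
  is.character(tx2gene_toy$ensembl_gene) &&
  !anyNA(tx2gene_toy$ensembl_gene)
print(mapping_ok)
```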

And here's the session info:

sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /igm/apps/R/R-4.0.2_install/lib64/R/lib/libRblas.so
LAPACK: /igm/apps/R/R-4.0.2_install/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] datapasta_3.1.0     tximportData_1.16.0 tximport_1.16.1     forcats_0.5.0      
 [5] stringr_1.4.0       dplyr_1.0.2         purrr_0.3.4         readr_1.4.0        
 [9] tidyr_1.1.2         tibble_3.0.4        ggplot2_3.3.2       tidyverse_1.3.0    
[13] knitr_1.30         

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0    xfun_0.19           haven_2.3.1         colorspace_2.0-0   
 [5] vctrs_0.3.5         generics_0.1.0      utf8_1.1.4          blob_1.2.1         
 [9] rlang_0.4.8         pillar_1.4.7        glue_1.4.2          withr_2.3.0        
[13] DBI_1.1.0           bit64_4.0.5         dbplyr_2.0.0        modelr_0.1.8       
[17] readxl_1.3.1        lifecycle_0.2.0     munsell_0.5.0       gtable_0.3.0       
[21] cellranger_1.1.0    rvest_0.3.6         memoise_1.1.0       parallel_4.0.2     
[25] fansi_0.4.1         highr_0.8           broom_0.7.2         Rcpp_1.0.5         
[29] clipr_0.7.1         BiocManager_1.30.10 scales_1.1.1        backports_1.2.0    
[33] vroom_1.3.2         jsonlite_1.7.1      fs_1.5.0            bit_4.0.4          
[37] hms_0.5.3           digest_0.6.27       stringi_1.5.3       grid_4.0.2         
[41] cli_2.2.0           tools_4.0.2         magrittr_2.0.1      RSQLite_2.2.1      
[45] crayon_1.3.4        pkgconfig_2.0.3     ellipsis_0.3.1      xml2_1.3.2         
[49] reprex_0.3.0        lubridate_1.7.9.2   assertthat_0.2.1    httr_1.4.2         
[53] rstudioapi_0.13     R6_2.5.0            compiler_4.0.2
Ezgi ▴ 60
@ezgi-24130
Last seen 2.8 years ago
United States

Setting the dropInfReps = TRUE argument in tximport resolved the problem. 🤦‍♀️

Also, I needed to import over 400 files, and setting the importer to vroom::vroom() with explicit column specifications improved the speed quite a bit. Here's the example code:

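txi <- tximport(
  salmon_paths,
  type = "salmon",
  tx2gene = tx2gene,
  # skip reading in the inferential replicates
  dropInfReps = TRUE,
  # read each quant.sf with vroom, declaring the column types up front
  importer = function(x)
    vroom::vroom(
      x,
      col_types = vroom::cols(
        Name = vroom::col_character(),
        Length = vroom::col_double(),
        EffectiveLength = vroom::col_double(),
        TPM = vroom::col_double(),
        NumReads = vroom::col_double()
      )
    )
)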