I'm trying to import some Salmon `quant.sf` files and summarize them to gene-level TPMs with `tximport`.
I have checked that all of the files exist.
All of these files use long, `|`-delimited transcript names, but I'm already passing a `tx2gene` data frame that maps those same names to gene IDs.
However, I get an error from `rowsum` saying that something that is supposed to be numeric isn't. I'm not sure how to track down the problem in my data, so I would appreciate any suggestions.
Here's what I try to run:

```r
library(tximport)

txi <- tximport(salmon_paths,
                type = "salmon",
                tx2gene = tx2gene)
```
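One sanity check worth ruling out first: the tximport documentation describes `tx2gene` as a two-column data frame with transcript IDs in the first column and gene IDs in the second (column names don't matter, column order does). The sketch below uses a hypothetical helper, `clean_tx2gene()`, which is not part of tximport. It coerces a tibble with possible factor columns into a plain data frame of character vectors and drops duplicated transcripts. This removes one class of input problems, but it does not by itself explain this particular error.

```r
# Hypothetical helper, not part of tximport: normalize a tx2gene table to
# a plain data.frame with two character columns (transcript ID, gene ID).
clean_tx2gene <- function(tx2gene) {
  df <- as.data.frame(tx2gene)[, 1:2]   # drop tibble class and extra columns
  df[] <- lapply(df, as.character)      # factors -> character vectors
  names(df) <- c("tx", "gene")          # tximport relies on column order
  df <- df[!duplicated(df$tx), ]        # keep the first row per transcript
  rownames(df) <- NULL
  df
}

# Toy input with factor columns and one duplicated transcript
raw_tx2gene <- data.frame(
  Name         = factor(c("txA|geneX|", "txB|geneX|", "txA|geneX|")),
  ensembl_gene = factor(c("geneX", "geneX", "geneX"))
)
tx2gene_clean <- clean_tx2gene(raw_tx2gene)
str(tx2gene_clean)
```

In the real call this would become `tx2gene = clean_tx2gene(tx2gene)`.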
Then I get the error:

```
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10
removing duplicated transcript rows from tx2gene
transcripts missing from tx2gene: 92
summarizing abundance
summarizing counts
summarizing length
Error in rowsum.default(x[sub.idx, , drop = FALSE], geneId) :
  'x' must be numeric
```
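For reference, the message itself comes from base R's `rowsum()`, which stops whenever its input matrix does not have a numeric storage type. The toy matrices below use made-up values, not the real data. They show that a character matrix triggers the same message, and so does a matrix that is entirely `NA`, because such a matrix is logical rather than numeric. Presumably, then, one of the matrices `tximport` builds internally ends up with a non-numeric type.

```r
# Toy transcript-by-sample matrix (values are made up for illustration)
num_mat <- matrix(c(1, 2, 3, 4), nrow = 2,
                  dimnames = list(c("txA", "txB"), c("s1", "s2")))
gene_id <- c("geneX", "geneX")

# Numeric input: transcripts are summed per gene (geneX: s1 = 3, s2 = 7)
gene_sums <- rowsum(num_mat, gene_id)
print(gene_sums)

# Same values stored as character: rowsum() stops with the same error
chr_mat <- num_mat
storage.mode(chr_mat) <- "character"
chr_err <- tryCatch(rowsum(chr_mat, gene_id), error = conditionMessage)

# A matrix that is entirely NA is logical, not numeric, and also fails
na_mat <- matrix(NA, nrow = 2, ncol = 2)
na_err <- tryCatch(rowsum(na_mat, gene_id), error = conditionMessage)

c(chr_err, na_err)
```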
I tried importing the same files with `purrr::map` and `readr::read_tsv`, binding the files either as rows (`map_dfr`) or as columns (`map_dfc`), and I don't get any errors:

```r
library(readr)

tpm_dfc <- purrr::map_dfc(salmon_paths, read_tsv)
tpm_dfr <- purrr::map_dfr(salmon_paths, read_tsv)
```
For each file I also get the expected column specification:

```
── Column specification ──────────────────────────────────────────────────────────────────────────────
cols(
  Name = col_character(),
  Length = col_double(),
  EffectiveLength = col_double(),
  TPM = col_double(),
  NumReads = col_double()
)
```

So I assume the columns aren't being imported with the wrong types. What else might be going on here?
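A successful `map_dfc()` or `map_dfr()` only shows that each file can be parsed. It does not show that every numeric column is free of problems in every file. The minimal base-R sketch below reports the type, the `NA` count and the row count per column and per file. `check_quant_file()` is a hypothetical helper, the two temporary files stand in for real `quant.sf` files, and the `"oops"` value is an invented corruption, used only to show how a bad file would appear in the report.

```r
# Hypothetical helper: for one quant.sf file, report whether each numeric
# column was read as numeric and how many NA values it contains.
check_quant_file <- function(path) {
  quant <- read.delim(path, stringsAsFactors = FALSE)
  num_cols <- c("Length", "EffectiveLength", "TPM", "NumReads")
  data.frame(
    file       = basename(path),
    column     = num_cols,
    is_numeric = unname(vapply(quant[num_cols], is.numeric, logical(1))),
    n_na       = unname(vapply(quant[num_cols], function(v) sum(is.na(v)),
                               integer(1))),
    n_rows     = nrow(quant)
  )
}

# Two toy files; the second has a non-numeric TPM value ("oops" is invented)
header <- "Name\tLength\tEffectiveLength\tTPM\tNumReads"
good <- tempfile(fileext = ".sf")
bad  <- tempfile(fileext = ".sf")
writeLines(c(header, "txA\t100\t80\t1.5\t10", "txB\t200\t180\t0\t0"), good)
writeLines(c(header, "txA\t100\t80\t1.5\t10", "txB\t200\t180\toops\t0"), bad)

report <- do.call(rbind, lapply(c(good, bad), check_quant_file))
# Keep only the rows that point to a problem
report[!report$is_numeric | report$n_na > 0, ]
```

With real data, `lapply(salmon_paths, check_quant_file)` would replace the two toy paths. Differing `n_rows` across files would also be worth a look.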
Here are the first few rows of one of my `quant.sf` files:

```r
tibble::tribble(
  ~Name, ~Length, ~EffectiveLength, ~TPM, ~NumReads,
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|", 1657, 1490.931, 0, 0,
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|", 632, 465.981, 0, 0,
  "ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|", 1351, 1184.931, 0.135787, 14.127527,
  "ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|", 68, 22.69, 0, 0,
  "ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|RP11-34P13.3-001|RP11-34P13.3|712|lincRNA|", 712, 545.971, 0, 0,
  "ENST00000469289.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002841.2|RP11-34P13.3-002|RP11-34P13.3|535|lincRNA|", 535, 369.025, 0, 0
)
```
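If the long names themselves turn out to be part of the problem, one alternative is to key the mapping on the transcript ID alone. The minimal sketch below assumes the GENCODE-style layout shown above, with the transcript ID in the first `|`-separated field and the gene ID in the second. It builds a short-ID `tx2gene` from those names. The tximport manual also describes an `ignoreAfterBar` argument intended for names like these. Its availability and exact behaviour should be checked against the documentation of the installed version.

```r
# Two names copied from the quant.sf example above
quant_names <- c(
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|",
  "ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|"
)

# Split on a literal "|" (fixed = TRUE; otherwise "|" is regex alternation)
fields  <- strsplit(quant_names, "|", fixed = TRUE)
tx_id   <- vapply(fields, `[`, character(1), 1)   # first field: transcript ID
gene_id <- vapply(fields, `[`, character(1), 2)   # second field: gene ID

# Drop the Ensembl version suffix from gene IDs, matching the
# unversioned ensembl_gene column used in the question's tx2gene
gene_id <- sub("\\.[0-9]+$", "", gene_id)

tx2gene_short <- data.frame(tx = tx_id, gene = gene_id,
                            stringsAsFactors = FALSE)
tx2gene_short
```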
And here are the first few rows of my `tx2gene` data frame:

```r
tibble::tribble(
  ~Name, ~ensembl_gene,
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|", "ENSG00000223972",
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|", "ENSG00000223972",
  "ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|", "ENSG00000227232",
  "ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|", "ENSG00000278267",
  "ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|RP11-34P13.3-001|RP11-34P13.3|712|lincRNA|", "ENSG00000243485",
  "ENST00000469289.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002841.2|RP11-34P13.3-002|RP11-34P13.3|535|lincRNA|", "ENSG00000243485"
)
```
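The `tximport` log reports 92 transcripts missing from `tx2gene` and says it removed duplicated transcript rows. The minimal sketch below lists exactly which names are involved. The temporary file and the table are toy stand-ins with hypothetical names such as `txA|geneX|`. With real data, one real `quant.sf` path and the real `tx2gene` would take their place.

```r
# Toy quant.sf and tx2gene standing in for the real ones
quant_file <- tempfile(fileext = ".sf")
writeLines(c(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "txA|geneX|\t100\t80\t1.5\t10",
  "txB|geneX|\t200\t180\t0\t0",
  "txC|geneY|\t300\t280\t2\t5"
), quant_file)
tx2gene <- data.frame(
  Name         = c("txA|geneX|", "txB|geneX|", "txB|geneX|"),
  ensembl_gene = c("geneX", "geneX", "geneX"),
  stringsAsFactors = FALSE
)

quant <- read.delim(quant_file, stringsAsFactors = FALSE)

# Transcripts quantified by salmon but absent from tx2gene (column 1)
missing_tx <- setdiff(quant$Name, tx2gene[[1]])
# Transcript names listed more than once in tx2gene
dup_tx <- unique(tx2gene[[1]][duplicated(tx2gene[[1]])])

missing_tx
dup_tx
```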
And here's the session info:

```
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /igm/apps/R/R-4.0.2_install/lib64/R/lib/libRblas.so
LAPACK: /igm/apps/R/R-4.0.2_install/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] datapasta_3.1.0 tximportData_1.16.0 tximport_1.16.1 forcats_0.5.0
[5] stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.4.0
[9] tidyr_1.1.2 tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0
[13] knitr_1.30

loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 xfun_0.19 haven_2.3.1 colorspace_2.0-0
[5] vctrs_0.3.5 generics_0.1.0 utf8_1.1.4 blob_1.2.1
[9] rlang_0.4.8 pillar_1.4.7 glue_1.4.2 withr_2.3.0
[13] DBI_1.1.0 bit64_4.0.5 dbplyr_2.0.0 modelr_0.1.8
[17] readxl_1.3.1 lifecycle_0.2.0 munsell_0.5.0 gtable_0.3.0
[21] cellranger_1.1.0 rvest_0.3.6 memoise_1.1.0 parallel_4.0.2
[25] fansi_0.4.1 highr_0.8 broom_0.7.2 Rcpp_1.0.5
[29] clipr_0.7.1 BiocManager_1.30.10 scales_1.1.1 backports_1.2.0
[33] vroom_1.3.2 jsonlite_1.7.1 fs_1.5.0 bit_4.0.4
[37] hms_0.5.3 digest_0.6.27 stringi_1.5.3 grid_4.0.2
[41] cli_2.2.0 tools_4.0.2 magrittr_2.0.1 RSQLite_2.2.1
[45] crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.1 xml2_1.3.2
[49] reprex_0.3.0 lubridate_1.7.9.2 assertthat_0.2.1 httr_1.4.2
[53] rstudioapi_0.13 R6_2.5.0 compiler_4.0.2
```