Hello,
I am using tximport to import transcript-level estimates from RSEM output.
This is what the first two RSEM files look like (G19439, G19440):
> head(f1)
transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
1 ENSMUST00000000001 ENSMUSG00000000001 3262 3058.38 1465.00 61.15 52.58 100.00
2 ENSMUST00000000003 ENSMUSG00000000003 902 698.38 0.00 0.00 0.00 0.00
3 ENSMUST00000114041 ENSMUSG00000000003 697 493.39 0.00 0.00 0.00 0.00
4 ENSMUST00000000028 ENSMUSG00000000028 2143 1939.38 50.74 3.34 2.87 45.98
5 ENSMUST00000096990 ENSMUSG00000000028 1747 1543.38 43.77 3.62 3.11 49.85
6 ENSMUST00000115585 ENSMUSG00000000028 832 628.38 1.49 0.30 0.26 4.17
> head(f2)
transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
1 ENSMUST00000000001 ENSMUSG00000000001 3262 3064.62 1657.00 70.11 54.79 100.00
2 ENSMUST00000000003 ENSMUSG00000000003 902 704.62 0.00 0.00 0.00 0.00
3 ENSMUST00000114041 ENSMUSG00000000003 697 499.62 0.00 0.00 0.00 0.00
4 ENSMUST00000000028 ENSMUSG00000000028 2143 1945.62 48.81 3.25 2.54 60.69
5 ENSMUST00000096990 ENSMUSG00000000028 1747 1549.62 25.19 2.11 1.65 39.31
6 ENSMUST00000115585 ENSMUSG00000000028 832 634.62 0.00 0.00 0.00 0.00
When I set abundanceCol = "TPM", I get the correct TPM values:
> txi.rsem <- tximport(files, type = "rsem", txIn=TRUE,
+ txOut = TRUE,ignoreTxVersion = T, abundanceCol = 'TPM')
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12
> head(txi.rsem$abundance)
G19439 G19440 G19441 G19442 G19443 G19444 G19445 G19446 G19447
ENSMUST00000000001 61.15 70.11 54.24 58.09 62.05 59.11 44.37 57.59 32.33
ENSMUST00000000003 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ENSMUST00000114041 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ENSMUST00000000028 3.34 3.25 6.38 3.82 4.24 3.92 12.42 16.54 10.80
ENSMUST00000096990 3.62 2.11 0.00 3.36 0.00 2.63 0.10 9.79 3.35
ENSMUST00000115585 0.30 0.00 0.00 0.00 0.00 0.00 0.00 0.20 0.00
If I change to abundanceCol = "FPKM", I get exactly the same TPM values:
> txi.rsem <- tximport(files, type = "rsem", txIn=TRUE,
+ txOut = TRUE,ignoreTxVersion = T, abundanceCol = 'FPKM')
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12
> head(txi.rsem$abundance)
G19439 G19440 G19441 G19442 G19443 G19444 G19445 G19446 G19447
ENSMUST00000000001 61.15 70.11 54.24 58.09 62.05 59.11 44.37 57.59 32.33
ENSMUST00000000003 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ENSMUST00000114041 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ENSMUST00000000028 3.34 3.25 6.38 3.82 4.24 3.92 12.42 16.54 10.80
ENSMUST00000096990 3.62 2.11 0.00 3.36 0.00 2.63 0.10 9.79 3.35
ENSMUST00000115585 0.30 0.00 0.00 0.00 0.00 0.00 0.00 0.20 0.00
So it seems that the abundanceCol argument has no effect, or I am doing something incorrectly.
Best, Maria
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] tools parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.0.2 Seurat_3.2.1 pheatmap_1.0.12
[4] RColorBrewer_1.1-2 devEMF_4.0-1 ReactomePA_1.32.0
[7] gplots_3.0.4 biomaRt_2.44.1 tximport_1.16.1
[10] factoextra_1.0.7 ggfortify_0.4.10 edgeR_3.30.3
[13] limma_3.44.3 DESeq2_1.28.1 SummarizedExperiment_1.18.2
[16] DelayedArray_0.14.1 matrixStats_0.56.0 Biobase_2.48.0
[19] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2 IRanges_2.22.2
[22] S4Vectors_0.26.1 BiocGenerics_0.34.0 ggpubr_0.4.0
[25] reshape2_1.4.4 ggplot2_3.3.2
loaded via a namespace (and not attached):
[1] reticulate_1.16 tidyselect_1.1.0 htmlwidgets_1.5.1 RSQLite_2.2.0
[5] AnnotationDbi_1.50.3 grid_4.0.2 BiocParallel_1.22.0 Rtsne_0.15
[9] scatterpie_0.1.5 munsell_0.5.0 codetools_0.2-16 ica_1.0-2
[13] miniUI_0.1.1.1 future_1.18.0 withr_2.2.0 colorspace_1.4-1
[17] GOSemSim_2.14.2 rstudioapi_0.11 ROCR_1.0-11 tensor_1.5
[21] ggsignif_0.6.0 DOSE_3.14.0 listenv_0.8.0 urltools_1.7.3
[25] GenomeInfoDbData_1.2.3 polyclip_1.10-0 bit64_4.0.5 farver_2.0.3
[29] vctrs_0.3.4 generics_0.0.2 BiocFileCache_1.12.1 R6_2.4.1
[33] graphlayouts_0.7.0 rsvd_1.0.3 locfit_1.5-9.4 spatstat.utils_1.17-0
[37] bitops_1.0-6 fgsea_1.14.0 gridGraphics_0.5-0 assertthat_0.2.1
[41] promises_1.1.1 scales_1.1.1 ggraph_2.0.3 enrichplot_1.8.1
[45] gtable_0.3.0 globals_0.13.0 goftest_1.2-2 tidygraph_1.2.0
[49] rlang_0.4.7 genefilter_1.70.0 splines_4.0.2 lazyeval_0.2.2
[53] rstatix_0.6.0 broom_0.7.0 europepmc_0.4 checkmate_2.0.0
[57] BiocManager_1.30.10 abind_1.4-5 backports_1.1.10 httpuv_1.5.4
[61] qvalue_2.20.0 ggplotify_0.0.5 ellipsis_0.3.1 ggridges_0.5.2
[65] Rcpp_1.0.5 plyr_1.8.6 progress_1.2.2 zlibbioc_1.34.0
[69] purrr_0.3.4 RCurl_1.98-1.2 prettyunits_1.1.1 deldir_0.1-29
[73] rpart_4.1-15 openssl_1.4.2 pbapply_1.4-3 viridis_0.5.1
[77] cowplot_1.1.0 zoo_1.8-8 haven_2.3.1 ggrepel_0.8.2
[81] cluster_2.1.0 magrittr_1.5 data.table_1.13.0 DO.db_2.9
[85] openxlsx_4.2.2 triebeard_0.3.0 lmtest_0.9-38 RANN_2.6.1
[89] reactome.db_1.70.0 fitdistrplus_1.1-1 patchwork_1.0.1 mime_0.9
[93] hms_0.5.3 xtable_1.8-4 XML_3.99-0.5 rio_0.5.16
[97] readxl_1.3.1 gridExtra_2.3 compiler_4.0.2 tibble_3.0.3
[101] KernSmooth_2.23-17 crayon_1.3.4 htmltools_0.5.0 mgcv_1.8-33
[105] later_1.1.0.1 tidyr_1.1.2 geneplotter_1.66.0 DBI_1.1.0
[109] tweenr_1.0.1 dbplyr_1.4.4 MASS_7.3-51.6 rappdirs_0.3.1
[113] readr_1.3.1 Matrix_1.2-18 car_3.0-9 gdata_2.18.0
[117] igraph_1.2.5 forcats_0.5.0 pkgconfig_2.0.3 rvcheck_0.1.8
[121] foreign_0.8-80 plotly_4.9.2.1 xml2_1.3.2 annotate_1.66.0
[125] XVector_0.28.0 stringr_1.4.0 digest_0.6.25 sctransform_0.2.1
[129] RcppAnnoy_0.0.16 graph_1.66.0 spatstat.data_1.4-3 cellranger_1.1.0
[133] leiden_0.3.3 fastmatch_1.1-0 uwot_0.1.8 curl_4.3
[137] shiny_1.5.0 gtools_3.8.2 graphite_1.34.0 nlme_3.1-149
[141] lifecycle_0.2.0 jsonlite_1.7.1 carData_3.0-4 viridisLite_0.3.0
[145] askpass_1.1 pillar_1.4.6 lattice_0.20-41 fastmap_1.0.1
[149] httr_1.4.2 survival_3.2-3 GO.db_3.11.4 glue_1.4.2
[153] spatstat_1.64-1 zip_2.1.1 png_0.1-7 bit_4.0.4
[157] ggforce_0.3.2 stringi_1.5.3 blob_1.2.1 caTools_1.18.0
[161] memoise_1.1.0 irlba_2.3.3 future.apply_1.6.0
Dear Mike,
Thank you for the response! I think the documentation for these parameters is not written very clearly:
1. The manual does not mention a default for abundanceCol.
2. It does not state explicitly which arguments are filled in automatically when you specify the "type" parameter.
3. The function output did not contain any warning that some arguments were not used.
This behavior was counter-intuitive to me; I only found out by chance.
Thanks for the feedback.
I've updated the docs in the devel branch to be more explicit that those arguments are ignored unless type="none".
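For readers who hit the same issue: since the column arguments are only honored with type="none", one way to get FPKM into the abundance slot is to name every column explicitly. The following is a minimal sketch, not an official recipe. It assumes the RSEM isoform column names shown in the question, and it uses two small temporary files (a couple of rows copied from the head() output above) as hypothetical stand-ins for the real result files.

```r
library(tximport)

# Toy stand-ins for two RSEM isoform files, built from rows shown in the question.
ids   <- c("ENSMUST00000000001", "ENSMUST00000000028")
genes <- c("ENSMUSG00000000001", "ENSMUSG00000000028")
f1 <- data.frame(transcript_id = ids, gene_id = genes,
                 length = c(3262, 2143),
                 effective_length = c(3058.38, 1939.38),
                 expected_count = c(1465.00, 50.74),
                 TPM = c(61.15, 3.34), FPKM = c(52.58, 2.87),
                 IsoPct = c(100.00, 45.98))
f2 <- data.frame(transcript_id = ids, gene_id = genes,
                 length = c(3262, 2143),
                 effective_length = c(3064.62, 1945.62),
                 expected_count = c(1657.00, 48.81),
                 TPM = c(70.11, 3.25), FPKM = c(54.79, 2.54),
                 IsoPct = c(100.00, 60.69))

# Write the toy tables as tab-separated files; the names become column names.
files <- c(G19439 = tempfile(fileext = ".isoforms.results"),
           G19440 = tempfile(fileext = ".isoforms.results"))
write.table(f1, files[["G19439"]], sep = "\t", quote = FALSE, row.names = FALSE)
write.table(f2, files[["G19440"]], sep = "\t", quote = FALSE, row.names = FALSE)

# With type = "none" the column arguments are used as given,
# so the abundance matrix is filled from the FPKM column.
txi.fpkm <- tximport(files, type = "none", txIn = TRUE, txOut = TRUE,
                     txIdCol = "transcript_id",
                     abundanceCol = "FPKM",
                     countsCol = "expected_count",
                     lengthCol = "effective_length",
                     importer = function(x) read.delim(x, stringsAsFactors = FALSE))

head(txi.fpkm$abundance)
```

With real data, `files` would simply be the named vector of paths to the RSEM isoform result files; the explicit `importer` is passed here only to avoid relying on a particular default reader.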