tximport abundanceCol parameter not functioning for rsem data
m.metsger • 0
@mmetsger-16171

Hello,

I am using tximport for importing transcript level estimates from RSEM data.

This is what the first two RSEM files look like (G19439, G19440):

> head(f1)
       transcript_id            gene_id length effective_length expected_count  TPM  FPKM IsoPct
1 ENSMUST00000000001 ENSMUSG00000000001   3262          3058.38        1465.00 61.15 52.58 100.00
2 ENSMUST00000000003 ENSMUSG00000000003    902           698.38           0.00  0.00  0.00   0.00
3 ENSMUST00000114041 ENSMUSG00000000003    697           493.39           0.00  0.00  0.00   0.00
4 ENSMUST00000000028 ENSMUSG00000000028   2143          1939.38          50.74  3.34  2.87  45.98
5 ENSMUST00000096990 ENSMUSG00000000028   1747          1543.38          43.77  3.62  3.11  49.85
6 ENSMUST00000115585 ENSMUSG00000000028    832           628.38           1.49  0.30  0.26   4.17
> head(f2)
       transcript_id            gene_id length effective_length expected_count  TPM  FPKM IsoPct
1 ENSMUST00000000001 ENSMUSG00000000001   3262          3064.62        1657.00 70.11 54.79 100.00
2 ENSMUST00000000003 ENSMUSG00000000003    902           704.62           0.00  0.00  0.00   0.00
3 ENSMUST00000114041 ENSMUSG00000000003    697           499.62           0.00  0.00  0.00   0.00
4 ENSMUST00000000028 ENSMUSG00000000028   2143          1945.62          48.81  3.25  2.54  60.69
5 ENSMUST00000096990 ENSMUSG00000000028   1747          1549.62          25.19  2.11  1.65  39.31
6 ENSMUST00000115585 ENSMUSG00000000028    832           634.62           0.00  0.00  0.00   0.00

When I'm using the abundanceCol="TPM" parameter it gives the correct TPM values:

> txi.rsem <- tximport(files, type = "rsem", txIn=TRUE,
+                      txOut = TRUE,ignoreTxVersion = T, abundanceCol = 'TPM')
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 
> head(txi.rsem$abundance)
                   G19439 G19440 G19441 G19442 G19443 G19444 G19445 G19446 G19447
ENSMUST00000000001  61.15  70.11  54.24  58.09  62.05  59.11  44.37  57.59  32.33
ENSMUST00000000003   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
ENSMUST00000114041   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
ENSMUST00000000028   3.34   3.25   6.38   3.82   4.24   3.92  12.42  16.54  10.80
ENSMUST00000096990   3.62   2.11   0.00   3.36   0.00   2.63   0.10   9.79   3.35
ENSMUST00000115585   0.30   0.00   0.00   0.00   0.00   0.00   0.00   0.20   0.00

If I change to abundanceCol="FPKM", it gives me the same TPM values:

> txi.rsem <- tximport(files, type = "rsem", txIn=TRUE,
+                      txOut = TRUE,ignoreTxVersion = T, abundanceCol = 'FPKM')
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 
> head(txi.rsem$abundance)
                   G19439 G19440 G19441 G19442 G19443 G19444 G19445 G19446 G19447
ENSMUST00000000001  61.15  70.11  54.24  58.09  62.05  59.11  44.37  57.59  32.33
ENSMUST00000000003   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
ENSMUST00000114041   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
ENSMUST00000000028   3.34   3.25   6.38   3.82   4.24   3.92  12.42  16.54  10.80
ENSMUST00000096990   3.62   2.11   0.00   3.36   0.00   2.63   0.10   9.79   3.35
ENSMUST00000115585   0.30   0.00   0.00   0.00   0.00   0.00   0.00   0.20   0.00

So it seems that either abundanceCol is not functioning or I am doing something incorrectly.

Best, Maria

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] tools     parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dplyr_1.0.2                 Seurat_3.2.1                pheatmap_1.0.12            
 [4] RColorBrewer_1.1-2          devEMF_4.0-1                ReactomePA_1.32.0          
 [7] gplots_3.0.4                biomaRt_2.44.1              tximport_1.16.1            
[10] factoextra_1.0.7            ggfortify_0.4.10            edgeR_3.30.3               
[13] limma_3.44.3                DESeq2_1.28.1               SummarizedExperiment_1.18.2
[16] DelayedArray_0.14.1         matrixStats_0.56.0          Biobase_2.48.0             
[19] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2             
[22] S4Vectors_0.26.1            BiocGenerics_0.34.0         ggpubr_0.4.0               
[25] reshape2_1.4.4              ggplot2_3.3.2              

loaded via a namespace (and not attached):
  [1] reticulate_1.16        tidyselect_1.1.0       htmlwidgets_1.5.1      RSQLite_2.2.0         
  [5] AnnotationDbi_1.50.3   grid_4.0.2             BiocParallel_1.22.0    Rtsne_0.15            
  [9] scatterpie_0.1.5       munsell_0.5.0          codetools_0.2-16       ica_1.0-2             
 [13] miniUI_0.1.1.1         future_1.18.0          withr_2.2.0            colorspace_1.4-1      
 [17] GOSemSim_2.14.2        rstudioapi_0.11        ROCR_1.0-11            tensor_1.5            
 [21] ggsignif_0.6.0         DOSE_3.14.0            listenv_0.8.0          urltools_1.7.3        
 [25] GenomeInfoDbData_1.2.3 polyclip_1.10-0        bit64_4.0.5            farver_2.0.3          
 [29] vctrs_0.3.4            generics_0.0.2         BiocFileCache_1.12.1   R6_2.4.1              
 [33] graphlayouts_0.7.0     rsvd_1.0.3             locfit_1.5-9.4         spatstat.utils_1.17-0 
 [37] bitops_1.0-6           fgsea_1.14.0           gridGraphics_0.5-0     assertthat_0.2.1      
 [41] promises_1.1.1         scales_1.1.1           ggraph_2.0.3           enrichplot_1.8.1      
 [45] gtable_0.3.0           globals_0.13.0         goftest_1.2-2          tidygraph_1.2.0       
 [49] rlang_0.4.7            genefilter_1.70.0      splines_4.0.2          lazyeval_0.2.2        
 [53] rstatix_0.6.0          broom_0.7.0            europepmc_0.4          checkmate_2.0.0       
 [57] BiocManager_1.30.10    abind_1.4-5            backports_1.1.10       httpuv_1.5.4          
 [61] qvalue_2.20.0          ggplotify_0.0.5        ellipsis_0.3.1         ggridges_0.5.2        
 [65] Rcpp_1.0.5             plyr_1.8.6             progress_1.2.2         zlibbioc_1.34.0       
 [69] purrr_0.3.4            RCurl_1.98-1.2         prettyunits_1.1.1      deldir_0.1-29         
 [73] rpart_4.1-15           openssl_1.4.2          pbapply_1.4-3          viridis_0.5.1         
 [77] cowplot_1.1.0          zoo_1.8-8              haven_2.3.1            ggrepel_0.8.2         
 [81] cluster_2.1.0          magrittr_1.5           data.table_1.13.0      DO.db_2.9             
 [85] openxlsx_4.2.2         triebeard_0.3.0        lmtest_0.9-38          RANN_2.6.1            
 [89] reactome.db_1.70.0     fitdistrplus_1.1-1     patchwork_1.0.1        mime_0.9              
 [93] hms_0.5.3              xtable_1.8-4           XML_3.99-0.5           rio_0.5.16            
 [97] readxl_1.3.1           gridExtra_2.3          compiler_4.0.2         tibble_3.0.3          
[101] KernSmooth_2.23-17     crayon_1.3.4           htmltools_0.5.0        mgcv_1.8-33           
[105] later_1.1.0.1          tidyr_1.1.2            geneplotter_1.66.0     DBI_1.1.0             
[109] tweenr_1.0.1           dbplyr_1.4.4           MASS_7.3-51.6          rappdirs_0.3.1        
[113] readr_1.3.1            Matrix_1.2-18          car_3.0-9              gdata_2.18.0          
[117] igraph_1.2.5           forcats_0.5.0          pkgconfig_2.0.3        rvcheck_0.1.8         
[121] foreign_0.8-80         plotly_4.9.2.1         xml2_1.3.2             annotate_1.66.0       
[125] XVector_0.28.0         stringr_1.4.0          digest_0.6.25          sctransform_0.2.1     
[129] RcppAnnoy_0.0.16       graph_1.66.0           spatstat.data_1.4-3    cellranger_1.1.0      
[133] leiden_0.3.3           fastmatch_1.1-0        uwot_0.1.8             curl_4.3              
[137] shiny_1.5.0            gtools_3.8.2           graphite_1.34.0        nlme_3.1-149          
[141] lifecycle_0.2.0        jsonlite_1.7.1         carData_3.0-4          viridisLite_0.3.0     
[145] askpass_1.1            pillar_1.4.6           lattice_0.20-41        fastmap_1.0.1         
[149] httr_1.4.2             survival_3.2-3         GO.db_3.11.4           glue_1.4.2            
[153] spatstat_1.64-1        zip_2.1.1              png_0.1-7              bit_4.0.4             
[157] ggforce_0.3.2          stringi_1.5.3          blob_1.2.1             caTools_1.18.0        
[161] memoise_1.1.0          irlba_2.3.3            future.apply_1.6.0   
@mikelove

Those column arguments are only used when type="none" (the default). From the documentation of the type argument:

This argument is used to autofill the arguments below (geneIdCol, etc.) "none" means that the user will specify these columns.

So when you specify type="rsem", tximport fills in those arguments itself and ignores your input.
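
To make this concrete, here is a minimal, untested sketch of reading the FPKM column by switching to type="none" and naming every column explicitly. The column names are taken from the RSEM header shown in the question. The two temporary files are stand-ins built from the first three rows of G19439 and G19440 above, and txi.fpkm is just an illustrative name; in practice `files` would point at the real RSEM outputs.

```r
library(tximport)

# Stand-in RSEM isoform files built from the rows shown in the question.
cols  <- c("transcript_id", "gene_id", "length", "effective_length",
           "expected_count", "TPM", "FPKM", "IsoPct")
ids   <- c("ENSMUST00000000001", "ENSMUST00000000003", "ENSMUST00000114041")
genes <- c("ENSMUSG00000000001", "ENSMUSG00000000003", "ENSMUSG00000000003")
f1 <- data.frame(ids, genes, c(3262, 902, 697), c(3058.38, 698.38, 493.39),
                 c(1465, 0, 0), c(61.15, 0, 0), c(52.58, 0, 0), c(100, 0, 0))
f2 <- data.frame(ids, genes, c(3262, 902, 697), c(3064.62, 704.62, 499.62),
                 c(1657, 0, 0), c(70.11, 0, 0), c(54.79, 0, 0), c(100, 0, 0))
names(f1) <- names(f2) <- cols

# Named vector of paths: the names become the column names of the output.
files <- c(G19439 = tempfile(fileext = ".isoforms.results"),
           G19440 = tempfile(fileext = ".isoforms.results"))
write.table(f1, files[["G19439"]], sep = "\t", quote = FALSE, row.names = FALSE)
write.table(f2, files[["G19440"]], sep = "\t", quote = FALSE, row.names = FALSE)

# With type = "none" nothing is autofilled, so each column is named by hand.
txi.fpkm <- tximport(files, type = "none", txIn = TRUE, txOut = TRUE,
                     txIdCol      = "transcript_id",
                     abundanceCol = "FPKM",
                     countsCol    = "expected_count",
                     lengthCol    = "effective_length")

head(txi.fpkm$abundance)
```

With this call the abundance matrix should hold the FPKM values rather than TPM, while counts and length still come from expected_count and effective_length.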


Dear Mike,

Thank you for the response! I think the documentation for these parameters was not very clearly written:

1. There is no default for abundanceCol mentioned in the manual.
2. It is not explicitly written which arguments are auto-filled if you specify the "type" parameter.
3. The function output didn't contain any warning that some arguments were not used.

For me it was counter-intuitive that it would behave this way; I only found out by chance.


Thanks for the feedback.


I've updated the docs in the devel branch to be more explicit that those arguments are ignored unless type="none".
