Question

Issues with tximport

bryce.thomas (Australia)

Hi all,

I'm following along with DIYTranscriptomics.

I can't get tximport to run on my own samples. When I run the code below, I get the following output:


> Txi_gene.D.labrax <- tximport(path.D.labrax.3,  
+                      type = "kallisto", 
+                      tx2gene = Tx.D.labrax, 
+                      txOut = FALSE, 
+                      countsFromAbundance = "lengthScaledTPM",
+                      ignoreTxVersion = TRUE,
+                      ignoreAfterBar = TRUE)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3 
Error in .local(object, ...) : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Example IDs (file): [ENSDLAT00005064321, ENSDLAT00005064323, ENSDLAT00005005748, ...]

Example IDs (tx2gene): [ENSDLAT00005000002.1, ENSDLAT00005000003.1, ENSDLAT00005000004.1, ...]

  This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.

What I don't understand is this: when I manually search the TSV file for a transcript ID from the tibble created from the Ensembl transcriptome (e.g. ENSDLAT005example), I can find it, and the same is true in reverse. In both the TSV and the tibble, the column is called target_id. I have no idea where to go from here; it should work, but it doesn't. Everything was downloaded/installed within the last two weeks, so I'm confident I'm not using outdated packages.
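A quick way to see why a manual text search succeeds while tximport fails is to compare the two ID sets exactly as R compares them. The sketch below uses a hypothetical helper, `check_id_overlap()`, with toy IDs modelled on the ones in the error message (illustrative only, not taken from the real files). It counts exact matches, matches after stripping the version suffix from the file IDs only (roughly what `ignoreTxVersion = TRUE` does), and matches after stripping it from both sides.

```r
# Hypothetical helper: how many quantification-file transcript IDs are found
# in the first column of tx2gene under different normalisations?
check_id_overlap <- function(file_ids, tx2gene_ids) {
  strip_version <- function(x) sub("\\..*", "", x)  # drop ".1", ".2", ...
  c(
    exact      = sum(file_ids %in% tx2gene_ids),
    strip_file = sum(strip_version(file_ids) %in% tx2gene_ids),
    strip_both = sum(strip_version(file_ids) %in% strip_version(tx2gene_ids))
  )
}

# Toy IDs (illustrative only): unversioned in the file, versioned in tx2gene
file_ids    <- c("ENSDLAT00005000002", "ENSDLAT00005000003")
tx2gene_ids <- c("ENSDLAT00005000002.1", "ENSDLAT00005000003.1",
                 "ENSDLAT00005000004.1")

check_id_overlap(file_ids, tx2gene_ids)
# exact = 0, strip_file = 0, strip_both = 2
```

With real data, the same helper could be called on the target_id column of one abundance.tsv (for example, read with `readr::read_tsv()`) and on the first column of Tx.D.labrax; a zero in the first two positions and a non-zero count in the last points to a version-suffix mismatch.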



> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tximport_1.18.0 biomaRt_2.46.3 

loaded via a namespace (and not attached):
 [1] progress_1.2.2       tinytex_0.32         tidyselect_1.1.1     xfun_0.23            purrr_0.3.4         
 [6] rhdf5_2.34.0         vctrs_0.3.8          generics_0.1.0       htmltools_0.5.1.1    stats4_4.0.4        
[11] BiocFileCache_1.14.0 yaml_2.2.1           utf8_1.2.1           blob_1.2.1           XML_3.99-0.6        
[16] rlang_0.4.11         pillar_1.6.1         glue_1.4.2           DBI_1.1.1            rappdirs_0.3.3      
[21] BiocGenerics_0.36.1  bit64_4.0.5          dbplyr_2.1.1         lifecycle_1.0.0      stringr_1.4.0       
[26] memoise_2.0.0        evaluate_0.14        Biobase_2.50.0       knitr_1.33           IRanges_2.24.1      
[31] fastmap_1.1.0        parallel_4.0.4       curl_4.3.1           AnnotationDbi_1.52.0 fansi_0.5.0         
[36] Rcpp_1.0.6           readr_1.4.0          openssl_1.4.4        cachem_1.0.5         S4Vectors_0.28.1    
[41] bit_4.0.4            hms_1.1.0            askpass_1.1          digest_0.6.27        stringi_1.5.3       
[46] dplyr_1.0.6          rhdf5filters_1.2.1   tools_4.0.4          magrittr_2.0.1       RSQLite_2.2.7       
[51] tibble_3.1.2         crayon_1.4.1         pkgconfig_2.0.3      ellipsis_0.3.2       xml2_1.3.2          
[56] prettyunits_1.1.1    assertthat_0.2.1     rmarkdown_2.8        httr_1.4.2           Rhdf5lib_1.12.1     
[61] R6_2.5.0             compiler_4.0.4


Answer by Michael Love (United States), 2021-06-17, score 2

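The transcript IDs in your quantification files have no version numbers (e.g. ENSDLAT00005064321), while the IDs in your tx2gene do (e.g. ENSDLAT00005000002.1). Although the names look similar, they do not match as strings, and tximport will not perform guesswork to try to match them. (A text search for an unversioned ID also finds it as a substring of the versioned one, which is why your manual search succeeds.)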

The ignoreTxVersion and ignoreAfterBar arguments are for stripping extra characters from the IDs in your quantification files (which are difficult to manipulate in R). tx2gene, on the other hand, is easy to manipulate in R: it is a data.frame in your session.

tx2gene[[1]] <- sub("\\..*", "", tx2gene[[1]]) # remove the first period and everything after it (the version number)
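A caveat on indexing, assuming tx2gene is a tibble (the question describes building it as one): tibbles never drop to a vector when subset with `[`, so `tx2gene[,1]` returns a one-column tibble. `sub()` coerces that with `as.character()`, which turns the whole column into a single deparsed string, and assigning it back recycles that string into every row. `tx2gene[[1]]` returns the column as a character vector for both data frames and tibbles. The base-R sketch below mimics the tibble behaviour with `drop = FALSE`; the table, its IDs and the gene names are toy values.

```r
# Toy tx2gene table; IDs are illustrative, gene names hypothetical
tx2gene <- data.frame(
  tx   = c("ENSDLAT00005000002.1", "ENSDLAT00005000003.1"),
  gene = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Tibble-like subsetting: with drop = FALSE, [ returns a one-column
# data frame, which is what tx2gene[, 1] gives when tx2gene is a tibble.
one_col <- tx2gene[, 1, drop = FALSE]
broken  <- sub("\\..*", "", one_col)  # as.character() deparses the column into one string
length(broken)                        # 1, not 2

# [[1]] extracts the column itself as a character vector, so this works
# for data frames and tibbles alike.
tx2gene[[1]] <- sub("\\..*", "", tx2gene[[1]])
tx2gene$tx  # "ENSDLAT00005000002" "ENSDLAT00005000003"
```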


This line of code did fix the data frame passed as tx2gene, but the import still doesn't work. However, if I change txOut = FALSE to txOut = TRUE, the code works. I know this switches from gene-level to transcript-level output, but beyond that I can't explain why it works.

I just want to say thank you for your continued responses and help!

Reply posted by bryce.thomas
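A likely explanation for why txOut = TRUE works: tx2gene is only needed when transcript-level estimates are summarised to genes, and with txOut = TRUE tximport returns the transcript-level estimates directly, so the ID matching that fails is never attempted. The sketch below is a simplified model of that summarisation step, not tximport's implementation. The helper `summarise_to_gene()` and the toy counts are hypothetical, and it simply sums counts per gene with `rowsum()`, ignoring the lengthScaledTPM scaling used in the question. Like tximport, it stops when no transcript ID appears in the first column of tx2gene.

```r
# Simplified model (hypothetical, not tximport's code) of gene-level summarisation
summarise_to_gene <- function(tx_counts, tx2gene) {
  # Look up each transcript's gene by exact string match on tx2gene's first column
  gene <- tx2gene[[2]][match(rownames(tx_counts), tx2gene[[1]])]
  if (all(is.na(gene))) {
    stop("None of the transcripts are present in the first column of tx2gene")
  }
  keep <- !is.na(gene)  # drop transcripts missing from tx2gene
  rowsum(tx_counts[keep, , drop = FALSE], group = gene[keep])
}

# Toy transcript-level counts with unversioned IDs (illustrative only)
tx_counts <- matrix(c(10, 5, 3, 20, 8, 6), nrow = 3,
                    dimnames = list(c("ENSDLAT00005000002", "ENSDLAT00005000003",
                                      "ENSDLAT00005000004"),
                                    c("sample1", "sample2")))

tx2gene_versioned <- data.frame(
  tx   = c("ENSDLAT00005000002.1", "ENSDLAT00005000003.1", "ENSDLAT00005000004.1"),
  gene = c("geneA", "geneA", "geneB"),  # hypothetical gene IDs
  stringsAsFactors = FALSE
)

# Transcript level (the txOut = TRUE analogue): tx2gene is never consulted.
tx_counts

# Gene level with a versioned tx2gene: no IDs match, so this errors.
res <- try(summarise_to_gene(tx_counts, tx2gene_versioned), silent = TRUE)
inherits(res, "try-error")  # TRUE

# After stripping versions from tx2gene, the summary succeeds.
tx2gene_stripped <- tx2gene_versioned
tx2gene_stripped[[1]] <- sub("\\..*", "", tx2gene_stripped[[1]])
summarise_to_gene(tx_counts, tx2gene_stripped)
# geneA: sample1 = 15, sample2 = 28; geneB: sample1 = 3, sample2 = 6
```

If the gene-level call still fails after stripping versions, comparing the first column of tx2gene against the file IDs directly (for example with `%in%`) should show which side still differs.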