Issues with Tximport
@ae57026f

Hi all,

I'm following along with DIYTranscriptomics.

I cannot get tximport to run on my own samples. When I run the code below, I get the following output.


> Txi_gene.D.labrax <- tximport(path.D.labrax.3,  
+                      type = "kallisto", 
+                      tx2gene = Tx.D.labrax, 
+                      txOut = FALSE, 
+                      countsFromAbundance = "lengthScaledTPM",
+                      ignoreTxVersion = TRUE,
+                      ignoreAfterBar = TRUE)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3 
Error in .local(object, ...) : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Example IDs (file): [ENSDLAT00005064321, ENSDLAT00005064323, ENSDLAT00005005748, ...]

Example IDs (tx2gene): [ENSDLAT00005000002.1, ENSDLAT00005000003.1, ENSDLAT00005000004.1, ...]

  This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.

What I don't understand is this: when I manually search the tsv file for a transcript from the tibble created from the Ensembl transcriptome (e.g. ENSDLAT005example), I can find it, and the reverse is also true. In both the tsv and the tibble, the ID column is called target_id. I have no idea where to go from here. It should work, but it doesn't. Everything was installed within the last 2 weeks, so I'm confident I'm not using outdated packages.
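A manual text search can find an ID as part of a longer string, whereas the error message reports on whole-string matches. The sketch below illustrates the difference on a few toy IDs. The vectors `file_ids` and `tx2gene_ids` are hypothetical stand-ins for the target_id columns of the tsv and the tibble; most values are copied from the error message, and one file ID is added so that the toy data contains a real overlap.

```r
# IDs copied from the error message; the last file ID is added here
# (an assumption) so that the toy example has one true overlap.
file_ids <- c("ENSDLAT00005064321", "ENSDLAT00005064323", "ENSDLAT00005000002")
tx2gene_ids <- c("ENSDLAT00005000002.1", "ENSDLAT00005000003.1", "ENSDLAT00005000004.1")

# Exact, whole-string matching
n_exact <- sum(file_ids %in% tx2gene_ids)

# Same comparison after dropping the period and everything after it
n_stripped <- sum(file_ids %in% sub("\\..*", "", tx2gene_ids))

# Substring search, closer to what a manual text search does
n_partial <- sum(vapply(
  file_ids,
  function(id) any(grepl(id, tx2gene_ids, fixed = TRUE)),
  logical(1)
))

print(c(exact = n_exact, stripped = n_stripped, partial = n_partial))
```

If the exact count is zero while the stripped or partial count is not, the IDs most likely differ only by a version suffix.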



> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tximport_1.18.0 biomaRt_2.46.3 

loaded via a namespace (and not attached):
 [1] progress_1.2.2       tinytex_0.32         tidyselect_1.1.1     xfun_0.23            purrr_0.3.4         
 [6] rhdf5_2.34.0         vctrs_0.3.8          generics_0.1.0       htmltools_0.5.1.1    stats4_4.0.4        
[11] BiocFileCache_1.14.0 yaml_2.2.1           utf8_1.2.1           blob_1.2.1           XML_3.99-0.6        
[16] rlang_0.4.11         pillar_1.6.1         glue_1.4.2           DBI_1.1.1            rappdirs_0.3.3      
[21] BiocGenerics_0.36.1  bit64_4.0.5          dbplyr_2.1.1         lifecycle_1.0.0      stringr_1.4.0       
[26] memoise_2.0.0        evaluate_0.14        Biobase_2.50.0       knitr_1.33           IRanges_2.24.1      
[31] fastmap_1.1.0        parallel_4.0.4       curl_4.3.1           AnnotationDbi_1.52.0 fansi_0.5.0         
[36] Rcpp_1.0.6           readr_1.4.0          openssl_1.4.4        cachem_1.0.5         S4Vectors_0.28.1    
[41] bit_4.0.4            hms_1.1.0            askpass_1.1          digest_0.6.27        stringi_1.5.3       
[46] dplyr_1.0.6          rhdf5filters_1.2.1   tools_4.0.4          magrittr_2.0.1       RSQLite_2.2.7       
[51] tibble_3.1.2         crayon_1.4.1         pkgconfig_2.0.3      ellipsis_0.3.2       xml2_1.3.2          
[56] prettyunits_1.1.1    assertthat_0.2.1     rmarkdown_2.8        httr_1.4.2           Rhdf5lib_1.12.1     
[61] R6_2.5.0             compiler_4.0.4
@mikelove

Your file has no version numbers, while your tx2gene does have version numbers. While the names look similar, they are not a match. tximport will not perform guesswork to try to match strings.

The arguments ignoreAfter... are for ignoring extra characters in your files (which are difficult to manipulate in R). tx2gene is easy to manipulate in R: it is a data.frame in your session.

tx2gene[,1] <- sub("\\..*", "", tx2gene[,1]) # match period plus any following characters
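The question describes the mapping table as a tibble rather than a base data.frame. For a tibble, `tx2gene[,1]` returns a one-column tibble, not a character vector, so passing it to `sub()` may not behave as intended. A minimal sketch of a variant that should work for both classes uses `[[1]]` to extract the column. The table here is a toy stand-in: the IDs come from the error message and the gene names are hypothetical placeholders.

```r
# Toy stand-in for the tx2gene table (gene names are hypothetical placeholders)
tx2gene <- data.frame(
  target_id = c("ENSDLAT00005000002.1",
                "ENSDLAT00005000003.1",
                "ENSDLAT00005000004.1"),
  gene_name = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# [[1]] extracts the first column as a plain character vector for both
# data.frame and tibble, so sub() receives one string per transcript
tx2gene[[1]] <- sub("\\..*", "", tx2gene[[1]])

print(tx2gene)
```

In the session from the question, the same assignment would be applied to `Tx.D.labrax` before calling tximport.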

While this line of code did fix the data frame passed as tx2gene, tximport still has issues. However, if you change txOut = FALSE to txOut = TRUE, the code works. I know this switches from gene-level to transcript-level output, but beyond that I cannot explain or understand why it works.

I just want to say thank you for your continued responses and help!
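On the txOut observation: as far as I understand it, tximport only needs tx2gene when summarizing to gene level, so a successful run with txOut = TRUE says little about whether the table is now correct. A minimal sketch of a check to run on the table before retrying with txOut = FALSE follows. `check_tx2gene()` is a hypothetical helper, not part of tximport, and the data are toy values: `file_ids` stands in for the target_id column of one abundance.tsv, and the gene names are placeholders.

```r
# Toy tx2gene after version stripping (gene names are hypothetical placeholders)
tx2gene <- data.frame(
  target_id = c("ENSDLAT00005000002",
                "ENSDLAT00005000003",
                "ENSDLAT00005000004"),
  gene_name = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Hypothetical IDs standing in for the target_id column of one abundance.tsv
file_ids <- c("ENSDLAT00005000002", "ENSDLAT00005000004", "ENSDLAT00005064321")

# check_tx2gene() is a hypothetical helper, not a tximport function
check_tx2gene <- function(tx2gene, file_ids) {
  ids <- tx2gene[[1]]
  list(
    is_character  = is.character(ids),                    # column is a character vector
    n_distinct    = length(unique(ids)),                  # should be close to nrow(tx2gene)
    n_with_period = sum(grepl(".", ids, fixed = TRUE)),   # leftover version suffixes
    n_matched     = sum(file_ids %in% ids)                # exact matches to the file IDs
  )
}

result <- check_tx2gene(tx2gene, file_ids)
str(result)
```

A very small number of distinct IDs, leftover periods, or zero matches would each suggest that the stripping step did not do what was intended.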
