Hi all,
I'm following along with DIYTranscriptomics, but I cannot get tximport to run on my own samples. When I run the code below, I get the following output.
> Txi_gene.D.labrax <- tximport(path.D.labrax.3,
+ type = "kallisto",
+ tx2gene = Tx.D.labrax,
+ txOut = FALSE,
+ countsFromAbundance = "lengthScaledTPM",
+ ignoreTxVersion = TRUE,
+ ignoreAfterBar = TRUE)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3
Error in .local(object, ...) :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
Example IDs (file): [ENSDLAT00005064321, ENSDLAT00005064323, ENSDLAT00005005748, ...]
Example IDs (tx2gene): [ENSDLAT00005000002.1, ENSDLAT00005000003.1, ENSDLAT00005000004.1, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.
What I don't understand is that when I manually search the tsv file for a transcript from the tibble created from the Ensembl transcriptome (e.g. ENSDLAT005example), I can find it, and the reverse is also true. In both the tsv and the tibble, the ID column is called target_id. I have no idea where to go from here: it should work, but it doesn't. Everything was installed within the last 2 weeks, so I'm confident I'm not using outdated packages.
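One thing the example IDs in the error message suggest is that the tx2gene IDs carry a version suffix (".1") while the IDs from the quantification files do not, so an exact string comparison would fail even though a manual search finds a partial match. Below is a minimal base R sketch of how that could be checked. The vectors are toy stand-ins rather than my real data, and the gene names and the helper strip_version are made up for illustration.

```r
# Toy stand-ins for the IDs shown in the error message (not the real data).
file_ids <- c("ENSDLAT00005064321", "ENSDLAT00005064323", "ENSDLAT00005005748")

# Hypothetical tx2gene table: transcript IDs in column 1, gene IDs in column 2.
tx2gene <- data.frame(
  target_id = c("ENSDLAT00005064321.1", "ENSDLAT00005064323.1", "ENSDLAT00005000002.1"),
  gene_name = c("geneA", "geneA", "geneB"),  # made-up gene names
  stringsAsFactors = FALSE
)

# Exact matching finds nothing because of the ".1" version suffix.
n_exact <- sum(file_ids %in% tx2gene$target_id)

# Hypothetical helper: drop everything from the first "." onward.
strip_version <- function(ids) sub("\\..*$", "", ids)

# Strip versions on both sides, then compare again.
tx2gene_clean <- tx2gene
tx2gene_clean$target_id <- strip_version(tx2gene_clean$target_id)
n_matched <- sum(strip_version(file_ids) %in% tx2gene_clean$target_id)

n_exact
n_matched
```

If the second count is much larger than the first on the real data, the version suffix is the likely culprit, and passing the cleaned table as tx2gene may be worth trying.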
sessionInfo( )
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximport_1.18.0 biomaRt_2.46.3
loaded via a namespace (and not attached):
[1] progress_1.2.2 tinytex_0.32 tidyselect_1.1.1 xfun_0.23 purrr_0.3.4
[6] rhdf5_2.34.0 vctrs_0.3.8 generics_0.1.0 htmltools_0.5.1.1 stats4_4.0.4
[11] BiocFileCache_1.14.0 yaml_2.2.1 utf8_1.2.1 blob_1.2.1 XML_3.99-0.6
[16] rlang_0.4.11 pillar_1.6.1 glue_1.4.2 DBI_1.1.1 rappdirs_0.3.3
[21] BiocGenerics_0.36.1 bit64_4.0.5 dbplyr_2.1.1 lifecycle_1.0.0 stringr_1.4.0
[26] memoise_2.0.0 evaluate_0.14 Biobase_2.50.0 knitr_1.33 IRanges_2.24.1
[31] fastmap_1.1.0 parallel_4.0.4 curl_4.3.1 AnnotationDbi_1.52.0 fansi_0.5.0
[36] Rcpp_1.0.6 readr_1.4.0 openssl_1.4.4 cachem_1.0.5 S4Vectors_0.28.1
[41] bit_4.0.4 hms_1.1.0 askpass_1.1 digest_0.6.27 stringi_1.5.3
[46] dplyr_1.0.6 rhdf5filters_1.2.1 tools_4.0.4 magrittr_2.0.1 RSQLite_2.2.7
[51] tibble_3.1.2 crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.2
[56] prettyunits_1.1.1 assertthat_0.2.1 rmarkdown_2.8 httr_1.4.2 Rhdf5lib_1.12.1
[61] R6_2.5.0 compiler_4.0.4
While this line of code did fix the data frame passed as tx2gene, it still has issues. However, if you toggle txOut = FALSE to txOut = TRUE, the code works. I know this switches the output to transcript-level data instead of gene-level data, but beyond that I cannot explain why it works. I just want to say thank you for your continued responses and help!
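My best guess at why the toggle matters, as a toy base R sketch rather than what tximport does internally: transcript-level output never needs to look anything up in tx2gene, whereas gene-level output has to match every transcript ID to a gene before summarizing. All names and numbers below are made up, and the plain per-gene sum is a simplification of what countsFromAbundance = "lengthScaledTPM" would actually compute.

```r
# Toy transcript-level counts; row names lack version suffixes (made-up values).
tx_counts <- matrix(c(10, 5, 3, 20, 0, 7), nrow = 3,
                    dimnames = list(c("TX1", "TX2", "TX3"), c("s1", "s2")))

# Hypothetical mapping whose transcript IDs carry a version suffix.
tx2gene <- data.frame(tx = c("TX1.1", "TX2.1", "TX3.1"),
                      gene = c("G1", "G1", "G2"),
                      stringsAsFactors = FALSE)

# Transcript-level output: returned as-is, the mapping is never consulted.
tx_level <- tx_counts

# Gene-level output: every row must first be matched to a gene.
idx <- match(rownames(tx_counts), tx2gene$tx)
all_unmatched <- all(is.na(idx))  # no ID matches, analogous to the error above

# With versions stripped from the mapping, the lookup succeeds
# and counts can be summed per gene.
idx_fixed <- match(rownames(tx_counts), sub("\\..*$", "", tx2gene$tx))
gene_level <- rowsum(tx_counts, group = tx2gene$gene[idx_fixed])
```

If that reading is right, txOut = TRUE only hides the ID mismatch; the gene-level import would still need a tx2gene table whose first column matches the IDs in the quantification files.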