Hello Bioconductor community,
The issue I am having is with makeTxDbFromGFF(). My goal is to use tximeta, through se <- tximeta(coldata), to import my salmon quant.sf files from a bulk RNA-seq experiment.
The warning I get (below) happens during se <- tximeta(coldata).
I think the pathway is roughly this: AnnotationHub does not have Gencode M30 mouse on its server, so tximeta attempts to use makeTxDbFromGFF() to build the TxDb from the Gencode .gtf file (https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/gencode.vM30.annotation.gtf.gz).
This works, but also gives a warning:
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
Below is a reproducible example, with the issue isolated to GenomicFeatures. I obtained the .gtf from Gencode; the direct link to the file I am using is:
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/gencode.vM30.annotation.gtf.gz
txdb <- makeTxDbFromGFF("/data/references/gencode.vM30.annotation.gtf")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
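For reference, here is a minimal sketch of how one could check which feature types carry a phase value, which is what the warning is about. The rows below are toy data made up for illustration, not lines copied from the Gencode file, and the column names are just the usual GTF field names. With the real file, the same tabulation could presumably be run on the type and phase columns of the GRanges returned by rtracklayer::import().

```r
# Toy GTF rows (hypothetical, NOT taken from gencode.vM30.annotation.gtf).
# Fields are tab-separated; field 3 is the feature type, field 8 is the phase.
gtf_lines <- c(
  "chr1\tHAVANA\texon\t50\t300\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
  "chr1\tHAVANA\tCDS\t100\t200\t.\t+\t0\tgene_id \"g1\"; transcript_id \"t1\";",
  "chr1\tHAVANA\tstop_codon\t201\t203\t.\t+\t0\tgene_id \"g1\"; transcript_id \"t1\";"
)

# Parse the rows with base R; "." is the GTF placeholder for a missing value.
gtf <- read.delim(
  text = gtf_lines, header = FALSE, sep = "\t", quote = "",
  na.strings = ".", stringsAsFactors = FALSE,
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")
)

# Count, per feature type, how many rows have a non-NA phase.
non_na_phase <- table(gtf$type[!is.na(gtf$phase)])
print(non_na_phase)
```

In this toy input the stop_codon row has a phase of 0, so it shows up in the table alongside CDS, which is the situation the warning describes.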
I realize this may not be a top priority right now. I am going to dig through the code on GitHub; if anyone has any hints on where to look in the code to resolve this, I'd be really grateful. Thank you in advance.
SessionInfo:
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.49.6 AnnotationDbi_1.59.1 BiocParallel_1.31.12 GenomicState_0.99.15 AnnotationHub_3.5.1
[6] BiocFileCache_2.5.0 dbplyr_2.2.1 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10
[11] purrr_0.3.4 readr_2.1.2 tidyr_1.2.1 tibble_3.1.8 tidyverse_1.3.2
[16] viridis_0.6.2 viridisLite_0.4.1 EnhancedVolcano_1.15.0 ggrepel_0.9.1 plotly_4.10.0
[21] ggplot2_3.3.6 tximeta_1.15.2 biomaRt_2.53.2 DESeq2_1.37.6 SummarizedExperiment_1.27.3
[26] Biobase_2.57.1 MatrixGenerics_1.9.1 matrixStats_0.62.0 GenomicRanges_1.49.1 GenomeInfoDb_1.33.7
[31] IRanges_2.31.2 S4Vectors_0.35.3 BiocGenerics_0.43.4
loaded via a namespace (and not attached):
[1] readxl_1.4.1 backports_1.4.1 plyr_1.8.7 lazyeval_0.2.2
[5] splines_4.2.1 digest_0.6.29 ensembldb_2.21.4 htmltools_0.5.3
[9] fansi_1.0.3 magrittr_2.0.3 memoise_2.0.1 googlesheets4_1.0.1
[13] tzdb_0.3.0 Biostrings_2.65.6 annotate_1.75.0 modelr_0.1.9
[17] prettyunits_1.1.1 colorspace_2.0-3 blob_1.2.3 rvest_1.0.3
[21] rappdirs_0.3.3 haven_2.5.1 xfun_0.33 crayon_1.5.1
[25] RCurl_1.98-1.8 jsonlite_1.8.0 tximport_1.25.1 genefilter_1.79.0
[29] survival_3.4-0 glue_1.6.2 gtable_0.3.1 gargle_1.2.1
[33] zlibbioc_1.43.0 XVector_0.37.1 DelayedArray_0.23.2 SingleCellExperiment_1.19.0
[37] scales_1.2.1 pheatmap_1.0.12 DBI_1.1.3 Rcpp_1.0.9
[41] xtable_1.8-4 progress_1.2.2 bit_4.0.4 DT_0.25
[45] dittoSeq_1.9.3 htmlwidgets_1.5.4 httr_1.4.4 RColorBrewer_1.1-3
[49] ellipsis_0.3.2 pkgconfig_2.0.3 XML_3.99-0.10 locfit_1.5-9.6
[53] utf8_1.2.2 tidyselect_1.1.2 rlang_1.0.5 later_1.3.0
[57] munsell_0.5.0 BiocVersion_3.16.0 cellranger_1.1.0 tools_4.2.1
[61] cachem_1.0.6 cli_3.4.0 generics_0.1.3 RSQLite_2.2.17
[65] ggridges_0.5.3 broom_1.0.1 evaluate_0.16 fastmap_1.1.0
[69] yaml_2.3.5 knitr_1.40 bit64_4.0.5 fs_1.5.2
[73] KEGGREST_1.37.3 AnnotationFilter_1.21.0 mime_0.12 xml2_1.3.3
[77] compiler_4.2.1 rstudioapi_0.14 filelock_1.0.2 curl_4.3.2
[81] png_0.1-7 interactiveDisplayBase_1.35.0 reprex_2.0.2 geneplotter_1.75.0
[85] stringi_1.7.8 lattice_0.20-45 ProtGenerics_1.29.0 Matrix_1.5-1
[89] vctrs_0.4.1 pillar_1.8.1 lifecycle_1.0.2 BiocManager_1.30.18
[93] cowplot_1.1.1 data.table_1.14.2 bitops_1.0-7 httpuv_1.6.6
[97] rtracklayer_1.57.0 R6_2.5.1 BiocIO_1.7.1 promises_1.2.0.1
[101] gridExtra_2.3 codetools_0.2-18 assertthat_0.2.1 rjson_0.2.21
[105] withr_2.5.0 GenomicAlignments_1.33.1 Rsamtools_2.13.4 GenomeInfoDbData_1.2.8
[109] parallel_4.2.1 hms_1.1.2 grid_4.2.1 rmarkdown_2.16
[113] googledrive_2.0.0 shiny_1.7.2 lubridate_1.8.0 restfulr_0.0.15
Yup. No error here: this warning appears for all Gencode GTF files, and you can just ignore it.
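If the message is noisy in scripts, one option is to muffle only this particular warning and let any other warning through. The following is a minimal sketch using base R condition handling. Both fake_make_txdb() and muffle_phase_warning() are hypothetical names: the first is a stand-in that emits the same warning text so the example runs without Bioconductor or the GTF file, and the second is just a helper written for this example.

```r
# Hypothetical stand-in for makeTxDbFromGFF(): emits the same warning text
# and returns a placeholder value instead of a real TxDb object.
fake_make_txdb <- function() {
  warning("The \"phase\" metadata column contains non-NA values for ",
          "features of type stop_codon. This information was ignored.")
  "txdb"
}

# Hypothetical helper: evaluate expr, muffling only warnings whose message
# mentions the "phase" metadata column; all other warnings propagate.
muffle_phase_warning <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("\"phase\" metadata column", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

txdb <- muffle_phase_warning(fake_make_txdb())
```

With the real packages, the same wrapper could presumably be used around makeTxDbFromGFF() or tximeta(coldata). Matching on the message text is brittle if the wording changes in a later release, so this is a convenience rather than a fix.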