Question

warning message from makeTxDbFromGFF in GenomicFeatures package

Julie Zhu ★ 4.3k

Hi, while trying to make a TxDb from a GTF file, I received the following warning messages. Should I be concerned? How should I interpret the warnings? Many thanks for your help!

Best regards,

Julie

library(GenomicFeatures)
gtffile <- file.path(dir, "ENSEMBLchrExt.v14.no.rRNA.gtf")
(ensemblchrExt_v14_no_rRNA_txdb <- makeTxDbFromGFF(gtffile, format = "gtf", circ_seqs = character()))

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: /s4s/s4s_nathan_lawson/SC3primeSeq/ENSEMBLchrExt.v14.no.rRNA.gtf
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 54275
# exon_nrow: 333697
# cds_nrow: 271232
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2016-11-10 13:59:09 -0500 (Thu, 10 Nov 2016)
# GenomicFeatures version at creation time: 1.26.0
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
Warning messages:
1: In .reject_transcripts(bad_tx, because) :
The following transcripts were rejected because they have CDSs that
cannot be mapped to an exon: ENSDART00000000102, ENSDART00000000804,
ENSDART00000002299, ENSDART00000002571, ENSDART00000002856,
ENSDART00000002922, ENSDART00000002949, ENSDART00000003690,
ENSDART00000004209, ENSDART00000004721, ENSDART00000005195,
ENSDART00000005499, ENSDART00000005740, ENSDART00000007064,
ENSDART00000007247, ENSDART00000007914, ENSDART00000008554,
ENSDART00000009363, ENSDART00000009443, ENSDART00000009755,
ENSDART00000012507, ENSDART00000012551, ENSDART00000013534,
ENSDART00000014528, ENSDART00000014877, ENSDART00000016019,
ENSDART00000016183, ENSDART00000016582, ENSDART00000016753,
ENSDART00000018425, ENSDART00000019006, ENSDART00000019992,
ENSDART00000020040, ENSDART00000020116, ENSDART00000020234,
ENSDART00000020765, ENSDART00000021469, ENSDART00000021977,
ENSDART00000022349, ENSDART00000023039, ENSDART00000024009,
ENSDART00000024593, ENSDART00000024846, ENSDART000000 [... truncated]
2: In .reject_transcripts(bad_tx, because) :
The following transcripts were rejected because they have stop codons
that cannot be mapped to an exon: ENSDART00000000102,
ENSDART00000000804, ENSDART00000002299, ENSDART00000002571,
ENSDART00000002856, ENSDART00000002922, ENSDART00000002949,
ENSDART00000004209, ENSDART00000004721, ENSDART00000005195,
ENSDART00000005499, ENSDART00000005740, ENSDART00000007064,
ENSDART00000007247, ENSDART00000007914, ENSDART00000008554,
ENSDART00000009363, ENSDART00000009443, ENSDART00000009755,
ENSDART00000012507, ENSDART00000012551, ENSDART00000013534,
ENSDART00000014528, ENSDART00000016019, ENSDART00000016183,
ENSDART00000016582, ENSDART00000016753, ENSDART00000018425,
ENSDART00000019006, ENSDART00000019992, ENSDART00000020040,
ENSDART00000020116, ENSDART00000020234, ENSDART00000020765,
ENSDART00000021469, ENSDART00000021977, ENSDART00000023039,
ENSDART00000024009, ENSDART00000024593, ENSDART00000024846,
ENSDART00000024861, ENSDART00000024967, ENSDART00000024992,
ENSD [... truncated]

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] GenomicFeatures_1.26.0 AnnotationDbi_1.36.0 Biobase_2.34.0
[4] GenomicRanges_1.26.1 GenomeInfoDb_1.10.0 IRanges_2.8.0
[7] S4Vectors_0.12.0 BiocGenerics_0.20.0 Rsubread_1.24.0

loaded via a namespace (and not attached):
[1] XVector_0.14.0 zlibbioc_1.20.0
[3] GenomicAlignments_1.10.0 BiocParallel_1.8.0
[5] lattice_0.20-34 tools_3.3.1
[7] SummarizedExperiment_1.4.0 grid_3.3.1
[9] DBI_0.5-1 Matrix_1.2-7.1
[11] rtracklayer_1.34.0 bitops_1.0-6
[13] RCurl_1.95-4.8 biomaRt_2.30.0
[15] RSQLite_1.0.0 Biostrings_2.42.0
[17] Rsamtools_1.26.0 XML_3.98-1.4

updated 7.4 years ago by Johannes Rainer ★ 2.0k • written 7.4 years ago by Julie Zhu ★ 4.3k

score 0 · Answer 1 · 2016-11-11

Since the GTF you're using appears to come from Ensembl, you could also try the ensembldb package and create an EnsDb database/package instead (similar functionality to a TxDb, but with tables and annotations tailored to Ensembl annotations).

library(ensembldb)
## First create the SQLite database
db <- ensDbFromGtf(gtf = gtffile)
## Load that database and use it
edb <- EnsDb(db)
edb

You can use the same methods as with TxDbs. Also have a look at the ensembldb vignette for more information on its capabilities.
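
On the warnings themselves: the wording suggests that a transcript is rejected, and so probably left out of the resulting TxDb, when one of its CDS (or stop codon) ranges does not fall inside any exon of that same transcript. That can happen when exon coordinates in a GTF were modified without adjusting the CDS lines. Below is a minimal base-R sketch of that containment idea. The transcript IDs, the coordinates and the helper `cds_in_exon()` are made up for illustration; this is not the GenomicFeatures implementation.

```r
# Toy annotation: hypothetical transcript IDs and coordinates,
# not taken from the GTF in the question.
exons <- data.frame(
  tx    = c("tx1", "tx1", "tx2"),
  start = c(100, 300, 500),
  end   = c(200, 400, 600)
)
cds <- data.frame(
  tx    = c("tx1", "tx1", "tx2"),
  start = c(150, 300, 550),
  end   = c(200, 350, 650)  # the tx2 CDS runs past the end of its exon (600)
)

# Hypothetical helper: TRUE for each CDS row that lies entirely within
# at least one exon of the same transcript.
cds_in_exon <- function(cds, exons) {
  vapply(seq_len(nrow(cds)), function(i) {
    same_tx <- exons$tx == cds$tx[i]
    any(exons$start[same_tx] <= cds$start[i] &
        exons$end[same_tx]   >= cds$end[i])
  }, logical(1))
}

ok <- cds_in_exon(cds, exons)

# Transcripts with at least one CDS that fits no exon; in this toy data
# these are the ones that would be flagged.
bad_tx <- unique(cds$tx[!ok])
print(bad_tx)
```

With the real file, the same kind of check could be run on the exon and CDS rows of one of the listed transcripts (for example after reading the GTF with rtracklayer::import()) to see which coordinates disagree.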

cheers, jo