Warning messages from makeTxDbFromGFF in the GenomicFeatures package
Julie Zhu

Hi, while trying to make a TxDb from a GTF file, I received the following warning messages. Should I be concerned? How should I interpret the warnings? Many thanks for your help!

Best regards,

Julie

library(GenomicFeatures)
## `dir` is the directory that holds the GTF file
gtffile <- file.path(dir, "ENSEMBLchrExt.v14.no.rRNA.gtf")
(ensemblchrExt_v14_no_rRNA_txdb <- makeTxDbFromGFF(gtffile, format = "gtf", circ_seqs = character()))

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: /s4s/s4s_nathan_lawson/SC3primeSeq/ENSEMBLchrExt.v14.no.rRNA.gtf
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 54275
# exon_nrow: 333697
# cds_nrow: 271232
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2016-11-10 13:59:09 -0500 (Thu, 10 Nov 2016)
# GenomicFeatures version at creation time: 1.26.0
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
Warning messages:
1: In .reject_transcripts(bad_tx, because) :
  The following transcripts were rejected because they have CDSs that
  cannot be mapped to an exon: ENSDART00000000102, ENSDART00000000804,
  ENSDART00000002299, ENSDART00000002571, ENSDART00000002856,
  ENSDART00000002922, ENSDART00000002949, ENSDART00000003690,
  ENSDART00000004209, ENSDART00000004721, ENSDART00000005195,
  ENSDART00000005499, ENSDART00000005740, ENSDART00000007064,
  ENSDART00000007247, ENSDART00000007914, ENSDART00000008554,
  ENSDART00000009363, ENSDART00000009443, ENSDART00000009755,
  ENSDART00000012507, ENSDART00000012551, ENSDART00000013534,
  ENSDART00000014528, ENSDART00000014877, ENSDART00000016019,
  ENSDART00000016183, ENSDART00000016582, ENSDART00000016753,
  ENSDART00000018425, ENSDART00000019006, ENSDART00000019992,
  ENSDART00000020040, ENSDART00000020116, ENSDART00000020234,
  ENSDART00000020765, ENSDART00000021469, ENSDART00000021977,
  ENSDART00000022349, ENSDART00000023039, ENSDART00000024009,
  ENSDART00000024593, ENSDART00000024846, ENSDART000000 [... truncated]
2: In .reject_transcripts(bad_tx, because) :
  The following transcripts were rejected because they have stop codons
  that cannot be mapped to an exon: ENSDART00000000102,
  ENSDART00000000804, ENSDART00000002299, ENSDART00000002571,
  ENSDART00000002856, ENSDART00000002922, ENSDART00000002949,
  ENSDART00000004209, ENSDART00000004721, ENSDART00000005195,
  ENSDART00000005499, ENSDART00000005740, ENSDART00000007064,
  ENSDART00000007247, ENSDART00000007914, ENSDART00000008554,
  ENSDART00000009363, ENSDART00000009443, ENSDART00000009755,
  ENSDART00000012507, ENSDART00000012551, ENSDART00000013534,
  ENSDART00000014528, ENSDART00000016019, ENSDART00000016183,
  ENSDART00000016582, ENSDART00000016753, ENSDART00000018425,
  ENSDART00000019006, ENSDART00000019992, ENSDART00000020040,
  ENSDART00000020116, ENSDART00000020234, ENSDART00000020765,
  ENSDART00000021469, ENSDART00000021977, ENSDART00000023039,
  ENSDART00000024009, ENSDART00000024593, ENSDART00000024846,
  ENSDART00000024861, ENSDART00000024967, ENSDART00000024992,
  ENSD [... truncated]
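As the messages say, the listed transcripts were rejected, i.e. left out of the TxDb. One way to see what the warnings refer to, and to get the full ID lists that the console truncates, is to check the GTF records directly. The sketch below is a rough approximation, assuming that a CDS or stop_codon feature counts as "mapped" when it lies entirely within one exon of the same transcript (same chromosome and strand); the actual rule inside GenomicFeatures may differ. The function name find_unmapped_tx() is hypothetical, and the expected columns (seqnames, strand, start, end, type, transcript_id) are assumed to match what as.data.frame() gives for the GRanges returned by rtracklayer::import() on a GTF file.

```r
## Hypothetical helper (not part of GenomicFeatures): return the IDs of
## transcripts that have at least one feature of type `feature` (e.g. "CDS"
## or "stop_codon") that is not contained within any exon of the same
## transcript on the same chromosome and strand.
find_unmapped_tx <- function(gtf, feature = "CDS") {
  ex   <- gtf[gtf$type == "exon", ]
  feat <- gtf[gtf$type == feature, ]
  ## Split exons by transcript once so each per-feature lookup is cheap.
  ex_by_tx <- split(ex, ex$transcript_id)
  contained <- vapply(seq_len(nrow(feat)), function(i) {
    ## as.character() so a factor ID is matched by name, not by level code
    tx_ex <- ex_by_tx[[as.character(feat$transcript_id[i])]]
    if (is.null(tx_ex)) return(FALSE)  # transcript has no exon records
    any(tx_ex$seqnames == feat$seqnames[i] &
        tx_ex$strand == feat$strand[i] &
        tx_ex$start <= feat$start[i] &
        tx_ex$end >= feat$end[i])
  }, logical(1))
  sort(unique(as.character(feat$transcript_id[!contained])))
}

## Usage with the file from the question (not run; needs rtracklayer):
## gtf      <- as.data.frame(rtracklayer::import(gtffile))
## bad_cds  <- find_unmapped_tx(gtf, "CDS")
## bad_stop <- find_unmapped_tx(gtf, "stop_codon")
## gtf[gtf$transcript_id %in% bad_cds[1], c("type", "start", "end")]
```

The loop favours readability over speed, so on a full zebrafish GTF it may take a while; comparing the exon and CDS coordinates of a few of the returned transcripts is usually enough to see whether the annotation itself has CDS or stop codon records extending past the exons.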

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicFeatures_1.26.0 AnnotationDbi_1.36.0   Biobase_2.34.0        
[4] GenomicRanges_1.26.1   GenomeInfoDb_1.10.0    IRanges_2.8.0         
[7] S4Vectors_0.12.0       BiocGenerics_0.20.0    Rsubread_1.24.0       

loaded via a namespace (and not attached):
 [1] XVector_0.14.0             zlibbioc_1.20.0           
 [3] GenomicAlignments_1.10.0   BiocParallel_1.8.0        
 [5] lattice_0.20-34            tools_3.3.1               
 [7] SummarizedExperiment_1.4.0 grid_3.3.1                
 [9] DBI_0.5-1                  Matrix_1.2-7.1            
[11] rtracklayer_1.34.0         bitops_1.0-6              
[13] RCurl_1.95-4.8             biomaRt_2.30.0            
[15] RSQLite_1.0.0              Biostrings_2.42.0         
[17] Rsamtools_1.26.0           XML_3.98-1.4 

Johannes Rainer

Since the GTF you're using looks like it comes from Ensembl, you could also give the ensembldb package a try and create an EnsDb database/package instead (same functionality as a TxDb, but with tables and annotations tailored to Ensembl annotations).

library(ensembldb)
## First create the SQLite database
db <- ensDbFromGtf(gtf = gtffile)
## Load that database and use it
edb <- EnsDb(db)
edb

You can use the same methods as with TxDbs. Also have a look at the ensembldb vignette for more information and capabilities.

cheers, jo
