Hi, I'm getting some warnings from makeTxDbFromGFF(). Here is the full output:
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: /home/weir/RNAedit/human_test/reference/GCF_000001405.38_GRCh38.p12_genomic.gff
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 178581
# exon_nrow: 1945509
# cds_nrow: 1460272
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2019-06-17 22:31:22 +0800 (Mon, 17 Jun 2019)
# GenomicFeatures version at creation time: 1.34.8
# RSQLite version at creation time: 2.1.1
# DBSCHEMAVERSION: 1.2
Warning messages:
1: In .extract_exons_from_GRanges(exon_IDX, gr, ID, Name, Parent, feature = "exon", :
The following orphan exon were dropped (showing only the 6 first):
seqid start end strand ID
1 NC_000001.11 15542166 15542304 + exon-NR_135613.1-1
2 NC_000001.11 27834401 27834566 + exon-NR_002997.1-1
3 NC_000001.11 109100193 109100612 + exon-NR_003023.1-1
4 NC_000001.11 144875032 144875095 - exon-id-LOC107985528-1
5 NC_000001.11 144874355 144874907 - exon-id-LOC107985528-2
6 NC_000001.11 155679108 155679255 - exon-NR_132762.1-1
Parent Name
1 rna-NR_135613.1 exon-NR_135613.1-1
2 rna-NR_002997.1 exon-NR_002997.1-1
3 rna-NR_003023.1 exon-NR_003023.1-1
4 id-LOC107985528 exon-id-LOC107985528-1
5 id-LOC107985528 exon-id-LOC107985528-2
6 rna-NR_132762.1 exon-NR_132762.1-1
2: In .extract_exons_from_GRanges(cds_IDX, gr, ID, Name, Parent, feature = "cds", :
The following orphan CDS were dropped (showing only the 6 first):
seqid start end strand ID Parent Name
1 NC_000001.11 144875032 144875080 - cds-LOC107985528 id-LOC107985528 <NA>
2 NC_000001.11 144874585 144874907 - cds-LOC107985528 id-LOC107985528 <NA>
3 NC_000002.12 88857361 88857683 - cds-IGKC id-IGKC <NA>
4 NC_000002.12 88860568 88860605 - cds-IGKJ5 id-IGKJ5 <NA>
5 NC_000002.12 88860886 88860923 - cds-IGKJ4 id-IGKJ4 <NA>
6 NC_000002.12 88861221 88861258 - cds-IGKJ3 id-IGKJ3 <NA>
3: In .find_exon_cds(exons, cds) :
The following transcripts have exons that contain more than one CDS
(only the first CDS was kept for each exon): rna-NM_001134939.1,
rna-NM_001172437.2, rna-NM_001184961.1, rna-NM_001301020.1,
rna-NM_001301302.1, rna-NM_001301371.1, rna-NM_002537.3,
rna-NM_004152.3, rna-NM_015068.3, rna-NM_016178.2
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)
Matrix products: default
BLAS/LAPACK: /home/weir/anaconda3/lib/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicFeatures_1.34.8 AnnotationDbi_1.44.0 Biobase_2.40.0
[4] GenomicRanges_1.34.0 GenomeInfoDb_1.16.0 IRanges_2.16.0
[7] S4Vectors_0.20.1 AnnotationHub_2.12.1 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] SummarizedExperiment_1.12.0 progress_1.2.0
[3] lattice_0.20-38 htmltools_0.3.6
[5] rtracklayer_1.42.0 yaml_2.2.0
[7] interactiveDisplayBase_1.18.0 blob_1.1.1
[9] XML_3.98-1.12 rlang_0.3.4
[11] later_0.8.0 DBI_1.0.0
[13] BiocParallel_1.16.0 bit64_0.9-7
[15] matrixStats_0.54.0 GenomeInfoDbData_1.1.0
[17] stringr_1.4.0 zlibbioc_1.26.0
[19] Biostrings_2.48.0 memoise_1.1.0
[21] biomaRt_2.38.0 httpuv_1.5.1
[23] BiocInstaller_1.30.0 curl_3.3
[25] Rcpp_1.0.1 xtable_1.8-3
[27] promises_1.0.1 DelayedArray_0.8.0
[29] XVector_0.22.0 mime_0.6
[31] bit_1.1-12 Rsamtools_1.34.0
[33] hms_0.4.2 digest_0.6.18
[35] stringi_1.4.3 shiny_1.2.0
[37] grid_3.5.1 tools_3.5.1
[39] bitops_1.0-6 magrittr_1.5
[41] RCurl_1.95-4.12 RSQLite_2.1.1
[43] crayon_1.3.4 pkgconfig_2.0.2
[45] Matrix_1.2-17 prettyunits_1.0.2
[47] assertthat_0.2.1 httr_1.4.0
[49] R6_2.4.0 GenomicAlignments_1.18.0
[51] compiler_3.5.1
The GFF file was downloaded from https://www.ncbi.nlm.nih.gov/genome/?term=human
Can someone help me?

Best wishes,
weir
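One way to investigate the orphan exon and CDS warnings above is to check which feature type each Parent ID actually has in the GFF3 file. My understanding is that an exon or CDS is reported as an orphan when its Parent is not among the features imported as transcripts. That is an assumption, not something taken from the GenomicFeatures documentation. The sketch below defines a hypothetical shell helper, parent_types_of. It reads the file twice with awk. The first pass records the column-3 type of every feature that carries an ID= attribute. The second pass tallies the types of the parents of every feature of the requested type and reports undefined parents as "(missing)". It assumes a tab-separated GFF3 file whose attributes are separated by ";" with no surrounding spaces.

```shell
#!/bin/sh
# parent_types_of FEATURE FILE   (hypothetical helper, not part of GenomicFeatures)
# For every line whose column 3 equals FEATURE (e.g. exon or CDS), look up the
# column-3 type of each ID listed in its Parent= attribute and print
# "count<TAB>parent_type". Parents not defined in the file count as "(missing)".
parent_types_of() {
  awk -F'\t' -v want="$1" '
    # Return the value of attribute KEY from a GFF3 column-9 string.
    function attr(s, key,   n, i, parts) {
      n = split(s, parts, ";")
      for (i = 1; i <= n; i++)
        if (index(parts[i], key "=") == 1)
          return substr(parts[i], length(key) + 2)
      return ""
    }
    # Skip comments, directives and anything that is not a 9-column feature line.
    /^#/ || NF < 9 { next }
    # Pass 1 (first copy of the file): remember the type of every feature with an ID.
    NR == FNR { id = attr($9, "ID"); if (id != "") type[id] = $3; next }
    # Pass 2: for the requested feature type, tally the types of its parents.
    $3 == want {
      split(attr($9, "Parent"), ps, ",")   # Parent may list several IDs
      for (j in ps) {
        if (ps[j] in type) count[type[ps[j]]]++
        else count["(missing)"]++
      }
    }
    END { for (t in count) print count[t] "\t" t }
  ' "$2" "$2"
}
```

For example, running parent_types_of exon GCF_000001405.38_GRCh38.p12_genomic.gff and then parent_types_of CDS GCF_000001405.38_GRCh38.p12_genomic.gff tallies the parent types of all exons and all CDS features. This covers every feature of that type, not only the orphans. Parent types that do not look like transcripts, or a "(missing)" entry, are candidates for the dropped features.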
Did that help?
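The grep in the next post matches "pre_miRNA" anywhere on a line, including IDs, Names and Note attributes. A narrower check is to count only the values in column 3, the GFF3 feature type. This shows at a glance which non-standard types, such as pre_miRNA or ncRNA_gene, a file contains. Below is a minimal sketch that assumes a tab-separated GFF3 file. feature_type_counts is a hypothetical helper name.

```shell
#!/bin/sh
# feature_type_counts FILE   (hypothetical helper name)
# Print "count<TAB>type" for every distinct feature type (column 3) in a GFF3
# file, most frequent first. Comment and directive lines are skipped, and only
# column 3 is inspected, so IDs or Note= text containing the same word are ignored.
feature_type_counts() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print n[t] "\t" t }' "$1" |
    sort -k1,1nr -k2,2
}
# Example (file name taken from the post below):
#   feature_type_counts IRGSP-1.0.50.chrs.gff3
```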
Hi, I ran into the same problem. I checked the warnings and, as you mentioned above, used the command
grep -E "pre_miRNA" $HOME/datax/Genomes/IRGSP-1.0-release/gff/IRGSP-1.0.50.chrs.gff3 | grep -vE "ncRNA_gene" > items
to find all the items listed in the warnings. They seem to be caused by "ncRNA_gene" or "pre_miRNA" features, but I don't know how to deal with that. Could you give me some suggestions?

I added "pre_miRNA" to the end of the existing .TX_TYPES definition in makeTxDbFromGRanges.R and then reinstalled the package. It worked well, with no warnings reported.

I'm only seeing your post now, sorry.
Unfortunately, according to the Sequence Ontology, pre_miRNA is not an offspring of transcript via the is_a relationship, only via the part_of relationship, which means that features of type pre_miRNA are not considered transcripts. So I'm not sure it was a good idea for the author of this GFF3 file to use the pre_miRNA term to describe the exon/transcript structure of the associated genes. Maybe they should have used miRNA_primary_transcript instead?

Anyway, because pre_miRNA features are not transcripts, I'm reluctant to add the term to the list of terms that makeTxDbFromGRanges() and makeTxDbFromGFF() should treat as transcripts.

Best,
H.
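If patching the package is not an option, one workaround in the spirit of the miRNA_primary_transcript remark is to relabel the type column in a copy of the GFF3 file and build the TxDb from that copy. This sketch rests on two assumptions. The first is that the installed GenomicFeatures version treats miRNA_primary_transcript as a transcript type. The second is that relabelling is acceptable for your analysis, which the reply above suggests is debatable for pre_miRNA. relabel_type is a hypothetical helper name.

```shell
#!/bin/sh
# relabel_type FILE OLD NEW   (hypothetical helper name)
# Write a copy of a GFF3 file to stdout in which column 3 (the feature type)
# is changed from OLD to NEW. All other columns, including the attributes,
# are passed through untouched, so ID/Parent links are preserved.
relabel_type() {
  awk -F'\t' -v OFS='\t' -v old="$2" -v new="$3" '
    # Only touch feature lines (not comments/directives) whose type is OLD.
    !/^#/ && NF >= 9 && $3 == old { $3 = new }
    # Print every line, modified or not.
    { print }
  ' "$1"
}
# Example (output file name is a placeholder):
#   relabel_type IRGSP-1.0.50.chrs.gff3 pre_miRNA miRNA_primary_transcript > relabelled.gff3
```

The relabelled copy can then be passed to makeTxDbFromGFF() in place of the original file.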