Hello,
I have been trying to use makeTranscriptDbFromGFF() to read my gff file but I get the following error:
extracting transcript information
Error in .prepareGFF3TXS(data, useGenesAsTranscripts) :
Unexpected transcript duplicates
In addition: Warning message:
In .local(con, format, text, ...) :
gff-version directive indicates version is 3, not 3
I have read that this error can occur when the file does not have transcript features. My file does have them, and I have checked that all the transcript IDs are unique. This error did not happen in Bioconductor 2.14 but appeared when I upgraded to 3.0. Setting useGenesAsTranscripts=TRUE does not help.
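For reference, the kind of uniqueness check mentioned above can be sketched in base R roughly as follows. This is only a minimal sketch, not necessarily the exact check that was run: the helper name `find_duplicate_ids` and the `tab` convenience function are hypothetical, and the code assumes tab-separated GFF3 lines with the feature type in column 3 and an `ID=` attribute in column 9.

```r
# Hypothetical helper: return the IDs that occur more than once among
# features of the given type (mRNA by default).
find_duplicate_ids <- function(gff_lines, feature_type = "mRNA") {
  # Drop directives/comments (lines starting with '#') and empty lines.
  gff_lines <- gff_lines[!grepl("^#", gff_lines) & nzchar(gff_lines)]
  if (length(gff_lines) == 0L) return(character(0))

  # Split each record into its tab-separated columns.
  fields <- strsplit(gff_lines, "\t", fixed = TRUE)
  type   <- vapply(fields, function(x) x[3], character(1))
  attrs  <- vapply(fields, function(x) x[9], character(1))

  # Keep only the attribute column of the requested feature type.
  attrs <- attrs[!is.na(type) & type == feature_type]

  # Extract the value of the ID attribute where one is present.
  has_id <- grepl("(^|;)ID=", attrs)
  ids <- sub("^(.*;)?ID=([^;]*).*$", "\\2", attrs[has_id])

  unique(ids[duplicated(ids)])
}

# Small in-memory example built from the two transcripts shown in this post.
tab <- function(...) paste(..., sep = "\t")
example_lines <- c(
  "##gff-version 3",
  tab("C1", "JCVI", "gene", "1449", "1625", ".", "-", ".", "ID=Bo1g001000"),
  tab("C1", "JCVI", "mRNA", "1449", "1625", ".", "-", ".",
      "ID=Bo1g001000.1;Parent=Bo1g001000"),
  "###",
  tab("C1", "JCVI", "gene", "1711", "2833", ".", "+", ".", "ID=Bo1g001010"),
  tab("C1", "JCVI", "mRNA", "1711", "2833", ".", "+", ".",
      "ID=Bo1g001010.1;Parent=Bo1g001010")
)

find_duplicate_ids(example_lines)
# character(0)

# Repeating one mRNA line on purpose makes its ID show up as a duplicate.
find_duplicate_ids(c(example_lines, example_lines[3]))
# [1] "Bo1g001000.1"

# On the real file this would be, for example:
# find_duplicate_ids(readLines("annotation.gff3"))
```

A check like this only looks at the ID attribute of mRNA rows, so it says nothing about other kinds of duplication (for example, two differently named transcripts with identical coordinates).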
The file I am using seems legitimate: it has been validated by genometools gt gff3, which is pretty strict. I am including my R code, session info and the two genes from my gff3. If I use just those two genes everything is fine, but using the full file causes the error, although all genes are formatted in the same manner. I am afraid that in order to reproduce the error I would have to share the entire gff3 file.
> library(GenomicFeatures)
> txdb <- makeTranscriptDbFromGFF("annotation.gff3")
> sessionInfo()
R version 3.1.0 RC (2014-04-05 r65382)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] VariantAnnotation_1.10.5 Rsamtools_1.16.1 Biostrings_2.32.1
[4] XVector_0.4.0 GenomicFeatures_1.16.3 AnnotationDbi_1.26.1
[7] Biobase_2.22.0 GenomicRanges_1.16.4 GenomeInfoDb_1.0.2
[10] IRanges_1.22.10 BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 DBI_0.3.1
[5] RCurl_1.95-4.3 RSQLite_0.11.4 rtracklayer_1.22.7 stats4_3.1.0
[9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.8.0
head -n 100 annotation.gff3
##gff-version 3
C1 JCVI gene 1449 1625 . - . ID=Bo1g001000;Note=cyclic nucleotide gated channel
C1 JCVI mRNA 1449 1625 . - . ID=Bo1g001000.1;Parent=Bo1g001000;Note=cyclic nucleotide gated channel
C1 JCVI exon 1449 1625 . - . ID=Bo1g001000.1_exon_1;Parent=Bo1g001000.1
C1 JCVI CDS 1449 1625 . - 0 ID=Bo1g001000.1_cds_1;Parent=Bo1g001000.1
###
C1 JCVI gene 1711 2833 . + . ID=Bo1g001010;Note=Amino acid transporter family protein
C1 JCVI mRNA 1711 2833 . + . ID=Bo1g001010.1;Parent=Bo1g001010;Note=Amino acid transporter family protein
C1 JCVI exon 1711 1799 . + . ID=Bo1g001010.1_exon_1;Parent=Bo1g001010.1
C1 JCVI CDS 1711 1799 . + 0 ID=Bo1g001010.1_cds_1;Parent=Bo1g001010.1
C1 JCVI exon 1937 2288 . + . ID=Bo1g001010.1_exon_2;Parent=Bo1g001010.1
C1 JCVI CDS 1937 2288 . + 1 ID=Bo1g001010.1_cds_2;Parent=Bo1g001010.1
C1 JCVI exon 2366 2833 . + . ID=Bo1g001010.1_exon_3;Parent=Bo1g001010.1
C1 JCVI CDS 2366 2833 . + 0 ID=Bo1g001010.1_cds_3;Parent=Bo1g001010.1