Hey Marc,
I'm sorry, I came here via gmane.org and didn't see the posting guide.
I'll attach the relevant information this time.
I tried with the chrominfo argument, and in a sense it works. At least
there's no error about the missing chromosome size now. The main error
stays the same, though.
I checked my gff3 file with
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
yesterday and according to them it is fine.
Here's the code:
library(VariantAnnotation)
library(GenomicFeatures)
library(BSgenome)
inf <- data.frame(cbind("NC_008463", 6537648, TRUE))
txdb <- makeTranscriptDbFromGFF(file="//CPI-SL64001/spo12/BSgenome/annotation/NC_008463.gff",
    format="gff3", dataSource="CDS", species="Pseudomonas aeruginosa",
    chrominfo=inf)
the error:
Prepare the 'metadata' data frame ... metadata: OK
Error in is.data.frame(arg) : object 'tables' not found
and the session info:
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome_1.28.0 GenomicFeatures_1.12.3 AnnotationDbi_1.22.6
[4] Biobase_2.20.1 VariantAnnotation_1.6.7 Rsamtools_1.12.3
[7] Biostrings_2.28.0 GenomicRanges_1.12.4 IRanges_1.18.3
[10] BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] biomaRt_2.16.0 bitops_1.0-6 DBI_0.2-7 RCurl_1.95-4.1 RSQLite_0.11.4
[6] rtracklayer_1.20.4 stats4_3.0.1 tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0
Date: Thu, 22 Aug 2013 11:27:39 -0700
From: Marc Carlson <mcarlson@fhcrc.org>
To: bioconductor at r-project.org
Subject: Re: [BioC] makeTranscriptDbFromGFF fails on NCBI Bacteria genomes
Message-ID: <5216581B.8090608 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 08/22/2013 02:12 AM, Sarah Pohl wrote:
> Cook, Malcolm <mec at ...> writes:
>
>> FYI, bioperl includes bp_genbank2gff3.pl
>>
>> which when run as
>>
>>> bp_genbank2gff3.pl NC_011025.gbk
>> produces NC_011025.gbk.gff (attached)
>>
>> which loaded without error with transcript:
>>
>>> txdb <- makeTranscriptDbFromGFF(file="NC_011025.gbk.gff", format="gff3",
>>     dataSource="NCBI", species="Some bact")
>> extracting transcript information
>> Extracting gene IDs
>> extracting transcript information
>> Processing splicing information for gff3 file.
>> Deducing exon rank from relative coordinates provided
>> Prepare the 'metadata' data frame ... metadata: OK
>> Now generating chrominfo from available sequence names. No chromosome
>> length information is available.
>> Warning messages:
>> 1: In .deduceExonRankings(exs, format = "gff") :
>> Infering Exon Rankings. If this is not what you expected, then please
>> be sure that you have provided a valid attribute for exonRankAttributeName
>> 2: In matchCircularity(chroms, circ_seqs) :
>> None of the strings in your circ_seqs argument match your seqnames.
>>> txdb
>> TranscriptDb object:
>> | Db type: TranscriptDb
>> | Supporting package: GenomicFeatures
>> | Data source: NCBI
>> | Genus and Species: Some bact
>> | miRBase build ID: NA
>> | transcript_nrow: 631
>> | exon_nrow: 631
>> | cds_nrow: 631
>> | Db created by: GenomicFeatures package from Bioconductor
>> | Creation time: 2013-06-07 14:52:50 -0500 (Fri, 07 Jun 2013)
>> | GenomicFeatures version at creation time: 1.10.2
>> | RSQLite version at creation time: 0.11.2
>> | DBSCHEMAVERSION: 1.0
>
> Hey,
>
> I know I'm a bit late for this discussion, but I have a similar problem.
>
> I have a bacterial GBK file which I tried to convert using the
> bp_genbank2gff3.pl script,
> perl bp_genbank2gff3.pl annotation/NC_008463.gbk -o annotation/
> but I got the following error:
> "Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl
> line 672, <fh> line 208948."
> So instead I converted it with Biopython and the BCBio module, which
> worked fine.
> Only now, when I try to load it with makeTranscriptDbFromGFF,
> txdb <- makeTranscriptDbFromGFF(file="NC_008463.gff", format="gff3",
>     dataSource="CDS", species="Pseudomonas aeruginosa")
> I also get an error:
> Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
> 'unique': Error: object 'tables' not found
>
> Why does this happen and what can I do about it?
>
Hi Sarah,
It's hard to help you because it's difficult to tell from your post
what actually happened. I can't be sure whether the other scripts you
mention produced a valid gff3 file, and I have no idea which version of
the software you are using. Please see our posting guide here:
http://www.bioconductor.org/help/mailing-list/posting-guide/
But I will go out on a limb anyway and guess (based only on the error
message in your post) that your problem might get better if you passed
a value to the chrominfo argument. You can see an example of how to use
that argument by pulling up the manual page like this:
help(makeTranscriptDbFromGFF)
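For illustration only (this sketch is not taken from the manual page
verbatim): the makeTranscriptDb documentation describes chrominfo as a
data frame with the columns chrom, length and is_circular. A minimal
base R sketch, reusing the sequence name and length mentioned in this
thread and assuming the chromosome is circular, might look like this.
It also shows, for contrast, what happens to the column names when the
values are pushed through cbind() first.

```r
# Sketch only: a chrominfo data frame with named, typed columns.
# The values are the ones quoted in this thread; circularity is assumed.
chrominfo <- data.frame(
  chrom            = "NC_008463",  # must match the seqid used in the GFF3 file
  length           = 6537648L,     # chromosome length as an integer
  is_circular      = TRUE,         # logical circularity flag
  stringsAsFactors = FALSE
)

# For contrast: cbind() builds a character matrix first, so the resulting
# data frame has auto-generated column names and no numeric/logical columns.
coerced <- data.frame(cbind("NC_008463", 6537648, TRUE))

str(chrominfo)   # inspect column names and types
names(coerced)   # inspect the auto-generated column names
```

The first data frame could then be supplied as chrominfo=chrominfo in
the call to makeTranscriptDbFromGFF. Whether that alone resolves the
"object 'tables' not found" error is not established here.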
Hope this helps,
Marc