Question: makeTranscriptDbFromGFF fails on NCBI Bacteria genomes
Sarah Pohl wrote (4.3 years ago):
Cook, Malcolm <mec at ...> writes:

> FYI, bioperl includes bp_genbank2gff3.pl
>
> which when run as
>
> > bp_genbank2gff3.pl NC_011025.gbk
>
> produces NC_011025.gbk.gff (attached)
>
> which loaded without error with transcript:
>
> > txdb <- makeTranscriptDbFromGFF(file="NC_011025.gbk.gff", format="gff3", dataSource="NCBI",
>     species="Some bact")
> extracting transcript information
> Extracting gene IDs
> extracting transcript information
> Processing splicing information for gff3 file.
> Deducing exon rank from relative coordinates provided
> Prepare the 'metadata' data frame ... metadata: OK
> Now generating chrominfo from available sequence names. No chromosome length information is available.
> Warning messages:
> 1: In .deduceExonRankings(exs, format = "gff") :
>   Infering Exon Rankings. If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
> 2: In matchCircularity(chroms, circ_seqs) :
>   None of the strings in your circ_seqs argument match your seqnames.
> > txdb
> TranscriptDb object:
> | Db type: TranscriptDb
> | Supporting package: GenomicFeatures
> | Data source: NCBI
> | Genus and Species: Some bact
> | miRBase build ID: NA
> | transcript_nrow: 631
> | exon_nrow: 631
> | cds_nrow: 631
> | Db created by: GenomicFeatures package from Bioconductor
> | Creation time: 2013-06-07 14:52:50 -0500 (Fri, 07 Jun 2013)
> | GenomicFeatures version at creation time: 1.10.2
> | RSQLite version at creation time: 0.11.2
> | DBSCHEMAVERSION: 1.0

Hey,

I know I'm a bit late for this discussion, but I have a similar problem.

I have a bacterial GBK file which I tried to convert using the bp_genbank2gff3.pl script,

    perl bp_genbank2gff3.pl annotation/NC_008463.gbk -o annotation/

but I got the following error:

    Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl line 672, <fh> line 208948.

So instead I converted it with Biopython and the BCBio module, which worked fine. Only now, when I try to load it with makeTranscriptDbFromGFF,

    txdb <- makeTranscriptDbFromGFF(file="NC_008463.gff", format="gff3", dataSource="CDS", species="Pseudomonas aeruginosa")

I also get an error:

    Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
      'unique': Error: object 'tables' not found

Why does this happen and what can I do about it?
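One way to narrow this down is to look at which feature types (column 3) the converted GFF3 file actually contains, since a transcript database can only be built from features the parser recognizes. The sketch below uses base R only. The helper name `gff_feature_types` and the tiny demo file are made up for illustration, and the assumption that makeTranscriptDbFromGFF relies on types such as gene, mRNA, exon and CDS should be checked against the manual page of the installed version.

```r
# Hypothetical helper (not part of GenomicFeatures): count the feature
# types found in column 3 of a GFF3 file, using base R only.
gff_feature_types <- function(path) {
  lines <- readLines(path)
  # GFF3 may embed sequence after a "##FASTA" directive; ignore that part
  fasta <- which(lines == "##FASTA")
  if (length(fasta) > 0) lines <- lines[seq_len(fasta[1] - 1)]
  # Drop comments, directives and empty lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Split each record on tabs and keep the third field (feature type)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  types <- vapply(fields, function(f) f[3], character(1))
  table(types)
}

# Tiny made-up file, only to show the call
demo <- tempfile(fileext = ".gff")
writeLines(c(
  "##gff-version 3",
  "NC_008463\tdemo\tgene\t1\t900\t.\t+\t.\tID=gene1",
  "NC_008463\tdemo\tCDS\t1\t900\t.\t+\t0\tID=cds1;Parent=gene1"
), demo)
print(gff_feature_types(demo))
```

Comparing this table for NC_008463.gff with the one for a file that loads cleanly (such as NC_011025.gbk.gff above) may show which feature types are missing.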
Marc Carlson (United States) wrote (4.3 years ago):
On 08/22/2013 02:12 AM, Sarah Pohl wrote:
> Why does this happen and what can I do about it?

Hi Sarah,

It's hard to help you because it's pretty difficult to know what actually happened after reading your post. I can't be sure if the other scripts you mention produced a valid gff3 file and I have no idea which version of the software you are using. Please see our posting guide here:

http://www.bioconductor.org/help/mailing-list/posting-guide/

But I will go out on a limb anyway and guess (based only on the error in your message) that your problem might get better if you passed in a value to the chrominfo argument. You can see an example of how to use that argument in the manual page by pulling the manual page up like this:

    help(makeTranscriptDbFromGFF)

Hope this helps,

Marc
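For reference, as far as I can tell from ?makeTranscriptDb, chrominfo is expected to be a data frame with one row per chromosome whose first three columns are chrom, length and is_circular. A minimal base R sketch of such a table follows. The sequence name is taken from the file name in the question, the length is left as NA because it is not given at this point in the thread, and the circular flag is an assumption for a bacterial chromosome.

```r
# Sketch of a chrominfo table in the shape described in ?makeTranscriptDb
# (columns chrom, length, is_circular). Values are assumptions, see above.
chrominfo <- data.frame(
  chrom       = "NC_008463",   # should match the seqid column of the GFF3 file
  length      = NA_integer_,   # replace with the real length if known
  is_circular = TRUE,          # assumed: circular bacterial chromosome
  stringsAsFactors = FALSE
)
str(chrominfo)

# The table would then be passed on like this (not run here, since it
# needs GenomicFeatures and the GFF3 file from the question):
# txdb <- makeTranscriptDbFromGFF(file = "NC_008463.gff", format = "gff3",
#                                 chrominfo = chrominfo)
```

Keeping the columns typed (character, integer, logical) and named is the point of the sketch; the actual call is left commented out.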
Sarah Pohl wrote (4.3 years ago):
Hey Marc,

I'm sorry, I came here via gmane.org and didn't see the posting guide. I'll attach the relevant information this time.

I tried with the chrominfo argument, and in a sense it works. At least there's no error about the missing chromosome size now. The main error stays the same, though.

I checked my gff3 file with http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online yesterday and according to them it is fine.

Here's the code:

    library(VariantAnnotation)
    library(GenomicFeatures)
    library(BSgenome)
    inf <- data.frame(cbind("NC_008463", 6537648, TRUE))
    txdb <- makeTranscriptDbFromGFF(file="//CPI-SL64001/spo12/BSgenome/annotation/NC_008463.gff", format="gff3", dataSource="CDS", species="Pseudomonas aeruginosa", chrominfo=inf)

the error:

    Prepare the 'metadata' data frame ... metadata: OK
    Error in is.data.frame(arg) : object 'tables' not found

and the session info:

    R version 3.0.1 (2013-05-16)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
    [4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] BSgenome_1.28.0         GenomicFeatures_1.12.3  AnnotationDbi_1.22.6
     [4] Biobase_2.20.1          VariantAnnotation_1.6.7 Rsamtools_1.12.3
     [7] Biostrings_2.28.0       GenomicRanges_1.12.4    IRanges_1.18.3
    [10] BiocGenerics_0.6.0

    loaded via a namespace (and not attached):
     [1] biomaRt_2.16.0     bitops_1.0-6       DBI_0.2-7          RCurl_1.95-4.1     RSQLite_0.11.4
     [6] rtracklayer_1.20.4 stats4_3.0.1       tools_3.0.1        XML_3.98-1.1       zlibbioc_1.6.0

________________________________

Helmholtz-Zentrum für Infektionsforschung GmbH | Inhoffenstraße 7 | 38124 Braunschweig | www.helmholtz-hzi.de
The HZI has been a certified member of "audit berufundfamilie" since 2007.
Chair of the Supervisory Board: MinDir'in Bärbel Brumme-Bothe, Federal Ministry of Education and Research
Deputy: Rüdiger Eichel, Head of Department, Lower Saxony Ministry of Science and Culture
Management: Prof. Dr. Dirk Heinz; Ulf Richter, MBA
Limited liability company (GmbH)
Registered office: Braunschweig
Commercial register: Amtsgericht Braunschweig, HRB 477
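A side note on the chrominfo value in the code of this post: cbind() on a string, a number and a logical builds a character matrix, so the resulting data frame has columns named X1, X2 and X3 that hold text rather than an integer length and a logical flag. Whether this is related to the 'tables' error is not established in the thread. The base R sketch below only shows the coercion, plus a typed alternative that assumes the column names chrom, length and is_circular from ?makeTranscriptDb.

```r
# What the construction from the post produces
inf <- data.frame(cbind("NC_008463", 6537648, TRUE))
names(inf)          # "X1" "X2" "X3"
sapply(inf, class)  # character (factor in R < 4.0), not integer/logical

# Typed alternative: build each column directly, with explicit names
inf_typed <- data.frame(
  chrom       = "NC_008463",
  length      = 6537648L,      # integer, not text
  is_circular = TRUE,          # logical, not text
  stringsAsFactors = FALSE
)
sapply(inf_typed, class)
```

Printing the column classes before passing a table as chrominfo is a cheap check, independent of whatever else goes wrong while parsing the GFF3 file.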
Marc Carlson replied (4.3 years ago):

Thank you Sarah,

That is much better. Is this the file you were parsing here?

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Pseudomonas_aeruginosa_UCBPP_PA14_uid57977/NC_008463.gff

Marc

On 08/23/2013 03:49 AM, Sarah Pohl wrote:
> I tried with the chrominfo argument, and in a sense it works. At least there's no error about the missing chromosome size now. The main error stays the same, though.