makeTranscriptDbFromGFF fails on NCBI Bacteria genomes

Sarah Pohl ▴ 30

Cook, Malcolm <mec at ...> writes:

> FYI, bioperl includes bp_genbank2gff3.pl
>
> which when run as
>
> > bp_genbank2gff3.pl NC_011025.gbk
>
> produces NC_011025.gbk.gff (attached)
>
> which loaded without error with transcript:
>
> > txdb <- makeTranscriptDbFromGFF(file="NC_011025.gbk.gff", format="gff3", dataSource="NCBI", species="Some bact")
> extracting transcript information
> Extracting gene IDs
> extracting transcript information
> Processing splicing information for gff3 file.
> Deducing exon rank from relative coordinates provided
> Prepare the 'metadata' data frame ... metadata: OK
> Now generating chrominfo from available sequence names. No chromosome length information is available.
> Warning messages:
> 1: In .deduceExonRankings(exs, format = "gff") :
>   Infering Exon Rankings. If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
> 2: In matchCircularity(chroms, circ_seqs) :
>   None of the strings in your circ_seqs argument match your seqnames.
> > txdb
> TranscriptDb object:
> | Db type: TranscriptDb
> | Supporting package: GenomicFeatures
> | Data source: NCBI
> | Genus and Species: Some bact
> | miRBase build ID: NA
> | transcript_nrow: 631
> | exon_nrow: 631
> | cds_nrow: 631
> | Db created by: GenomicFeatures package from Bioconductor
> | Creation time: 2013-06-07 14:52:50 -0500 (Fri, 07 Jun 2013)
> | GenomicFeatures version at creation time: 1.10.2
> | RSQLite version at creation time: 0.11.2
> | DBSCHEMAVERSION: 1.0

Hey,

I know I'm a bit late for this discussion, but I have a similar problem.

I have a bacterial GBK file which I tried to convert using the bp_genbank2gff3.pl script,

    perl bp_genbank2gff3.pl annotation/NC_008463.gbk -o annotation/

but I got the following error:

    Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl line 672, <fh> line 208948.

So instead I converted it with Biopython and the BCBio module, which worked fine.

Only now, when I try to load it with makeTranscriptDbFromGFF,

    txdb <- makeTranscriptDbFromGFF(file="NC_008463.gff", format="gff3", dataSource="CDS", species="Pseudomonas aeruginosa")

I also get an error:

    Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
      'unique': Error: object 'tables' not found

Why does this happen and what can I do about it?

TranscriptDb convert GenomicFeatures • 2.2k views

ADD COMMENT • link 12.5 years ago Sarah Pohl ▴ 30

Marc Carlson ★ 7.2k

On 08/22/2013 02:12 AM, Sarah Pohl wrote:
> Only now, when I try to load it with makeTranscriptDbFromGFF,
> txdb <- makeTranscriptDbFromGFF(file="NC_008463.gff", format="gff3", dataSource="CDS", species="Pseudomonas aeruginosa")
> I also get an error:
> Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
> 'unique': Error: object 'tables' not found
>
> Why does this happen and what can I do about it?
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Hi Sarah,

It's hard to help you because it's pretty difficult to know what actually happened after reading your post. I can't be sure if the other scripts you mention produced a valid gff3 file, and I have no idea which version of the software you are using. Please see our posting guide here:

http://www.bioconductor.org/help/mailing-list/posting-guide/

But I will go out on a limb anyway and guess (based only on the error message in your post) that your problem might get better if you passed in a value to the chrominfo argument. You can see an example of how to use that argument in the manual page, which you can pull up like this:

    help(makeTranscriptDbFromGFF)

Hope this helps,

Marc
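
A minimal sketch, not part of the original reply: since the thread does not establish what the Biopython/BCBio conversion actually wrote, one quick check is to tabulate the feature types and sequence names in the GFF3 file with base R before handing it to makeTranscriptDbFromGFF. The tiny example file, the sequence name NC_000000 and the variable names are hypothetical placeholders; the nine column names follow the GFF3 column order. This only shows what the file contains, it is not a confirmed explanation of the 'tables' error.

```r
# Hypothetical two-feature GFF3 written to a temp file so the sketch runs on
# its own; point gff_path at the real file (e.g. "NC_008463.gff") instead.
gff_path <- tempfile(fileext = ".gff")
writeLines(c(
  "##gff-version 3",
  "NC_000000\texample\tgene\t1\t900\t.\t+\t.\tID=gene1",
  "NC_000000\texample\tCDS\t1\t900\t.\t+\t0\tID=cds1;Parent=gene1"
), gff_path)

# Read the nine tab-separated GFF3 columns, skipping '#' directive lines.
gff <- read.delim(gff_path, header = FALSE, comment.char = "#", quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("seqid", "source", "type", "start", "end",
                                "score", "strand", "phase", "attributes"))

# Which feature types are present, and how many of each?
print(table(gff$type))

# Which sequence names does the file use? These are the names a chrominfo
# data frame would need to match.
print(unique(gff$seqid))
```

If the type column contains only, say, gene and CDS rows, or the sequence names differ from what is passed as chrominfo, that is worth mentioning when reporting the problem.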

ADD COMMENT • link 12.5 years ago Marc Carlson ★ 7.2k

Sarah Pohl ▴ 30

Hey Marc,

I'm sorry, I came here via gmane.org and didn't see the posting guide. I'll attach the relevant information this time.

I tried with the chrominfo argument, and in a sense it works. At least there's no error about the missing chromosome size now. The main error stays the same, though.

I checked my gff3 file with http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online yesterday and according to them it is fine.

Here's the code:

    library(VariantAnnotation)
    library(GenomicFeatures)
    library(BSgenome)
    inf <- data.frame(cbind("NC_008463", 6537648, TRUE))
    txdb <- makeTranscriptDbFromGFF(file="//CPI-SL64001/spo12/BSgenome/annotation/NC_008463.gff", format="gff3", dataSource="CDS", species="Pseudomonas aeruginosa", chrominfo=inf)

the error:

    Prepare the 'metadata' data frame ... metadata: OK
    Error in is.data.frame(arg) : object 'tables' not found

and the session info:

    R version 3.0.1 (2013-05-16)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252  LC_MONETARY=German_Germany.1252
    [4] LC_NUMERIC=C  LC_TIME=German_Germany.1252

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] BSgenome_1.28.0 GenomicFeatures_1.12.3 AnnotationDbi_1.22.6
     [4] Biobase_2.20.1 VariantAnnotation_1.6.7 Rsamtools_1.12.3
     [7] Biostrings_2.28.0 GenomicRanges_1.12.4 IRanges_1.18.3
    [10] BiocGenerics_0.6.0

    loaded via a namespace (and not attached):
     [1] biomaRt_2.16.0 bitops_1.0-6 DBI_0.2-7 RCurl_1.95-4.1 RSQLite_0.11.4
     [6] rtracklayer_1.20.4 stats4_3.0.1 tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0

Date: Thu, 22 Aug 2013 11:27:39 -0700
From: Marc Carlson <mcarlson@fhcrc.org>
To: bioconductor at r-project.org
Subject: Re: [BioC] makeTranscriptDbFromGFF fails on NCBI Bacteria genomes

> But I will go out on a limb anyway and guess (based only on the error message in your post) that your problem might get better if you passed in a value to the chrominfo argument.

________________________________

Helmholtz-Zentrum für Infektionsforschung GmbH | Inhoffenstraße 7 | 38124 Braunschweig | www.helmholtz-hzi.de
The HZI has been a certified member of "audit berufundfamilie" since 2007.

Chair of the Supervisory Board: MinDir'in Bärbel Brumme-Bothe, Federal Ministry of Education and Research
Deputy: Rüdiger Eichel, Head of Department, Lower Saxony Ministry for Science and Culture
Management: Prof. Dr. Dirk Heinz; Ulf Richter, MBA
Limited liability company (GmbH)
Registered office: Braunschweig
Commercial register: Braunschweig District Court (Amtsgericht Braunschweig), HRB 477
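
A side note on the chrominfo value in the code above, offered as a hedged sketch rather than a confirmed fix: cbind() turns its three arguments into a single character matrix, so the posted data frame ends up with auto-generated column names and no numeric length. The manual page for makeTranscriptDb describes chrominfo as a data frame with the columns chrom, length and is_circular (an assumption to verify against ?makeTranscriptDb in the installed version). The names inf_posted and inf are only illustrative, and nothing in the thread shows that this resolves the 'tables' error.

```r
# What the posted call builds: cbind() coerces all three values to character,
# so the data frame gets auto-generated column names (X1, X2, X3).
inf_posted <- data.frame(cbind("NC_008463", 6537648, TRUE))
str(inf_posted)

# A chrominfo data frame with named, typed columns.
# Assumption: the column names chrom / length / is_circular are the ones
# documented in ?makeTranscriptDb; check your installed version.
inf <- data.frame(
  chrom            = "NC_008463",
  length           = 6537648L,
  is_circular      = TRUE,
  stringsAsFactors = FALSE
)
str(inf)

# The call would otherwise stay as posted. It is commented out here because it
# needs GenomicFeatures and the GFF file:
# txdb <- makeTranscriptDbFromGFF(file = "NC_008463.gff", format = "gff3",
#                                 dataSource = "CDS",
#                                 species = "Pseudomonas aeruginosa",
#                                 chrominfo = inf)
```

Comparing the two str() outputs makes the difference in column names and types visible before the data frame is passed on.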

ADD COMMENT • link 12.5 years ago Sarah Pohl ▴ 30

Thank you Sarah,

That is much better. Is this the file you were parsing here?

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Pseudomonas_aeruginosa_UCBPP_PA14_uid57977/NC_008463.gff

Marc

On 08/23/2013 03:49 AM, Sarah Pohl wrote:
> I tried with the chrominfo argument, and in a sense it works. At least there's no error about the missing chromosome size now. The main error stays the same, though.
>
> I checked my gff3 file with http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online yesterday and according to them it is fine.

ADD REPLY • link 12.5 years ago Marc Carlson ★ 7.2k
