makeTranscriptDbFromGFF fails on NCBI Bacteria genomes
1
0
Entering edit mode
刘鹏飞 ▴ 80
@-6181
Last seen 10.3 years ago
Hi, Now I am want to use goseq for doing GO analysis of my bacteria RNA- seq data, but now I am stucked by the following issue: 1, makeTranscriptDbFromGFF First download gff from NCBI( ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Pelotomaculum_thermopropio nicum_SI_uid58877/NC_009454.gff), try to use it to build TranscriptDb in order to get the lengthData which needed to import into goseq >library(GenomicFeatures) >gffFile="/home/liupf/NC_009454.gff" >txdb <- makeTranscriptDbFromGFF(file="NC_009454.gff", format="gff3", dataSource=NA, species="Pelotomaculum thermopropionicum") failed: Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : No Transcript information found in gff file In addition: Warning message: In as.data.frame(data, stringsAsFactors = FALSE) : Arguments in '...' ignored Then I follow the discussion in the mailinglist: makeTranscriptDbFromGFF fails on NCBI Bacteria genomes, use, the following way: $perl bp_genbank2gff3.pl NC_009454.gbk >txdb <- makeTranscriptDbFromGFF(file="NC_009454.gbk.gff", format="gff3", dataSource=NA, species="Pelotomaculum thermopropionicum") Failed : Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : Unexpected transcript duplicates In addition: Warning message: In as.data.frame(data, stringsAsFactors = FALSE) : Arguments in '...' ignored I know that for the first way, there is no mRNA and exon information, as mentioned here ( https://stat.ethz.ch/pipermail/bioconductor/2013-June/053146.html) and add following like things would work: NC_011025.1 RefSeq mRNA 107 1471 . + 0 ID=mRNA0;Parent=gene0 NC_011025.1 RefSeq exon 107 1471 . + 0 ID=exon0;Parent=mRNA0 Ouoted: "Taking a quick look at the GFF in question ... I don't see any mRNA features.... they appear to be implicit .... which is not well formed GFF3 (c.f. http://www.sequenceontology.org/gff3.shtml) That is your first problem. In particular, the first gene in a file by itself is rejected. Adding the mRNA and exon lines as below, and the first gene is now accepted by makeTranscriptDbFromGFF" But by doing so, The TranscriptDb only contain the first gene. The second way may work on R-2.15 or before, but failed on R 3.0.0 ( https://stat.ethz.ch/pipermail/bioconductor/2013-June/053148.html) and here I got same error on R 3.0.2 this topic has been discussed before, but for bacteria genome, it still seems unsolved, is there any solutions on this issue now? Really thanks for this, for my work was stucked here now! sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 [4] GenomicRanges_1.14.1 XVector_0.2.0 IRanges_1.20.0 [7] BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] biomaRt_2.18.0 Biostrings_2.30.0 bitops_1.0-6 BSgenome_1.30.0 [5] DBI_0.2-7 RCurl_1.95-4.1 Rsamtools_1.14.1 RSQLite_0.11.4 [9] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2 XML_3.98-1.1 [13] zlibbioc_1.8.0 -- Pengfei Liu, PhD Candidate Lab of Microbial Ecology College of Resources and Environmental Sciences China Agricultural University No.2 Yuanmingyuanxilu, Beijing, 100193 P.R. China Tel: +86-10-62731358 Fax: +86-10-62731016 E-mail: liupfskygre@gmail.com [[alternative HTML version deleted]]
GO TranscriptDb goseq genomes GO TranscriptDb goseq genomes • 1.8k views
ADD COMMENT
0
Entering edit mode
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.4 years ago
United States
Hi, So you might want to look at the manual page for the latest release of the makeTranscriptDbFromGFF() function. ?makeTranscriptDbFromGFF There you will see that there is a useGenesAsTranscripts argument that you can try. That *might* help, I added it in the hopes that it could help someone in a position like your own. But it's hard to know for sure as it depends a lot on how bad of shape your file is in. GFF files are really not idea for transcriptomics since the file specification is a little bit too vague and general. But I am trying to accommodate them as much as I can. Marc On 10/21/2013 10:46 PM, 刘鹏飞 wrote: > Hi, > Now I am want to use goseq for doing GO analysis of my bacteria > RNA-seq data, but now I am stucked by the following issue: > 1, makeTranscriptDbFromGFF > First download gff from NCBI( > ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Pelotomaculum_thermoprop ionicum_SI_uid58877/NC_009454.gff), > try to use it to build TranscriptDb in order to get the lengthData > which needed to import into goseq > > >library(GenomicFeatures) > > >gffFile="/home/liupf/NC_009454.gff" > > >txdb <- makeTranscriptDbFromGFF(file="NC_009454.gff", format="gff3", > dataSource=NA, species="Pelotomaculum thermopropionicum") > > failed: > > Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : > > No Transcript information found in gff file > > In addition: Warning message: > > In as.data.frame(data, stringsAsFactors = FALSE) : > > Arguments in '...' ignored > > > Then I follow the discussion in the mailinglist: > makeTranscriptDbFromGFF fails on NCBI Bacteria genomes, use, the > following way: > > $perl bp_genbank2gff3.pl <http: bp_genbank2gff3.pl=""> NC_009454.gbk > > >txdb <- makeTranscriptDbFromGFF(file="NC_009454.gbk.gff", > format="gff3", dataSource=NA, species="Pelotomaculum thermopropionicum") > > Failed : > > Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : > Unexpected transcript duplicates > In addition: Warning message: > In as.data.frame(data, stringsAsFactors = FALSE) : > Arguments in '...' ignored > > I know that for the first way, there is no mRNA and exon information, > as mentioned here > (https://stat.ethz.ch/pipermail/bioconductor/2013-June/053146.html) > and add following like things would work: > > NC_011025.1 RefSeq mRNA 107 1471 . + 0 ID=mRNA0;Parent=gene0 > NC_011025.1 RefSeq exon 107 1471 . + 0 ID=exon0;Parent=mRNA0 > Ouoted: "Taking a quick look at the GFF in question ... I don't see any mRNA features.... they appear to be implicit .... which is not well formed GFF3 (c.f.http://www.sequenceontology.org/gff3.shtml) > > That is your first problem. In particular, the first gene in a file by itself is rejected. Adding the mRNA and exon lines as below, and the first gene is now accepted by makeTranscriptDbFromGFF" > > But by doing so, The TranscriptDb only contain the first gene. > > The second way may work on R-2.15 or before, but failed on R 3.0.0 > (https://stat.ethz.ch/pipermail/bioconductor/2013-June/053148.html) > and here I got same error on R 3.0.2 > > this topic has been discussed before, but for bacteria genome, it > still seems unsolved, > > is there any solutions on this issue now? > > Really thanks for this, for my work was stucked here now! > > > sessionInfo() > > R version 3.0.2 (2013-09-25) > > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods > > [8] base > > other attached packages: > > [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 > > [4] GenomicRanges_1.14.1 XVector_0.2.0 IRanges_1.20.0 > > [7] BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > > [1] biomaRt_2.18.0 Biostrings_2.30.0 bitops_1.0-6 BSgenome_1.30.0 > > [5] DBI_0.2-7 RCurl_1.95-4.1 Rsamtools_1.14.1 RSQLite_0.11.4 > > [9] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2 XML_3.98-1.1 > > [13] zlibbioc_1.8.0 > > > > -- > Pengfei Liu, PhD Candidate > > Lab of Microbial Ecology > College of Resources and Environmental Sciences > China Agricultural University > No.2 Yuanmingyuanxilu, Beijing, 100193 > P.R. China > > Tel: +86-10-62731358 > Fax: +86-10-62731016 > > E-mail: liupfskygre@gmail.com <mailto:liupfskygre@gmail.com> > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 433 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6