Search
Question: makeTranscriptDbFromGFF fails on NCBI Bacteria genomes
0
gravatar for 刘鹏飞
4.1 years ago by
刘鹏飞80
刘鹏飞80 wrote:
Hi, Now I am want to use goseq for doing GO analysis of my bacteria RNA- seq data, but now I am stucked by the following issue: 1, makeTranscriptDbFromGFF First download gff from NCBI( ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Pelotomaculum_thermopropio nicum_SI_uid58877/NC_009454.gff), try to use it to build TranscriptDb in order to get the lengthData which needed to import into goseq >library(GenomicFeatures) >gffFile="/home/liupf/NC_009454.gff" >txdb <- makeTranscriptDbFromGFF(file="NC_009454.gff", format="gff3", dataSource=NA, species="Pelotomaculum thermopropionicum") failed: Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : No Transcript information found in gff file In addition: Warning message: In as.data.frame(data, stringsAsFactors = FALSE) : Arguments in '...' ignored Then I follow the discussion in the mailinglist: makeTranscriptDbFromGFF fails on NCBI Bacteria genomes, use, the following way: $perl bp_genbank2gff3.pl NC_009454.gbk >txdb <- makeTranscriptDbFromGFF(file="NC_009454.gbk.gff", format="gff3", dataSource=NA, species="Pelotomaculum thermopropionicum") Failed : Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : Unexpected transcript duplicates In addition: Warning message: In as.data.frame(data, stringsAsFactors = FALSE) : Arguments in '...' ignored I know that for the first way, there is no mRNA and exon information, as mentioned here ( https://stat.ethz.ch/pipermail/bioconductor/2013-June/053146.html) and add following like things would work: NC_011025.1 RefSeq mRNA 107 1471 . + 0 ID=mRNA0;Parent=gene0 NC_011025.1 RefSeq exon 107 1471 . + 0 ID=exon0;Parent=mRNA0 Ouoted: "Taking a quick look at the GFF in question ... I don't see any mRNA features.... they appear to be implicit .... which is not well formed GFF3 (c.f. http://www.sequenceontology.org/gff3.shtml) That is your first problem. In particular, the first gene in a file by itself is rejected. Adding the mRNA and exon lines as below, and the first gene is now accepted by makeTranscriptDbFromGFF" But by doing so, The TranscriptDb only contain the first gene. The second way may work on R-2.15 or before, but failed on R 3.0.0 ( https://stat.ethz.ch/pipermail/bioconductor/2013-June/053148.html) and here I got same error on R 3.0.2 this topic has been discussed before, but for bacteria genome, it still seems unsolved, is there any solutions on this issue now? Really thanks for this, for my work was stucked here now! sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 [4] GenomicRanges_1.14.1 XVector_0.2.0 IRanges_1.20.0 [7] BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] biomaRt_2.18.0 Biostrings_2.30.0 bitops_1.0-6 BSgenome_1.30.0 [5] DBI_0.2-7 RCurl_1.95-4.1 Rsamtools_1.14.1 RSQLite_0.11.4 [9] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2 XML_3.98-1.1 [13] zlibbioc_1.8.0 -- Pengfei Liu, PhD Candidate Lab of Microbial Ecology College of Resources and Environmental Sciences China Agricultural University No.2 Yuanmingyuanxilu, Beijing, 100193 P.R. China Tel: +86-10-62731358 Fax: +86-10-62731016 E-mail: liupfskygre@gmail.com [[alternative HTML version deleted]]
ADD COMMENTlink modified 4.1 years ago by Marc Carlson7.2k • written 4.1 years ago by 刘鹏飞80
0
gravatar for Marc Carlson
4.1 years ago by
Marc Carlson7.2k
United States
Marc Carlson7.2k wrote:
Hi, So you might want to look at the manual page for the latest release of the makeTranscriptDbFromGFF() function. ?makeTranscriptDbFromGFF There you will see that there is a useGenesAsTranscripts argument that you can try. That *might* help, I added it in the hopes that it could help someone in a position like your own. But it's hard to know for sure as it depends a lot on how bad of shape your file is in. GFF files are really not idea for transcriptomics since the file specification is a little bit too vague and general. But I am trying to accommodate them as much as I can. Marc On 10/21/2013 10:46 PM, 刘鹏飞 wrote: > Hi, > Now I am want to use goseq for doing GO analysis of my bacteria > RNA-seq data, but now I am stucked by the following issue: > 1, makeTranscriptDbFromGFF > First download gff from NCBI( > ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Pelotomaculum_thermoprop ionicum_SI_uid58877/NC_009454.gff), > try to use it to build TranscriptDb in order to get the lengthData > which needed to import into goseq > > >library(GenomicFeatures) > > >gffFile="/home/liupf/NC_009454.gff" > > >txdb <- makeTranscriptDbFromGFF(file="NC_009454.gff", format="gff3", > dataSource=NA, species="Pelotomaculum thermopropionicum") > > failed: > > Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : > > No Transcript information found in gff file > > In addition: Warning message: > > In as.data.frame(data, stringsAsFactors = FALSE) : > > Arguments in '...' ignored > > > Then I follow the discussion in the mailinglist: > makeTranscriptDbFromGFF fails on NCBI Bacteria genomes, use, the > following way: > > $perl bp_genbank2gff3.pl <http: bp_genbank2gff3.pl=""> NC_009454.gbk > > >txdb <- makeTranscriptDbFromGFF(file="NC_009454.gbk.gff", > format="gff3", dataSource=NA, species="Pelotomaculum thermopropionicum") > > Failed : > > Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : > Unexpected transcript duplicates > In addition: Warning message: > In as.data.frame(data, stringsAsFactors = FALSE) : > Arguments in '...' ignored > > I know that for the first way, there is no mRNA and exon information, > as mentioned here > (https://stat.ethz.ch/pipermail/bioconductor/2013-June/053146.html) > and add following like things would work: > > NC_011025.1 RefSeq mRNA 107 1471 . + 0 ID=mRNA0;Parent=gene0 > NC_011025.1 RefSeq exon 107 1471 . + 0 ID=exon0;Parent=mRNA0 > Ouoted: "Taking a quick look at the GFF in question ... I don't see any mRNA features.... they appear to be implicit .... which is not well formed GFF3 (c.f.http://www.sequenceontology.org/gff3.shtml) > > That is your first problem. In particular, the first gene in a file by itself is rejected. Adding the mRNA and exon lines as below, and the first gene is now accepted by makeTranscriptDbFromGFF" > > But by doing so, The TranscriptDb only contain the first gene. > > The second way may work on R-2.15 or before, but failed on R 3.0.0 > (https://stat.ethz.ch/pipermail/bioconductor/2013-June/053148.html) > and here I got same error on R 3.0.2 > > this topic has been discussed before, but for bacteria genome, it > still seems unsolved, > > is there any solutions on this issue now? > > Really thanks for this, for my work was stucked here now! > > > sessionInfo() > > R version 3.0.2 (2013-09-25) > > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods > > [8] base > > other attached packages: > > [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 > > [4] GenomicRanges_1.14.1 XVector_0.2.0 IRanges_1.20.0 > > [7] BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > > [1] biomaRt_2.18.0 Biostrings_2.30.0 bitops_1.0-6 BSgenome_1.30.0 > > [5] DBI_0.2-7 RCurl_1.95-4.1 Rsamtools_1.14.1 RSQLite_0.11.4 > > [9] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2 XML_3.98-1.1 > > [13] zlibbioc_1.8.0 > > > > -- > Pengfei Liu, PhD Candidate > > Lab of Microbial Ecology > College of Resources and Environmental Sciences > China Agricultural University > No.2 Yuanmingyuanxilu, Beijing, 100193 > P.R. China > > Tel: +86-10-62731358 > Fax: +86-10-62731016 > > E-mail: liupfskygre@gmail.com <mailto:liupfskygre@gmail.com> > [[alternative HTML version deleted]]
ADD COMMENTlink written 4.1 years ago by Marc Carlson7.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 378 users visited in the last hour