Question: Error running makeTranscriptDbFromGFF in GenomicFeatures
Jon Bråte (Norway) wrote, 3.2 years ago:
Hi list,

I am trying to create a TranscriptDb using GenomicFeatures, but I get an error message. I think there might be something wrong with my GFF file, but I am not sure. I also tried converting the GFF file to GTF, but that gives an error too.

My goal is to plot the number of exons per gene.

Code:

```
#GFF-file
> txdb = makeTranscriptDbFromGFF(file = "~/Documents/Prosjekter/RNA-project/Data/Sycon_ciliatum/sycon-from-Bergen/gff-files-and-expression-levels/cds.gb.gff3",
+ format = "gff")
extracting transcript information
Extracting gene IDs
extracting transcript information
Processing splicing information for gff3 file.
Deducing exon rank from relative coordinates provided
Warning message:
In .deduceExonRankings(exs, format = "gff") :
  Infering Exon Rankings. If this is not what you expected, then please
  be sure that you have provided a valid attribute for exonRankAttributeName
Error in unlist(mapply(.assignRankings, starts, strands)) :
  error in evaluating the argument 'x' in selecting a method for function
  'unlist': Error in (function (starts, strands) :
  Exon rank inference cannot accomodate trans-splicing.

#GTF-file
> txdbGTF = makeTranscriptDbFromGFF(file = "~/Documents/Prosjekter/RNA-project/Data/Sycon_ciliatum/sycon-from-Bergen/gff-files-and-expression-levels/cds.gb.gtf",
+ format = "gtf")
Error in .parse_attrCol(attrCol, file, colnames) :
  Some attributes do not conform to 'tag value' format

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicFeatures_1.16.2 AnnotationDbi_1.26.0   Biobase_2.24.0         GenomicRanges_1.16.3
[5] GenomeInfoDb_1.0.2     IRanges_1.22.10        BiocGenerics_0.10.0

loaded via a namespace (and not attached):
 [1] BBmisc_1.7              BSgenome_1.32.0         BatchJobs_1.3           BiocParallel_0.6.1
 [5] Biostrings_2.32.1       DBI_0.2-7               GenomicAlignments_1.0.5 RCurl_1.95-4.3
 [9] RSQLite_0.11.4          Rcpp_0.11.2             Rsamtools_1.16.1        XML_3.98-1.1
[13] XVector_0.4.0           biomaRt_2.20.0          bitops_1.0-6            brew_1.0-6
[17] checkmate_1.3           codetools_0.2-9         digest_0.6.4            fail_1.2
[21] foreach_1.4.2           iterators_1.0.7         rtracklayer_1.24.2      sendmailR_1.1-2
[25] stats4_3.1.0            stringr_0.6.2           tools_3.1.0             zlibbioc_1.10.0
```

----------------------------------------------------------------
Jon Bråte
Section for Genetics and Evolutionary Biology (EVOGENE)
Department of Biosciences
University of Oslo
P.B. 1066 Blindern
N-0316, Norway
Email: jon.brate at ibv.uio.no
Phone: 922 44 582
Web: mn.uio.no/ibv/english/people/aca/jonbra/index.html
Question written 3.2 years ago by Jon Bråte; last modified 3.2 years ago by Michael Lawrence.
Michael Lawrence (United States) wrote, 3.2 years ago:
I think the error messages are a pretty good clue to what's wrong here. The TxDb needs to know the "rank" (the order within the transcript) of each exon. Because no exon-rank attribute was supplied (see the warning about exonRankAttributeName), it tries to infer the rank from the exon positions, but this fails when exons of the same transcript fall on multiple chromosomes (trans-splicing).

As for the GTF, some lines have an attribute column that is not in 'tag value' format. You could find the offending line(s) by cutting the file in half recursively until the error goes away.

If you want, you could put the files up on Dropbox, and I'll take a look at them.

Michael

(Replying to Jon Bråte's message of Thu, Sep 4, 2014 at 3:23 AM.)
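To make the rank explanation concrete, here is a small R sketch of what inferring exon rank from coordinates amounts to. `infer_exon_rank()` is a hypothetical helper written for illustration, not the code inside GenomicFeatures: it numbers the exons of one transcript by start position (ascending on the + strand, descending on the - strand) and gives up when the exons sit on different sequences, because coordinates on different scaffolds cannot be ordered against each other.

```r
# Hypothetical illustration, NOT GenomicFeatures' internal implementation:
# infer the 5'-to-3' rank of the exons of ONE transcript from coordinates.
infer_exon_rank <- function(seqid, start, strand) {
  # Positions on different scaffolds/chromosomes (or strands) have no common
  # order: this is the trans-splicing case that makes inference impossible.
  if (length(unique(seqid)) > 1L || length(unique(strand)) > 1L) {
    stop("cannot infer exon rank: exons lie on several sequences or strands")
  }
  # The first exon is the leftmost on '+' and the rightmost on '-'.
  ord <- if (strand[1L] == "-") order(start, decreasing = TRUE) else order(start)
  ranks <- integer(length(start))
  ranks[ord] <- seq_along(ord)
  ranks
}

infer_exon_rank(c("scaf1", "scaf1", "scaf1"), c(500, 1, 200), c("+", "+", "+"))
# [1] 3 1 2
infer_exon_rank(c("scaf1", "scaf1", "scaf1"), c(500, 1, 200), c("-", "-", "-"))
# [1] 1 3 2
```

Calling it with exons on, say, `c("scaf1", "scaf2")` stops with an error, which mirrors the situation reported in the question.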
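Cutting the GTF file in half until the error goes away is a binary search over the file. Below is a minimal R sketch under two assumptions: a predicate `parses_ok()` that returns TRUE when a set of lines is accepted, and the premise that once a bad line is included every longer prefix also fails. Both helper names are hypothetical. `attrs_ok()` is only a rough, assumed stand-in for the 'tag value' rule and is not the actual check in `.parse_attrCol()`; comment lines starting with `#` are assumed to have been removed first.

```r
# Hypothetical helper: index of the first line that makes `parses_ok` fail,
# found by binary search over prefix length (NA if the whole input parses).
# Assumes any prefix that contains a bad line fails.
first_bad_line <- function(lines, parses_ok) {
  if (parses_ok(lines)) return(NA_integer_)
  lo <- 0L             # a prefix of length lo parses (the empty prefix)
  hi <- length(lines)  # a prefix of length hi fails
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (parses_ok(lines[seq_len(mid)])) lo <- mid else hi <- mid
  }
  hi
}

# Assumed, simplified 'tag value' check: every ';'-separated piece of
# column 9 must be a tag, then whitespace, then a value.
attrs_ok <- function(lines) {
  col9 <- vapply(strsplit(lines, "\t", fixed = TRUE), `[`, character(1), 9L)
  pieces <- gsub("^\\s+|\\s+$", "", unlist(strsplit(col9, ";", fixed = TRUE)))
  pieces <- pieces[nzchar(pieces)]
  all(grepl("^\\S+\\s+\\S", pieces))
}

gtf <- c(
  "scaf1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
  "scaf1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
  "scaf2\tsrc\texon\t1\t90\t.\t-\t.\tgene_id=g2;transcript_id=t2",
  "scaf3\tsrc\texon\t5\t60\t.\t+\t.\tgene_id \"g3\"; transcript_id \"t3\";"
)
first_bad_line(gtf, attrs_ok)
# [1] 3
```

With a per-line check like `attrs_ok()` you could simply test each line on its own; the bisection pays off when `parses_ok()` wraps the real import instead, for example by writing the prefix to a temporary file and wrapping the failing `makeTranscriptDbFromGFF(..., format = "gtf")` call in `tryCatch()`.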
Thanks Michael,

Yes, you are right. Many of the transcripts come from multiple chromosomes (or rather scaffolds: this is a poorly assembled genome, which is probably why there is so much trans-splicing).

I think removing the trans-spliced genes would remove too many genes, so I will try to do this in another way.

Thank you,
Jon

(Replying to Michael Lawrence's message of 4 Sep 2014, 13:56.)
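Before deciding whether dropping trans-spliced transcripts loses too much, it can help to count them. This is a minimal base-R sketch that assumes the exon rows are already in a data frame with columns `seqid` and `Parent` (one parent per exon). The function name and the toy data are hypothetical.

```r
# Hypothetical helper: Parent IDs whose exons lie on more than one sequence
# (chromosome or scaffold), i.e. the transcripts that defeat rank inference.
find_trans_spliced <- function(exons) {
  n_seq <- tapply(exons$seqid, exons$Parent, function(s) length(unique(s)))
  names(n_seq)[n_seq > 1L]
}

# Toy data: t1 has an exon on a second scaffold, t2 does not.
exons <- data.frame(
  seqid  = c("scaf1", "scaf1", "scaf7", "scaf2", "scaf2"),
  Parent = c("t1",    "t1",    "t1",    "t2",    "t2"),
  stringsAsFactors = FALSE
)
trans_ids <- find_trans_spliced(exons)
trans_ids                                       # [1] "t1"
length(trans_ids) / length(unique(exons$Parent)) # [1] 0.5
kept <- exons[!exons$Parent %in% trans_ids, ]   # exons of the remaining transcripts
```

Counting exons per transcript does not need exon ranks at all, so for the stated goal these transcripts do not have to be removed.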
Reply written 3.2 years ago by Jon Bråte.
I would recommend calling

```r
library(rtracklayer)  # import() comes from rtracklayer
gr <- import(gff)     # gff: path to your GFF3 file
```

and then subsetting for the type being exon and tabulating by parent.

Michael

(Replying to Jon Bråte's message of Thu, Sep 4, 2014 at 8:14 AM.)
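Here is the subset-and-tabulate step as a base-R sketch. So that it runs without Bioconductor, it splits the GFF3 columns with `strsplit()` instead of using rtracklayer's `import()`. It is a swapped-in stand-in, not Michael's code. It assumes a tab-separated GFF3 file in which each exon row carries exactly one `Parent=` attribute, normally a transcript ID. The counts are therefore per transcript, which equals per gene only if the annotation has one transcript per gene. The function name is hypothetical.

```r
# Hypothetical helper: number of exon rows per Parent ID in GFF3 lines.
count_exons_per_parent <- function(gff_lines) {
  # Drop directives/comments (e.g. "##gff-version 3") and empty lines.
  gff_lines <- gff_lines[nzchar(gff_lines) & !grepl("^#", gff_lines)]
  fields <- strsplit(gff_lines, "\t", fixed = TRUE)
  type  <- vapply(fields, `[`, character(1), 3L)   # column 3: feature type
  attrs <- vapply(fields, `[`, character(1), 9L)   # column 9: attributes
  # A leading ';' lets one pattern match Parent= at any position.
  exon_attrs <- paste0(";", attrs[type == "exon"])
  exon_attrs <- exon_attrs[grepl(";Parent=", exon_attrs, fixed = TRUE)]
  parent <- sub(".*;Parent=([^;]+).*", "\\1", exon_attrs)
  table(parent)
}

gff <- c(
  "##gff-version 3",
  "scaf1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=t1",
  "scaf1\tsrc\texon\t1\t100\t.\t+\t.\tID=e1;Parent=t1",
  "scaf1\tsrc\texon\t500\t900\t.\t+\t.\tID=e2;Parent=t1",
  "scaf2\tsrc\tmRNA\t1\t50\t.\t-\t.\tID=t2",
  "scaf2\tsrc\texon\t1\t50\t.\t-\t.\tParent=t2;ID=e3"
)
n_exons <- count_exons_per_parent(gff)
n_exons
# Distribution of exon counts, the plot asked for in the question.
barplot(table(as.vector(n_exons)), xlab = "exons per transcript", ylab = "transcripts")
```

With rtracklayer loaded and `gr` imported as above, the equivalent is roughly `table(unlist(gr$Parent[gr$type == "exon"]))`, because `import()` returns the Parent attribute as a list-like column.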
Reply written 3.2 years ago by Michael Lawrence.