GenomicFeatures Reading GFF Efficiency
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Last seen 3 days ago
Australia
Using the command makeTranscriptDbFromGFF("gencode.v13.annotation.gtf", format = "gtf") takes many hours. It seems to take longest at the "Processing splicing information for gtf file." step. Is the code optimised?

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
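Before committing to a multi-hour run, one option is to try the call on a small slice of the file first. The sketch below uses only base R; gtf_head is a hypothetical helper, not part of GenomicFeatures, and the demo file is made up for illustration.

```r
# Hypothetical helper (not part of GenomicFeatures): copy the first n lines
# of a GTF file, header comments included, to a temporary file.
gtf_head <- function(path, n = 10000L) {
  lines <- readLines(path, n = n)
  out <- tempfile(fileext = ".gtf")
  writeLines(lines, out)
  out
}

# Made-up three-line file standing in for the real annotation.
demo <- tempfile(fileext = ".gtf")
writeLines(c(
  "##made-up example",
  paste("chr1", "demo", "exon", 100, 200, ".", "+", ".",
        'gene_id "g1"; transcript_id "t1";', sep = "\t"),
  paste("chr1", "demo", "exon", 300, 400, ".", "+", ".",
        'gene_id "g1"; transcript_id "t1";', sep = "\t")
), demo)

small <- gtf_head(demo, n = 2L)
```

The path returned by gtf_head could then be passed to makeTranscriptDbFromGFF(small, format = "gtf") as in the question. A slice cut at an arbitrary line may end in the middle of a transcript, so it is only suitable for a quick timing or smoke test.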
Dario Strbenac ★ 1.5k
After nearly 2 days, it gave an error:

Processing splicing information for gtf file.
Error in `colnames<-`(`*tmp*`, value = c("exon_chrom", "exon_start", "exon_end", :
  'names' attribute [9] must be the same length as the vector [6]
In addition: Warning message:
In .deduceExonRankings(exs) :
  Infering Exon Rankings. If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName

This is the 1.10.0 version of GenomicFeatures in R 2.15.1.

Meanwhile, GENCODE version 14 is released, so you wouldn't have wanted my object of version 13 annotations, in the end.
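The warning above points at exonRankAttributeName. A rough way to see whether a GTF file carries a given attribute on its exon lines is to scan the ninth column of the first few records. This is a base R sketch under stated assumptions: has_gtf_attribute is a hypothetical helper, the demo file is made up, and "exon_number" is used only as an example attribute name (the thread does not say which name, if any, the GENCODE file uses).

```r
# Hypothetical helper: TRUE if any exon line among the first n lines of the
# file has an attribute called `attr` in column 9. `attr` is assumed to be a
# plain name without regular-expression metacharacters.
has_gtf_attribute <- function(path, attr, n = 1000L) {
  lines <- readLines(path, n = n)
  lines <- lines[!startsWith(lines, "#")]          # drop header comments
  fields <- strsplit(lines, "\t", fixed = TRUE)
  is_exon <- vapply(fields,
                    function(f) length(f) >= 9 && f[3] == "exon",
                    logical(1))
  attrs <- vapply(fields[is_exon], function(f) f[9], character(1))
  # Attribute names sit at the start of column 9 or right after a semicolon.
  any(grepl(paste0("(^|;) *", attr, " "), attrs))
}

# Made-up example file with one exon record.
demo <- tempfile(fileext = ".gtf")
writeLines(c(
  "##made-up example",
  paste("chr1", "demo", "exon", 100, 200, ".", "+", ".",
        'gene_id "g1"; transcript_id "t1"; exon_number 1;', sep = "\t")
), demo)

found   <- has_gtf_attribute(demo, "exon_number")
missing <- has_gtf_attribute(demo, "exon_rank")
```

If such an attribute is present, its name is what the exonRankAttributeName argument mentioned in the warning is meant to receive.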
Hi Dario,

Just a heads up that I am looking into this.

Marc

On 11/15/2012 06:00 PM, Dario Strbenac wrote:
> After nearly 2 days, it gave an error :
>
> Processing splicing information for gtf file.
> Error in `colnames<-`(`*tmp*`, value = c("exon_chrom", "exon_start", "exon_end", :
>   'names' attribute [9] must be the same length as the vector [6]
> In addition: Warning message:
> In .deduceExonRankings(exs) :
>   Infering Exon Rankings. If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
>
> This is the 1.10.0 version of GenomicFeatures in R 2.15.1.
>
> Meanwhile, GENCODE version 14 is released, so you wouldn't have wanted my object of version 13 annotations, in the end.
Hi Dario,

I have found and killed a couple of bugs in this parser, and the fix should show up in the next couple of days. I will work on better performance as well, but that is not in the latest update, as I had to fix the bug first.

Please be aware that much of the slow performance comes from the fact that GTF files are not required to encode exon ranking information. In the 800+ megabyte file you were parsing, the only way to get exon rank information was to deduce it from the provided coordinate positions. The fact that this file does not provide that information should probably concern you. Even though the parser can do the inference, it takes time and, more importantly, it makes assumptions about your data, so it really should not be done if you can avoid it. This is why the function throws a warning that it is inferring the exon rankings.

So if you can get the data in another format, or at least from a GTF file that does provide the exon ranking information, that would be strongly recommended.

Marc
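To make the kind of inference described above concrete, here is a minimal base R sketch of deducing exon rank from coordinates alone. It is not the code used inside GenomicFeatures; deduce_exon_rank and the column names are hypothetical, and the sketch assumes that all exons of a transcript share one strand and do not overlap, which is exactly the sort of assumption about the data that the reply warns about.

```r
# Hypothetical helper: number the exons of each transcript in 5' to 3' order
# using only their coordinates.
deduce_exon_rank <- function(exons) {
  rank <- integer(nrow(exons))
  for (idx in split(seq_len(nrow(exons)), exons$tx_id)) {
    minus <- exons$strand[idx[1]] == "-"
    # Ascending start on the "+" strand, descending start on the "-" strand.
    ord <- order(exons$start[idx], decreasing = minus)
    rank[idx[ord]] <- seq_along(idx)
  }
  exons$exon_rank <- rank
  exons
}

# Made-up exons for two transcripts, deliberately out of order.
exons <- data.frame(
  tx_id  = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  strand = c("+", "+", "+", "-", "-"),
  start  = c(500L, 100L, 300L, 1000L, 2000L),
  end    = c(600L, 200L, 400L, 1100L, 2100L),
  stringsAsFactors = FALSE
)
ranked <- deduce_exon_rank(exons)
```

In this toy input the exon starting at 100 becomes rank 1 of tx1, while on the minus strand the exon starting at 2000 becomes rank 1 of tx2. A file that states the rank explicitly avoids both the extra work and the assumptions.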