Reading GTFs
@steve-lianoglou-2771
Hi,

I'm wondering why I can't seem to stumble across any packages that deal with parsing and mapping GTF annotation data.

In order to do some analysis with tiling array data, I need to incorporate annotation data for chromosome positions, which I've downloaded as a GTF (along with the latest genome release) from Ensembl: http://www.ensembl.org/info/downloads/ftp_site.html

I'm happy to whip up some rigged method of doing this myself, but I feel like others must be doing the same thing and I'm reinventing the wheel, which might not be all that round by the time I'm done with it.

Are there better ways to deal with genome annotation? I mentioned AnnotationDbi in the subject line because I feel like it provides something similar, but I don't think it's quite what I'm after.

Thanks in advance,
-steve
@steve-lianoglou-2771
Howdy,

On Jun 3, 2008, at 7:06 PM, Sean Davis wrote:

> On Tue, Jun 3, 2008 at 6:24 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>>
>> I'm wondering why I can't seem to stumble across any packages that
>> deal with parsing and mapping GTF annotation data.
>
> GTF is just tab-delimited text, yes? read.table() should eat that up.
> You could also look at biomaRt, rtracklayer, and GenomeGraphs
> packages.

Yes, they are just tab-delimited and quite easy to read in with R's ability to slice and dice delimited text. I was just wondering if I was doing this "my own way" instead of taking advantage of something that's already there ... meaning, all this great work has gone into R/Bioconductor that allows it to lay claim to the "batteries included" type motto. It's just that I feel like at times the batteries are somewhere on the top shelf and easy to miss :-)

With packages that are concerned with setting up metadata for chip information, and probe mappings/whatever, I was just wondering if I should be attacking the problem with a certain bent, so that the result could be reused in some already existing framework, is all.

That said, thanks for the pointers to your suggested packages, and I'll look through them more.

>> In order to do some analysis with tiling array data, I need to
>> incorporate annotation data for chromosome positions
>
> You might look at the tilingArray package.

Yeah ... I've been in and out of that package. It's handy to learn from, for sure, and I'm trying to reuse as much of it as possible.

>> I'm happy to whip up some rigged method of doing this myself, but I
>> feel like others must be doing the same thing and I'm reinventing the
>> wheel which might not be all that round by the time I'm done with it.
>>
>> Are there better ways to deal with genome annotation? I mentioned the
>> AnnotationDbi in the subject line, because I feel like it provides
>> something similar, but I don't think it's quite what I'm after.
>
> What do you actually want to do? The specifics may be relevant.

Currently I'm trying to gather a set of probes that fit a certain set of criteria, such as their genomic annotation (intergenic vs. exonic, etc.), number of hits to the genome, etc. I have all the information for these from a combination of reblasting the probes to the genome (as suggested by W. Huber and others) and the GTF file, and I'm trying to store this information in an environment similar to the ones the tilingArray and Ringo packages use.

Later I'll probably want to go the other way: start from a set of interesting probes and have a quick way to get the pertinent information for them, so I can send them through some other Bioconductor functionality, like one of the go* packages (for example).

Anyway ... thanks for the reply.

-steve
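To make the "GTF is just tab-delimited text" point concrete, here is a minimal sketch of reading such a file. It is written in Python rather than with R's read.table(), purely for illustration. It assumes the usual nine-column GTF layout with attributes written as `key "value";` pairs; the names `read_gtf`, `parse_attributes` and `GTF_COLUMNS` are made up for this example and do not come from any package mentioned in the thread.

```python
import csv
import io
import re

# Standard nine GTF columns (the names are this sketch's own labels).
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]

# Matches attribute pairs written as: key "value";
# Assumption: values are double-quoted; unquoted values are not handled here.
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(text):
    """Turn 'gene_id "G1"; transcript_id "T1";' into a dict."""
    return dict(_ATTR_RE.findall(text))


def read_gtf(handle):
    """Read GTF records from an open text handle into a list of dicts."""
    data_lines = (line for line in handle if not line.startswith("#"))
    # QUOTE_NONE keeps the double quotes inside the attributes column intact.
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if len(row) != len(GTF_COLUMNS):
            continue  # skip blank or malformed lines
        rec = dict(zip(GTF_COLUMNS, row))
        rec["start"] = int(rec["start"])
        rec["end"] = int(rec["end"])
        rec["attributes"] = parse_attributes(rec["attributes"])
        records.append(rec)
    return records


# Hypothetical two-line input, for demonstration only.
SAMPLE = (
    "#!made-up header comment\n"
    'chr1\tprotein_coding\texon\t100\t200\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1";\n'
)
records = read_gtf(io.StringIO(SAMPLE))
```

Each record keeps the raw columns as strings, except for `start` and `end` (integers) and `attributes` (a dict), which is usually enough to filter by feature type or gene.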
On Tue, Jun 3, 2008 at 11:20 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:

> Currently I'm trying to gather a set of probes that fit a certain set of
> criteria, such as their genomic annotation (intergenic vs exonic, etc),
> number of hits to its genome, etc. I have all the information for these from
> a combination of reblasting the probes to the genome (as suggested by W.
> Huber and others) and the GTF file and trying to store this information in
> a similar env that the tilingArray and Ringo packages use.
>
> Later I'll probably want to go the other way by having a set of interesting
> probes and ensuring a quick way I can get the pertinent information for them
> to send them through some other bioconductor functionality, like one of the
> go* packages (for example).

Steve,

If you want to set up this kind of thing, I would suggest sticking with RSQLite rather than environments. If you have tables of blast results and tables of GTF annotation, you could load those directly into a database now and run queries to get the probes of interest on the fly, as a simple example.

Sean
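The database approach suggested above could look roughly like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for RSQLite. The table names (`annotation`, `blast_hit`), the column names and the sample rows are all hypothetical. The query returns probes that hit the genome exactly once and whose hit overlaps an annotated exon.

```python
import sqlite3


def build_db(annotation_rows, hit_rows):
    """Load annotation and BLAST-hit tuples into an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE annotation (
            seqname TEXT, feature TEXT,
            start_pos INTEGER, end_pos INTEGER, gene_id TEXT
        );
        CREATE TABLE blast_hit (
            probe_id TEXT, seqname TEXT,
            start_pos INTEGER, end_pos INTEGER
        );
    """)
    con.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
                    annotation_rows)
    con.executemany("INSERT INTO blast_hit VALUES (?, ?, ?, ?)", hit_rows)
    return con


def unique_exonic_probes(con):
    """Probe IDs with exactly one genomic hit that overlaps an exon."""
    query = """
        SELECT h.probe_id
        FROM blast_hit AS h
        WHERE h.probe_id IN (
            SELECT probe_id FROM blast_hit
            GROUP BY probe_id HAVING COUNT(*) = 1
        )
        AND EXISTS (
            SELECT 1 FROM annotation AS a
            WHERE a.feature = 'exon'
              AND a.seqname = h.seqname
              AND a.start_pos <= h.end_pos   -- closed-interval overlap test
              AND a.end_pos >= h.start_pos
        )
        ORDER BY h.probe_id
    """
    return [row[0] for row in con.execute(query)]


# Made-up example data: one exon, three probes.
annotation_rows = [("chr1", "exon", 100, 200, "G1")]
hit_rows = [
    ("p1", "chr1", 150, 174),  # single hit, inside the exon
    ("p2", "chr1", 500, 524),  # single hit, intergenic
    ("p3", "chr1", 120, 144),  # exonic, but hits the genome twice
    ("p3", "chr2", 10, 34),
]
con = build_db(annotation_rows, hit_rows)
selected = unique_exonic_probes(con)
```

Going "the other way", from a list of interesting probes back to their annotation, is then another query against the same tables, not a separate lookup structure.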