bigBed support in rtracklayer

Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 3.7 years ago

United States

I'm looking through the rtracklayer directory in SVN, specifically src, and finding common files shared between bigWig and bigBed. It looks like bigBed isn't supported yet, however, so I'm thinking of hacking up the bigWig code to do something similar. How close is (or was) bigBed support? If anyone is or was working on it, is there anything I should look out for in order to save some time?

The trackHub I'm most interested in using stores its bigData as bigBed, and at the moment it seems that import()'ing a bigBed isn't going to happen, so if I can take advantage of remote access, so much the better.

Thanks in advance,

--t

rtracklayer rtracklayer • 2.3k views

ADD COMMENT • link updated 7.9 years ago by koneill ▴ 30 • written 11.4 years ago by Tim Triche ★ 4.2k

Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 3.7 years ago

United States

So, a trackHub as spec'd by UCSC ends up being a DAG. It may not even be strictly directed; I'm pondering that. The file under a hub's root is titled genomesall.txt and tells which trackDb is associated with a genome. Loading the corresponding trackDb file provides a bunch of information about the available data tracks for the hub, in a plain-text tab-delimited format sort of like MAGE-TAB or what have you. Or perhaps more like Python code (spaces seem to matter). Whatever.

The single (only?) most important bit of a trackHub's information is the bigDataUrl, but the track name (duh), the parent track (which is where I'm going next), and the shortLabel/longLabel fields are not entirely unimportant. The 'type' field is also handy since it usually indicates whether something is a bigWig, bigBed, BAM, or (something else) file.

More questions:

1) Should I try to store a trackHub's structure as a graph? It's enticing, but I'm not sure how much I care; really I just want the data.

2) Is this already done inside rtracklayer such that I can cannibalize existing code? I've only just started looking around.

Thanks again,

--t
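For anyone wanting to try the parsing step described above, here is a minimal base-R sketch. It is not rtracklayer code. It assumes the trackDb text is a series of stanzas separated by blank lines, with each line holding a setting name followed by its value; `parse_trackdb`, the field list, and the example.org URL are made up for illustration.

```r
# Parse trackDb-style text into a data frame, one row per track stanza.
# Assumption: stanzas are separated by blank lines; each line is "key value".
parse_trackdb <- function(lines,
                          fields = c("track", "parent", "type",
                                     "bigDataUrl", "shortLabel")) {
  lines <- trimws(lines)
  lines <- lines[!startsWith(lines, "#")]      # drop comment lines
  stanza_id <- cumsum(lines == "") + 1L         # a blank line starts a new stanza
  keep <- lines != ""
  stanzas <- split(lines[keep], stanza_id[keep])
  rows <- lapply(stanzas, function(stanza) {
    key <- sub("\\s.*$", "", stanza)            # first word is the setting name
    value <- trimws(sub("^\\S+", "", stanza))   # the rest of the line is its value
    settings <- setNames(value, key)
    unname(settings[fields])                    # NA where a setting is absent
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- fields
  rownames(out) <- NULL
  out
}

# Hypothetical two-stanza trackDb fragment.
example <- c(
  "track parentTrack",
  "shortLabel Parent",
  "",
  "track child1",
  "parent parentTrack",
  "type bigBed",
  "bigDataUrl http://example.org/child1.bb",
  "shortLabel Child 1"
)
hubTracks <- parse_trackdb(example)
```

Settings not listed in `fields` are simply ignored, so the result stays a flat table of the handful of columns the post calls out as important.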

ADD COMMENT • link 11.4 years ago Tim Triche ★ 4.2k

On Sun, Dec 2, 2012 at 3:43 PM, Tim Triche, Jr. wrote:

> 1) try to store a trackHub's structure as a graph? It's enticing but I'm
> not sure how much I care, really I just want the data.

Probably not as a general graph. A simple tree of objects should suffice.

> 2) is this already done inside rtracklayer such that I can cannibalize
> existing code? I've only just started looking around.

You might check out the Quickload support (from the IGB project). It is very similar in structure and spirit to the UCSC track hub stuff.

As far as bigBed, it's certainly possible, but it would require bringing quite a bit more of the Kent library into rtracklayer and writing the necessary wrappers. bigBed is more complicated than bigWig. For my use cases, using tabix on top of a BED file has been sufficient. UCSC-specific data is probably the only thing one will find exclusively in bigBed.

Thanks a lot for looking into this,
Michael
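To make the "simple tree of objects" suggestion concrete, here is a rough base-R sketch. It assumes a track table with `track`, `parent`, and `bigDataUrl` columns and a hierarchy only two levels deep; the table contents, the example.org URLs, and `build_hub_tree` are all hypothetical and not part of rtracklayer.

```r
# Hypothetical track table; in practice it would come from a parsed trackDb.
tracks <- data.frame(
  track      = c("flatA", "group1", "g1a", "g1b"),
  parent     = c(NA, NA, "group1", "group1"),
  bigDataUrl = c("http://example.org/flatA.bb", NA,
                 "http://example.org/g1a.bb", "http://example.org/g1b.bb"),
  stringsAsFactors = FALSE
)

# Build a two-level tree: one list element per top-level track, holding its
# own URL (NA for pure containers) and a named vector of child URLs.
build_hub_tree <- function(tracks) {
  is_root <- is.na(tracks$parent)
  children <- split(tracks[!is_root, ], tracks$parent[!is_root])
  roots <- tracks[is_root, ]
  tree <- lapply(seq_len(nrow(roots)), function(i) {
    kids <- children[[roots$track[i]]]          # NULL when a root has no children
    list(url = roots$bigDataUrl[i],
         children = if (is.null(kids)) character(0)
                    else setNames(kids$bigDataUrl, kids$track))
  })
  setNames(tree, roots$track)
}

tree <- build_hub_tree(tracks)
```

A nested list like this is enough to go from a group name to its data URLs, e.g. `tree$group1$children`. A deeper hierarchy would need a recursive version.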

ADD REPLY • link 11.4 years ago Michael Lawrence ★ 11k

Parsing a trackHub turns out to be relatively easy, at least if the hub is well behaved. (If it isn't, it must not be worthwhile... ;-))

R> dim(hubTracks)
[1] 11144     2

So that's a hub with 11144 tracks, which takes a while to parse (perhaps 2-3 minutes). There are a bunch of flat tracks right off the root of the hub, and then some tracks grouped by parents; turning those into a grouped list of URLs takes seconds. I could rewrite the parser in C but I'm not going to; it would be better to cache the result and checksum the file. The DAG in this case is all of 2 levels deep.

bigBed support is required: this isn't a UCSC site, but it is a trackHub that provides a *lot* of data which, for many if not most people, will be more valuable than the ENCODE data.

I need to switch focus for a couple of days but will get back to this and bolt on bigBed support. Looks like rtracklayer is just going to have to pork up a bit :-)

One nice thing about bigBed and bigWig is that they provide random access over HTTP using byte-range request headers, which is handy for retrieving just a particular range of data. Whether that will be as easy to work with in R as in the C client remains to be seen.

Thanks,

--t
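The "cache the result and checksum the file" idea could look roughly like the base-R sketch below. It assumes the trackDb file has already been downloaded to a local path, since `tools::md5sum()` works on local files; `cached_parse` and the `.rds` cache location are hypothetical names chosen for illustration.

```r
# Re-run a slow parser only when the file's MD5 checksum has changed.
# `parser` is any function that takes a file path; `cache_file` is hypothetical.
cached_parse <- function(path, parser, cache_file = paste0(path, ".rds")) {
  checksum <- unname(tools::md5sum(path))
  if (file.exists(cache_file)) {
    cached <- readRDS(cache_file)
    if (identical(cached$checksum, checksum)) {
      return(cached$value)                      # file unchanged: reuse the result
    }
  }
  value <- parser(path)                         # slow path: parse and store
  saveRDS(list(checksum = checksum, value = value), cache_file)
  value
}
```

For example, `cached_parse("trackDb.txt", readLines)` reads the file on the first call and returns the stored result on later calls until the file's contents change.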

ADD REPLY • link 11.4 years ago Tim Triche ★ 4.2k

koneill ▴ 30

@koneill-8031

Last seen 4.1 years ago

Canada

I guess this never went further? I see there is bigBed.h in the UCSC source in the rtracklayer package, but there doesn't seem to be actual bigBed functionality exposed at the R level.

For now, loading in bigBed to R seems to be impossible, and the only solution for working with bigBed files is to convert them to BED using UCSC's tools.

ADD COMMENT • link 7.9 years ago koneill ▴ 30

What was said 3.5 years ago still holds true today. It would require moving more of the Kent library into rtracklayer and writing the interface. It would be more work than the bigWig support, which was considerable. Since tabix-indexed BED files fulfill almost the same purpose, there hasn't been enough motivation for me to add bigBed support. If you're lucky enough to be motivated, I would consider a patch, but I fear that it would be a lot to maintain.

ADD REPLY • link 7.9 years ago Michael Lawrence ★ 11k
