I'm looking through the rtracklayer directory in SVN, specifically src, and finding common files shared between bigWig and bigBed. It looks like bigBed isn't supported yet, however, so I'm thinking of hacking up bigWig to do something similar. How close is/was bigBed support, and if anyone was/is working on it, are there any things I should look out for in order to save some time?
The trackHub I'm most interested in using stores its bigData as bigBed, and at the moment it seems that import()'ing a bigBed isn't going to happen, so if I can take advantage of remote access, so much the better.
Thanks in advance,
--t
--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
So, a trackHub as spec'd by UCSC ends up being a DAG. It may not even be strictly directed; I'm pondering that. The file under a hub's root is titled genomesall.txt and tells which trackDb is associated with each genome. Loading the corresponding trackDb file provides a bunch of information about the available data tracks for the hub, in a plain-text, tab-delimited format sort of like MAGE-tab or what have you. Or perhaps more like Python code (spaces seem to matter). Whatever.
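A minimal sketch of reading the trackDb layout described above, written in Python for illustration. It assumes that blank lines separate stanzas, that lines starting with '#' are comments, and that each setting is a key followed by whitespace and a value. It also assumes leading indentation is cosmetic, which would need revisiting if spacing really does carry meaning in a given hub. `parse_trackdb` is a hypothetical helper, not an rtracklayer function.

```python
def parse_trackdb(text):
    """Split trackDb-style text into a list of stanzas (dicts).

    Assumptions: blank lines separate stanzas, '#' lines are comments,
    and each setting line is '<key><whitespace><value>'. Leading
    indentation is stripped, i.e. treated as cosmetic.
    """
    stanzas = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current stanza, if any.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # comment line
        # Split at the first run of whitespace: key first, the rest is the value.
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:
        stanzas.append(current)  # the last stanza may lack a trailing blank line
    return stanzas
```

Each dict in the returned list corresponds to one stanza, so `len(parse_trackdb(text))` gives a rough track count for a trackDb file.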
The single (only?) most important bit of a trackHub's information is the bigDataUrl, but not entirely unimportant are the track name (duh), the parent track (which is where I'm going next), and the shortLabel/longLabel fields. The 'type' field is also handy, since it usually indicates whether something is a bigWig, bigBed, BAM, or (something else) file.
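The fields singled out above can be pulled from each parsed stanza. A minimal sketch, assuming each stanza is a dict of setting name to raw value and that the first whitespace-separated token of `type` names the file format (as in a value like `bigBed 6 +`); `summarize_track` and `KEY_FIELDS` are hypothetical names.

```python
# Settings singled out as most useful (hypothetical selection for this sketch).
KEY_FIELDS = ("track", "bigDataUrl", "parent", "shortLabel", "longLabel", "type")


def summarize_track(stanza):
    """Keep only the key settings of one stanza; missing ones become None."""
    summary = {key: stanza.get(key) for key in KEY_FIELDS}
    # 'type' may carry extra tokens (e.g. "bigBed 6 +"); assume the
    # first token names the file format.
    tokens = (stanza.get("type") or "").split()
    summary["format"] = tokens[0] if tokens else None
    return summary
```

Filtering on `summary["format"] == "bigBed"` would then separate the bigBed tracks from bigWig or BAM ones.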
More questions:
1) Should I try to store a trackHub's structure as a graph? It's enticing, but I'm not sure how much I care; really, I just want the data.
2) Is this already done inside rtracklayer such that I can cannibalize existing code? I've only just started looking around.
Thanks again,
--t
On Sun, Dec 2, 2012 at 3:43 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
> More questions:
>
> 1) try to store a trackHub's structure as a graph? It's enticing but I'm
> not sure how much I care, really I just want the data.
>
Probably not as a general graph. A simple tree of objects should suffice.
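A minimal sketch of the "simple tree" suggestion, assuming stanzas are dicts with `track` and optional `parent` settings, and that a `parent` value may carry a trailing on/off flag (e.g. `parent myComposite off`), so only its first token is used. The helper names are hypothetical; the traversal skips already-visited tracks so that an unexpected cycle cannot recurse forever.

```python
from collections import defaultdict


def build_track_tree(stanzas):
    """Return {parent_name: [child_names]}; None stands for the hub root."""
    children = defaultdict(list)
    for stanza in stanzas:
        name = stanza.get("track")
        if name is None:
            continue  # not a track stanza
        parent = stanza.get("parent")
        # Assumption: only the first token of 'parent' is the parent's name.
        parent_name = parent.split()[0] if parent else None
        children[parent_name].append(name)
    return dict(children)


def walk(tree, node=None, depth=0, seen=None):
    """Yield (depth, track_name) pairs depth-first, starting below `node`."""
    seen = set() if seen is None else seen
    for child in tree.get(node, []):
        if child in seen:
            continue  # guard against cycles or duplicate parent links
        seen.add(child)
        yield depth, child
        yield from walk(tree, child, depth + 1, seen)
```

Tracks whose declared parent never appears as a track are kept under that parent's name in the mapping but are not reached by `walk` from the root.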
> 2) is this already done inside rtracklayer such that I can cannibalize
> existing code? I've only just started looking around.
>
You might check out the Quickload support (from the IGB project). It is very similar in structure and spirit to the UCSC track hub stuff.
As far as bigBed goes, it's certainly possible, but it would require bringing quite a bit more of the Kent library into rtracklayer and writing the necessary wrappers. BigBed is more complicated than bigWig. For my use cases, using tabix on top of a BED file has been sufficient. Probably the only things one will find exclusively in bigBed are UCSC-specific.
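For the tabix route mentioned above, tabix expects a BED file that is sorted by chromosome and then by start position, and compressed with bgzip. A minimal sketch of the sorting step, assuming tab-separated records, '#' header lines, and no `track`/`browser` lines in the input; `sort_bed_lines` is a hypothetical helper.

```python
def sort_bed_lines(lines):
    """Order BED records by chromosome name, then by numeric start.

    '#' header lines are kept at the top; blank lines are dropped.
    Assumes every other line is a tab-separated BED record.
    """
    header, records = [], []
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            header.append(line)
        else:
            records.append(line)

    def key(line):
        fields = line.split("\t")
        # Column 1 is the chromosome, column 2 the 0-based start.
        return fields[0], int(fields[1])

    return header + sorted(records, key=key)
```

The sorted output can then be compressed and indexed with `bgzip sorted.bed` followed by `tabix -p bed sorted.bed.gz`.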
Thanks a lot for looking into this,
Michael
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
Parsing a trackHub turns out to be relatively easy, at least if the hub is well behaved. (If it isn't, it must not be worthwhile... ;-))
R> dim(hubTracks)
[1] 11144 2
So that's a hub with 11144 tracks, which takes a while to parse (perhaps 2-3 minutes). There are a bunch of flat tracks right off the root of the hub, and then some tracks grouped by parents; turning those into a grouped list of URLs takes seconds.
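Grouping the tracks by parent into lists of URLs, as described above, is a single pass over the stanzas. A minimal sketch, assuming dict-shaped stanzas and that relative `bigDataUrl` values resolve against the URL of the trackDb file that declares them; `group_urls_by_parent` is a hypothetical helper.

```python
from collections import defaultdict
from urllib.parse import urljoin


def group_urls_by_parent(stanzas, trackdb_url):
    """Map parent track name (None for top-level tracks) to data URLs."""
    groups = defaultdict(list)
    for stanza in stanzas:
        url = stanza.get("bigDataUrl")
        if not url:
            continue  # container tracks carry no data file of their own
        parent = stanza.get("parent")
        # Assumption: only the first token of 'parent' is the parent's name.
        key = parent.split()[0] if parent else None
        # Assumption: relative URLs are relative to the trackDb file.
        groups[key].append(urljoin(trackdb_url, url))
    return dict(groups)
```

Absolute URLs pass through `urljoin` unchanged, so hubs that point at data on other servers are handled the same way.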
I could rewrite the parser in C, but I'm not going to; it would be better to cache the result and checksum the file. The DAG in this case is all of 2 levels deep. bigBed support is required: this isn't a UCSC site, but it is a trackHub that provides a *lot* of data which, for many if not most people, will be more valuable than the ENCODE data. I need to switch focus for a couple of days but will get back to this and bolt on bigBed support.
Looks like rtracklayer is just going to have to pork up a bit :-)
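The cache-and-checksum idea can be sketched as: hash the raw trackDb bytes, and reuse a stored parse result whenever the same hash has been seen before. This minimal version assumes SHA-256 as the checksum, a local cache directory, and a parser that returns JSON-serializable data (so `None` dictionary keys, for example, would not survive the round trip); `cached_parse` is a hypothetical helper.

```python
import hashlib
import json
import os


def cached_parse(path, parse, cache_dir):
    """Return parse(text) for the file at `path`, reusing a cached result
    when a file with the same SHA-256 checksum was parsed before."""
    with open(path, "rb") as fh:
        raw = fh.read()
    digest = hashlib.sha256(raw).hexdigest()
    cache_path = os.path.join(cache_dir, digest + ".json")
    if os.path.exists(cache_path):
        # Same bytes as before: skip the slow parse entirely.
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)
    result = parse(raw.decode("utf-8"))
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(result, fh)
    return result
```

Keying on the content hash rather than the modification time means an unchanged file re-downloaded from the hub still hits the cache.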
One nice thing about bigBed and bigWig is that they support random access via HTTP byte-range requests, which is handy for retrieving just a particular range of data. Whether that will be as easy to work with in R as in the C client remains to be seen.
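As an illustration of byte-range access: Kent's bigWig and bigBed files begin with a fixed 64-byte header whose first four bytes are a magic number (0x888FFC26 for bigWig, 0x8789F2EB for bigBed), stored in the byte order of the machine that wrote the file. A minimal sketch, assuming only that header layout, of requesting just those 64 bytes with an HTTP Range header and decoding the leading fields. `header_request` and `parse_header` are hypothetical helpers; this is not how rtracklayer or the Kent C client does it.

```python
import struct
import urllib.request

BIGWIG_MAGIC = 0x888FFC26
BIGBED_MAGIC = 0x8789F2EB
HEADER_SIZE = 64  # fixed-size common header of bigWig/bigBed files


def header_request(url):
    """Build (but do not send) a request for only the file header."""
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{HEADER_SIZE - 1}"})


def parse_header(data):
    """Decode the leading fields of a bigWig/bigBed header."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 64 header bytes")
    # The writer's byte order is unknown; try little-endian, then big-endian.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", data[:4])[0]
        if magic in (BIGWIG_MAGIC, BIGBED_MAGIC):
            break
    else:
        raise ValueError("not a bigWig or bigBed file")
    # magic, version, zoomLevels, chromTreeOffset, fullDataOffset,
    # fullIndexOffset, fieldCount, definedFieldCount (36 bytes total).
    (magic, version, zoom_levels, chrom_tree_offset, full_data_offset,
     full_index_offset, field_count, defined_field_count) = struct.unpack(
        endian + "IHHQQQHH", data[:36])
    return {
        "format": "bigWig" if magic == BIGWIG_MAGIC else "bigBed",
        "byte_order": "little" if endian == "<" else "big",
        "version": version,
        "zoom_levels": zoom_levels,
        "chrom_tree_offset": chrom_tree_offset,
        "full_data_offset": full_data_offset,
        "full_index_offset": full_index_offset,
        "field_count": field_count,
        "defined_field_count": defined_field_count,
    }
```

With network access, `urllib.request.urlopen(header_request(url)).read(HEADER_SIZE)` would fetch the header bytes. A server that ignores the Range header answers with status 200 and the whole file instead of 206 Partial Content, so a real client should check the status first.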
Thanks,
--t
I guess this never went further? I see there is a bigBed.h in the UCSC source in the rtracklayer package, but there doesn't seem to be any actual bigBed functionality exposed at the R level.
For now, loading a bigBed file into R seems to be impossible, and the only way to work with bigBed files is to convert them to BED using UCSC's tools (e.g., bigBedToBed).
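The conversion route can be scripted. A minimal sketch that builds a bigBedToBed command line, optionally restricted to one region with the `-chrom`, `-start` and `-end` options listed in that tool's usage message (worth double-checking against the installed version), and runs it only if the binary is on PATH. The helper names are hypothetical.

```python
import shutil
import subprocess


def bigbed_to_bed_cmd(bigbed, bed_out, chrom=None, start=None, end=None):
    """Build a bigBedToBed argument list, optionally limited to a region."""
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [bigbed, bed_out]
    return cmd


def convert(bigbed, bed_out, **region):
    """Run the conversion; raises if the tool is missing or fails."""
    if shutil.which("bigBedToBed") is None:
        raise FileNotFoundError("bigBedToBed not found on PATH")
    subprocess.run(bigbed_to_bed_cmd(bigbed, bed_out, **region), check=True)
```

The resulting BED file can then be read with rtracklayer's existing BED import.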
What was said 3.5 years ago still holds true today. It would require moving more of the Kent library into rtracklayer and writing the interface. It would be more work than the BigWig support, which was considerable. Since tabix-indexed BED files fulfill almost the same purpose, there hasn't been enough motivation for me to add BigBed support. If you're lucky enough to be motivated, I would consider a patch, but I fear that it would be a lot to maintain.