Conservation tracks in BioC?
Mercier Eloi
@mercier-eloi-3799
Hi everyone,

I have hundreds of TF binding sites and I would be interested in accessing conservation data from UCSC. I tried ucscTableQuery but, as Paul mentioned, there is no easy way to send multiple queries.

I tried something like:

> start=c(500,600,700)
> end=c(509,609,709)
> chr=rep("chr1",3)
> q1<- ucscTableQuery(session, "cons44way", GenomicRanges(start,end, chr))
> q1
Get track 'cons44way' within hg18:chr1:500600700-509609709

That is not what I want.

I also tried to download the entire track by using:

> q1<- ucscTableQuery(session, "cons44way", GenomicRanges(genome="hg18"))
> q1
Get track 'cons44way' within hg18:*:*-*
> track(q1)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 100001 did not have 2 elements

It looks like the track cannot exceed 99,999 lines.

So is there any easy way to send multiple queries using ucscTableQuery?

Thanks.

On 24/09/09 9:01 PM, "Paul Leo" <p.leo at uq.edu.au> wrote:

Hi Marc,
Yes, that's a nice idea.

I've got about 80K small regions across the genome. I worry that if I can't SUM() the conservation score across each region, it would be an uncomfortably large query, since I think the track is a score per bp. Anyway, I'll explore; thanks for the tip.

Cheers
Paul

-----Original Message-----
From: Marc Carlson <mcarlson@fhcrc.org>
To: Paul Leo <p.leo at uq.edu.au>
Cc: bioconductor <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Conservation tracks in BioC?
Date: Thu, 24 Sep 2009 17:10:38 -0700

Hi Paul,

Not really as an annotation package. But there is an example of how you can get data like this from the UCSC tables in rtracklayer. Just load up rtracklayer and then look at the help page for ucscTableQuery.

library("rtracklayer")
?ucscTableQuery

Marc

Paul Leo wrote:
> I would like some advice on the best way to access conservation data from UCSC, like the human 17-way conservation track. Now, I know HOW to get it (ftp, Table Browser, SQL from UCSC), and I was going to download it and put it into an Rle object for later use. I need the lot, not small pieces at a time.
>
> My question is: is it already packaged up as an R annotation package somewhere, and I've missed it?
>
> Thanks
> Paul
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

------------------------------------------
Eloi Mercier
Computational Biology, IRCM
110 av. Des Pins O.
Montreal
Canada, QC
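In the first attempt, the printed range (hg18:chr1:500600700-509609709) suggests the start and end vectors were pasted together into one range instead of being treated as three regions. A common workaround is to loop and issue one query per region. The base-R sketch below shows only that looping pattern: `fetch_scores()` is a hypothetical function standing in for a single-region ucscTableQuery() plus track() call, and the mock fetcher just returns dummy values so the loop runs offline.

```r
# Sketch: one query per region rather than one query for all vectors at once.
# `fetch_scores(chr, start, end)` is a HYPOTHETICAL wrapper around a
# single-region ucscTableQuery()/track() call; it is passed in as an argument.
regions <- data.frame(
  chr   = rep("chr1", 3),
  start = c(500, 600, 700),
  end   = c(509, 609, 709),
  stringsAsFactors = FALSE
)

query_each_region <- function(regions, fetch_scores) {
  # Iterate over row indices so each region's chr/start/end stay together
  results <- lapply(seq_len(nrow(regions)), function(i) {
    fetch_scores(regions$chr[i], regions$start[i], regions$end[i])
  })
  # Label each result with its region, e.g. "chr1:500-509"
  names(results) <- sprintf("%s:%d-%d", regions$chr, regions$start, regions$end)
  results
}

# Mock fetcher for illustration only: one dummy score per base
mock_fetch <- function(chr, start, end) rep(0.5, end - start + 1)

res <- query_each_region(regions, mock_fetch)
```

Each real call would still be one round trip to UCSC, so this is reasonable for hundreds of sites but likely slow for the ~80K regions Paul describes.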
Annotation rtracklayer
@michael-lawrence-2759
On Fri, Nov 13, 2009 at 12:52 PM, Mercier Eloi <eloi.mercier@ircm.qc.ca> wrote:
> So is there any easy way to send multiple queries using ucscTableQuery?

There has been some work towards efficiently querying UCSC for a large set of regions, but I have not got it working yet (reverse engineering the UCSC interface can be tricky). There will always be the limit of 100,000 lines, which is enforced by UCSC.

By the way, you can upload your regions as BED to the browser and then use the intersection feature of the Table Browser manually to download the scores for your regions of interest.

As for loading the entire track, it would be possible to download the data in pieces (100 kb at a time), but with one line per base, this is going to be tough for many computers to handle all at once. One would want to store the data in a more efficient form than a tab-separated file. In many ways, this is a problem similar to the one solved by BSgenome: each score could be stored in 2 bytes, and the data would be loaded one chromosome at a time.
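For the BED-upload route, the regions first have to be written out as a BED file. The sketch below assumes the regions are 1-based and inclusive, like the example coordinates in the question. BED uses 0-based, half-open coordinates, so only the start shifts by one. The `write_bed()` helper and the `site1`, `site2`, ... names are made up for illustration.

```r
# Write 1-based, inclusive regions as a 4-column BED file (chrom, chromStart,
# chromEnd, name) suitable for uploading as a custom track.
# BED is 0-based and half-open: start moves down by one, end is unchanged.
write_bed <- function(chr, start, end, name, file) {
  bed <- data.frame(
    chrom      = chr,
    chromStart = as.integer(start) - 1L,
    chromEnd   = as.integer(end),
    name       = name,
    stringsAsFactors = FALSE
  )
  # Tab-separated, no header, no quotes, no row names
  write.table(bed, file = file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}

bed_file <- tempfile(fileext = ".bed")
bed <- write_bed(rep("chr1", 3), c(500, 600, 700), c(509, 609, 709),
                 name = paste0("site", 1:3), file = bed_file)
```

The resulting file can then be uploaded in the browser and intersected with the conservation track in the Table Browser, as described in the answer.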
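Downloading "in pieces (100 kb at a time)" amounts to tiling each chromosome into consecutive windows and querying one window per request. The helper below is a hypothetical sketch that only computes the windows; with at most one line per base, a 100,000-base window should not exceed the 100,000-line limit, but if the output carries a header line, a slightly smaller `chunk_size` may be the safer choice.

```r
# Tile positions 1..chrom_length into consecutive windows of at most
# `chunk_size` bases (1-based, inclusive). The last window may be shorter.
make_chunks <- function(chrom_length, chunk_size = 1e5) {
  starts <- seq(1, chrom_length, by = chunk_size)
  ends   <- pmin(starts + chunk_size - 1, chrom_length)
  data.frame(start = starts, end = ends)
}

# Example: a 250,000-base sequence gives windows
# 1-100000, 100001-200000 and 200001-250000
chunks <- make_chunks(250000)
```

Each row would then become the range of one query; the per-window results can be written to disk as they arrive rather than held in memory all at once.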
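The "2 bytes per score" idea can be illustrated with base R's writeBin()/readBin(). This is a sketch loosely in the spirit of the BSgenome comparison, not how BSgenome itself stores data. It assumes the scores lie in [0, 1] (as phastCons-style probabilities do) and contain no missing values; scaling by 10,000 keeps about four decimal places and fits comfortably in a signed 16-bit integer. `encode_scores()` and `decode_scores()` are hypothetical names.

```r
# Encode per-base scores in [0, 1] as 16-bit integers (2 bytes each).
encode_scores <- function(scores, file, scale = 10000L) {
  stopifnot(!anyNA(scores), all(scores >= 0 & scores <= 1))
  ints <- as.integer(round(scores * scale))  # 0..10000 fits in a signed short
  con <- file(file, "wb")
  on.exit(close(con))
  writeBin(ints, con, size = 2)
}

# Read n scores back and undo the scaling.
decode_scores <- function(file, n, scale = 10000L) {
  con <- file(file, "rb")
  on.exit(close(con))
  readBin(con, what = "integer", n = n, size = 2, signed = TRUE) / scale
}

scores <- c(0, 0.123456, 0.5, 1)
f <- tempfile(fileext = ".bin")
encode_scores(scores, f)
decoded <- decode_scores(f, length(scores))
```

Writing one such file per chromosome would allow the data to be loaded one chromosome at a time; a genome-wide track at 2 bytes per base is still large, but much smaller than a tab-separated text dump.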
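Paul's concern earlier in the thread was being able to SUM() the per-base score over ~80K regions. Once the per-base scores for one chromosome are in memory (for example after downloading in chunks or decoding a 2-byte file), all region sums can be computed in one vectorized step with a cumulative sum instead of looping over regions. This is a sketch under the assumption that element i of `scores` is the score at position i and that regions are 1-based and inclusive; `region_sums()` is a hypothetical helper.

```r
# Sum per-base scores over many regions at once using prefix sums.
# cs[k + 1] holds sum(scores[1:k]), so sum(scores[s:e]) = cs[e + 1] - cs[s].
region_sums <- function(scores, start, end) {
  stopifnot(all(start >= 1), all(end <= length(scores)), all(start <= end))
  cs <- c(0, cumsum(scores))
  cs[end + 1] - cs[start]
}

scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sums <- region_sums(scores, start = c(1, 3, 8), end = c(2, 5, 10))
# Mean score per region, if preferred:
means <- sums / (c(2, 5, 10) - c(1, 3, 8) + 1)
```

Missing values propagate through cumsum(), so positions without data would need to be set to 0 (or handled separately) first, depending on whether they should count as zero conservation.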