rtrackalyer too slow?
1
0
Entering edit mode
Wu, Xiwei ▴ 350
@wu-xiwei-1102
Last seen 10.3 years ago
Hi, I recently started to use rtracklayer, so I am still new to it. Here I am trying to get conservation score for a number of genomic loci. The code below works, but extremely slow. I found it took several seconds to get one done, so it becomes impossible when it gets to over 1000 sites. What is missing here? Is there any alternative way to get this done? Any help will be highly appreciated. > head(dum) chr start end 1 chr1 7233717 7233734 2 chr1 48238854 48238870 3 chr1 175465350 175465366 4 chr1 206969193 206969209 5 chr1 214532960 214532976 6 chr1 216671605 216671622 > getConserv <- function(genome="hg18", data) { + session <- browserSession() + result <- numeric() + genome(session) <- genome + for (i in 1:nrow(data)) { + query <- ucscTableQuery(session, "cons44way",GenomicRanges(data[i,"start"]), data[i, "end"], data[i, "chr"])) + tableName(query) <- "phyloP44wayPlacMammal" + a <- track(query) + result[i] <- mean(score(a)) + } + return(result) + } > b <- getConserv(data=dum) > head(b) [1] -1.76180289 -0.28795906 0.05897465 -0.23541347 -0.26477671 0.29019661 > sessionInfo() R version 2.11.0 Under development (unstable) (2010-01-28 r51065) x86_64-unknown-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.6.0 RCurl_1.3-1 [3] bitops_1.0-4.1 smRNAseq_1.0 [5] time_1.0 BSgenome.Hsapiens.UCSC.hg18_1.3.16 [7] ShortRead_1.5.11 lattice_0.18-3 [9] BSgenome_1.15.4 Biostrings_2.15.19 [11] IRanges_1.5.36 loaded via a namespace (and not attached): [1] Biobase_2.7.4 grid_2.11.0 hwriter_1.1 tools_2.11.0 XML_2.6-0 Xiwei --------------------------------------------------------------------- SECURITY/CONFIDENTIALITY WARNING: \ This message and ...{{dropped:30}}
rtracklayer rtracklayer • 840 views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
On Fri, Mar 5, 2010 at 4:32 PM, Wu, Xiwei <xwu@coh.org> wrote: > Hi, > > I recently started to use rtracklayer, so I am still new to it. Here I > am trying to get conservation score for a number of genomic loci. The > code below works, but extremely slow. I found it took several seconds to > get one done, so it becomes impossible when it gets to over 1000 sites. > What is missing here? Is there any alternative way to get this done? Any > help will be highly appreciated. > > Each query will result in several HTTP requests. If you're going to be accessing conservation often, you could download the track, convert it to bigWig and use the bigWig import support. At some point it might make sense to get the conservation track into some sort of R package for distribution. I'm also working on getting the intersection support working in the table browser. Just another thing on the TODO list. Michael > head(dum) > chr start end > 1 chr1 7233717 7233734 > 2 chr1 48238854 48238870 > 3 chr1 175465350 175465366 > 4 chr1 206969193 206969209 > 5 chr1 214532960 214532976 > 6 chr1 216671605 216671622 > > > getConserv <- function(genome="hg18", data) { > + session <- browserSession() > + result <- numeric() > + genome(session) <- genome > + for (i in 1:nrow(data)) { > + query <- ucscTableQuery(session, > "cons44way",GenomicRanges(data[i,"start"]), data[i, "end"], > data[i, "chr"])) > + tableName(query) <- "phyloP44wayPlacMammal" > + a <- track(query) > + result[i] <- mean(score(a)) > + } > + return(result) > + } > > b <- getConserv(data=dum) > > head(b) > [1] -1.76180289 -0.28795906 0.05897465 -0.23541347 -0.26477671 > 0.29019661 > > > > sessionInfo() > R version 2.11.0 Under development (unstable) (2010-01-28 r51065) > x86_64-unknown-linux-gnu > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > > other attached packages: > [1] rtracklayer_1.6.0 RCurl_1.3-1 > > [3] bitops_1.0-4.1 smRNAseq_1.0 > > [5] time_1.0 > BSgenome.Hsapiens.UCSC.hg18_1.3.16 > [7] ShortRead_1.5.11 lattice_0.18-3 > > [9] BSgenome_1.15.4 Biostrings_2.15.19 > > [11] IRanges_1.5.36 > > loaded via a namespace (and not attached): > [1] Biobase_2.7.4 grid_2.11.0 hwriter_1.1 tools_2.11.0 XML_2.6-0 > > Xiwei > > > --------------------------------------------------------------------- > SECURITY/CONFIDENTIALITY WARNING: \ This message and ...{{dropped:30}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 484 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6