Entering edit mode
That's great, thanks!
Robert.
----- Reply message -----
From: "Michael Lawrence" <lawrence.michael@gene.com>
To: "Robert Castelo" <robert.castelo@upf.edu>
Cc: "Michael Lawrence" <lawrence.michael@gene.com>, "Steve Lianoglou"
<lianoglou.steve@gene.com>, "bioconductor@r-project.org"
<bioconductor@r-project.org>
Subject: [BioC] working with genome-wide phastCons scores
Date: Fri, Oct 25, 2013 9:05 pm
Fixed in 1.23.2.
On Wed, Oct 23, 2013 at 7:20 PM, Michael Lawrence <michafla@gene.com>
wrote:
> Things are just in flux. IRanges changed out from under me, and I'm
in the
> middle of refactoring to direct GRanges export.
>
>
> On Wed, Oct 23, 2013 at 2:40 PM, Robert Castelo
<robert.castelo@upf.edu>wrote:
>
>> Michael, Steve,
>>
>> thanks a lot for your input, this is the functionality i was
looking for
>> and it works like a charm in the current release, so question
solved!!
>>
>> yet, I have also tried this out in last development version and the
>> option 'asRle=TRUE' does not work and gives an error related to
IRanges. I
>> know the 'devel' version is a moving target but, just in case this
was
>> unexpected, I've pasted here below the code and its output and the
session
>> information:
>>
>> library(BSgenome.Hsapiens.UCSC.hg19)
>> library(rtracklayer)
>>
>> download.file(
>> "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vert
ebrate/chrY.phastCons46way.wigFix.gz"<http: hgdownload.cse.ucsc.edu="" g="" oldenpath="" hg19="" phastcons46way="" vertebrate="" chry.phastcons46way.wigfix.gz="">
>> ,
>> "chrY.phastCons46way.wigFix.gz", method="curl")
>>
>> si <- Seqinfo(seqnames=seqnames(Hsapiens),
>> seqlengths=seqlengths(Hsapiens))
>> wigToBigWig("chrY.phastCons46way.wigFix.gz", seqinfo=si)
>>
>> pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw")) ## OK!
>> head(pc22, n=3)
>> GRanges with 3 ranges and 1 metadata column:
>> seqnames ranges strand | score
>> <rle> <iranges> <rle> | <numeric>
>> [1] chrY [10502, 10502] * | 0.23199999332428
>> [2] chrY [10503, 10503] * | 0.226999998092651
>> [3] chrY [10504, 10504] * | 0.218999996781349
>> ---
>> seqlengths:
>> chrY
>> 59373566
>>
>> pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle=TRUE)
##
>> ERROR!
>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
>> object '.Ranges.coverage' not found
>>
>> traceback()
>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)
>> 11: IRanges:::.Ranges.coverage
>> 10: (function (r, v, sl)
>> {
>> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score)
>> })(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]])
>> 9: mapply(function(r, v, sl) {
>> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score)
>> }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE)
>> 8: .dotargsAsList("Rle", ...)
>> 7: RleList(mapply(function(r, v, sl) {
>> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score)
>> }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE),
compress =
>> FALSE)
>> 6: rdToRle(rd)
>> 5: .local(con, format, text, ...)
>> 4: import(con, "BigWig", ...)
>> 3: import(con, "BigWig", ...)
>> 2: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE)
>> 1: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE)
>>
>> sessionInfo()
>> R Under development (unstable) (2013-10-20 r64082)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
>> LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
>> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
>> LC_PAPER=en_US.UTF8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
methods
>> base
>>
>> other attached packages:
>> [1] rtracklayer_1.23.1
>> BSgenome.Hsapiens.UCSC.hg19_1.3.19 BSgenome_1.31.0
>> [4] Biostrings_2.31.0
>> GenomicRanges_1.15.1 XVector_0.3.0
>> [7] IRanges_1.21.2
>> BiocGenerics_0.9.0 BiocInstaller_1.13.1
>> [10] vimcom_0.9-9
>> setwidth_1.0-3 colorout_1.0-1
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6 RCurl_1.95-4.1 Rsamtools_1.15.2 stats4_3.1.0
>> tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0
>>
>>
>> thanks again for your help!!
>> robert.
>>
>>
>>
>>
>> On 10/23/13 9:43 PM, Michael Lawrence wrote:
>>
>>
>>
>>
>> On Wed, Oct 23, 2013 at 9:55 AM, Steve Lianoglou <
>> lianoglou.steve@gene.com> wrote:
>>
>>> Hi Robert,
>>>
>>> On Wed, Oct 23, 2013 at 9:03 AM, Robert Castelo
<robert.castelo@upf.edu>
>>> wrote:
>>> > dear list,
>>> >
>>> > i have to pretty intensively work with genome-wide phastcons
scores and
>>> > instead of repeatedly interrogate them through the internet via
the
>>> UCSC
>>> > genome browser with 'rtracklayer', i'd prefer to do a bulk
download of
>>> the
>>> > *.phastCons46way.wigFix.gz files (about 0.6Gb) at
>>> >
>>> >
>>> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vert
ebrate
>>> >
>>> > and then import them into R storing the information in some
memory
>>> efficient
>>> > data structure (Rle?) that provides me also an efficient way to
query
>>> the
>>> > phastcons score at any position of the human genome.
>>> >
>>> > all documentation and messages i've been able to retrieve
through
>>> google and
>>> > the BioC list correspond to use cases that involve a small
fraction of
>>> the
>>> > genome which can be handled by 'rtracklayer' with an internet
>>> connection.
>>> >
>>> > any hint on how to achieve this goal will be very much
appreciated,
>>>
>>> I previously did this by downloading the wig files and converting
them
>>> to bigWig format. To do so, I used the wigToBigWig tool you can
>>> eventually get to form here:
>>>
>>> http://genomewiki.ucsc.edu/index.php/Kent_source_utilities
>>>
>>> It is unfortunate that UCSC doesn't host the bigWig files, as
>>> conversion from wig to bigWig is hugely memory intensive (last I
>>> remember, anyway).
>>>
>>> (Update: not sure when this happened, but rtracklayer actually
wraps
>>> this functionality in its wigToBigWig function -- nice! -- so you
can
>>> convert the wig file to bigWig straight from R (assuming you have
a
>>> machine with enough horsepower))
>>>
>>>
>> Of potential importance is that the Kent utilities are licensed
for
>> non-commercial use only. That's mostly what drove the
implementation of
>> wigToBigWig(). A
>>
>>
>>> After that, you could then use rtracklayer's import functions over
the
>>> bigWig files to query them using a GRanges object. Look at the
>>> Examples section in the ?import.bw man page for some help.
>>>
>>>
>> In the new release there's an argument asRle that if TRUE will
>> efficiently return an RleList from the BigWig, instead of a
GRanges.
>>
>> If you want to query the RleList for single positions (in a
GRanges)
>> and get the result as an atomic vector, it's most efficient to use
>> VariantTools::extractCoverageForPositions. That function should
probably
>> live somewhere more general but anyway, it's a fast path for single
>> position RleList=>vector extraction.
>>
>>
>>> HTH,
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Computational Biologist
>>> Bioinformatics and Computational Biology
>>> Genentech
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>>
>
[[alternative HTML version deleted]]