working with genome-wide phastCons scores
0
0
Entering edit mode
Robert Castelo ★ 3.4k
@rcastelo
Last seen 19 days ago
Barcelona/Universitat Pompeu Fabra
That's great, thanks! Robert. ----- Reply message ----- From: "Michael Lawrence" <lawrence.michael@gene.com> To: "Robert Castelo" <robert.castelo@upf.edu> Cc: "Michael Lawrence" <lawrence.michael@gene.com>, "Steve Lianoglou" <lianoglou.steve@gene.com>, "bioconductor@r-project.org" <bioconductor@r-project.org> Subject: [BioC] working with genome-wide phastCons scores Date: Fri, Oct 25, 2013 9:05 pm Fixed in 1.23.2. On Wed, Oct 23, 2013 at 7:20 PM, Michael Lawrence <michafla@gene.com> wrote: > Things are just in flux. IRanges changed out from under me, and I'm in the > middle of refactoring to direct GRanges export. > > > On Wed, Oct 23, 2013 at 2:40 PM, Robert Castelo <robert.castelo@upf.edu>wrote: > >> Michael, Steve, >> >> thanks a lot for your input, this is the functionality i was looking for >> and it works like a charm in the current release, so question solved!! >> >> yet, I have also tried this out in last development version and the >> option 'asRle=TRUE' does not work and gives an error related to IRanges. I >> know the 'devel' version is a moving target but, just in case this was >> unexpected, I've pasted here below the code and its output and the session >> information: >> >> library(BSgenome.Hsapiens.UCSC.hg19) >> library(rtracklayer) >> >> download.file( >> "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vert ebrate/chrY.phastCons46way.wigFix.gz"<http: hgdownload.cse.ucsc.edu="" g="" oldenpath="" hg19="" phastcons46way="" vertebrate="" chry.phastcons46way.wigfix.gz=""> >> , >> "chrY.phastCons46way.wigFix.gz", method="curl") >> >> si <- Seqinfo(seqnames=seqnames(Hsapiens), >> seqlengths=seqlengths(Hsapiens)) >> wigToBigWig("chrY.phastCons46way.wigFix.gz", seqinfo=si) >> >> pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw")) ## OK! >> head(pc22, n=3) >> GRanges with 3 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <numeric> >> [1] chrY [10502, 10502] * | 0.23199999332428 >> [2] chrY [10503, 10503] * | 0.226999998092651 >> [3] chrY [10504, 10504] * | 0.218999996781349 >> --- >> seqlengths: >> chrY >> 59373566 >> >> pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle=TRUE) ## >> ERROR! >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : >> object '.Ranges.coverage' not found >> >> traceback() >> 12: get(name, envir = asNamespace(pkg), inherits = FALSE) >> 11: IRanges:::.Ranges.coverage >> 10: (function (r, v, sl) >> { >> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) >> })(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) >> 9: mapply(function(r, v, sl) { >> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) >> }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE) >> 8: .dotargsAsList("Rle", ...) >> 7: RleList(mapply(function(r, v, sl) { >> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) >> }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE), compress = >> FALSE) >> 6: rdToRle(rd) >> 5: .local(con, format, text, ...) >> 4: import(con, "BigWig", ...) >> 3: import(con, "BigWig", ...) >> 2: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE) >> 1: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE) >> >> sessionInfo() >> R Under development (unstable) (2013-10-20 r64082) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C >> LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8 >> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 >> LC_PAPER=en_US.UTF8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] rtracklayer_1.23.1 >> BSgenome.Hsapiens.UCSC.hg19_1.3.19 BSgenome_1.31.0 >> [4] Biostrings_2.31.0 >> GenomicRanges_1.15.1 XVector_0.3.0 >> [7] IRanges_1.21.2 >> BiocGenerics_0.9.0 BiocInstaller_1.13.1 >> [10] vimcom_0.9-9 >> setwidth_1.0-3 colorout_1.0-1 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-6 RCurl_1.95-4.1 Rsamtools_1.15.2 stats4_3.1.0 >> tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 >> >> >> thanks again for your help!! >> robert. >> >> >> >> >> On 10/23/13 9:43 PM, Michael Lawrence wrote: >> >> >> >> >> On Wed, Oct 23, 2013 at 9:55 AM, Steve Lianoglou < >> lianoglou.steve@gene.com> wrote: >> >>> Hi Robert, >>> >>> On Wed, Oct 23, 2013 at 9:03 AM, Robert Castelo <robert.castelo@upf.edu> >>> wrote: >>> > dear list, >>> > >>> > i have to pretty intensively work with genome-wide phastcons scores and >>> > instead of repeatedly interrogate them through the internet via the >>> UCSC >>> > genome browser with 'rtracklayer', i'd prefer to do a bulk download of >>> the >>> > *.phastCons46way.wigFix.gz files (about 0.6Gb) at >>> > >>> > >>> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vert ebrate >>> > >>> > and then import them into R storing the information in some memory >>> efficient >>> > data structure (Rle?) that provides me also an efficient way to query >>> the >>> > phastcons score at any position of the human genome. >>> > >>> > all documentation and messages i've been able to retrieve through >>> google and >>> > the BioC list correspond to use cases that involve a small fraction of >>> the >>> > genome which can be handled by 'rtracklayer' with an internet >>> connection. >>> > >>> > any hint on how to achieve this goal will be very much appreciated, >>> >>> I previously did this by downloading the wig files and converting them >>> to bigWig format. To do so, I used the wigToBigWig tool you can >>> eventually get to form here: >>> >>> http://genomewiki.ucsc.edu/index.php/Kent_source_utilities >>> >>> It is unfortunate that UCSC doesn't host the bigWig files, as >>> conversion from wig to bigWig is hugely memory intensive (last I >>> remember, anyway). >>> >>> (Update: not sure when this happened, but rtracklayer actually wraps >>> this functionality in its wigToBigWig function -- nice! -- so you can >>> convert the wig file to bigWig straight from R (assuming you have a >>> machine with enough horsepower)) >>> >>> >> Of potential importance is that the Kent utilities are licensed for >> non-commercial use only. That's mostly what drove the implementation of >> wigToBigWig(). A >> >> >>> After that, you could then use rtracklayer's import functions over the >>> bigWig files to query them using a GRanges object. Look at the >>> Examples section in the ?import.bw man page for some help. >>> >>> >> In the new release there's an argument asRle that if TRUE will >> efficiently return an RleList from the BigWig, instead of a GRanges. >> >> If you want to query the RleList for single positions (in a GRanges) >> and get the result as an atomic vector, it's most efficient to use >> VariantTools::extractCoverageForPositions. That function should probably >> live somewhere more general but anyway, it's a fast path for single >> position RleList=>vector extraction. >> >> >>> HTH, >>> -steve >>> >>> -- >>> Steve Lianoglou >>> Computational Biologist >>> Bioinformatics and Computational Biology >>> Genentech >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> >> > [[alternative HTML version deleted]]
BSgenome convert BSgenome rtracklayer IRanges charm BSgenome convert BSgenome rtracklayer • 1.8k views
ADD COMMENT

Login before adding your answer.

Traffic: 569 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6