working with genome-wide phastCons scores
1
0
Entering edit mode
Robert Castelo ★ 3.4k
@rcastelo
Last seen 3 hours ago
Barcelona/Universitat Pompeu Fabra
dear list, i have to pretty intensively work with genome-wide phastcons scores and instead of repeatedly interrogate them through the internet via the UCSC genome browser with 'rtracklayer', i'd prefer to do a bulk download of the *.phastCons46way.wigFix.gz files (about 0.6Gb) at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vertebra te and then import them into R storing the information in some memory efficient data structure (Rle?) that provides me also an efficient way to query the phastcons score at any position of the human genome. all documentation and messages i've been able to retrieve through google and the BioC list correspond to use cases that involve a small fraction of the genome which can be handled by 'rtracklayer' with an internet connection. any hint on how to achieve this goal will be very much appreciated, thanks! robert.
• 1.9k views
ADD COMMENT
0
Entering edit mode
@steve-lianoglou-2771
Last seen 21 months ago
United States
Hi Robert, On Wed, Oct 23, 2013 at 9:03 AM, Robert Castelo <robert.castelo at="" upf.edu=""> wrote: > dear list, > > i have to pretty intensively work with genome-wide phastcons scores and > instead of repeatedly interrogate them through the internet via the UCSC > genome browser with 'rtracklayer', i'd prefer to do a bulk download of the > *.phastCons46way.wigFix.gz files (about 0.6Gb) at > > http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/verteb rate > > and then import them into R storing the information in some memory efficient > data structure (Rle?) that provides me also an efficient way to query the > phastcons score at any position of the human genome. > > all documentation and messages i've been able to retrieve through google and > the BioC list correspond to use cases that involve a small fraction of the > genome which can be handled by 'rtracklayer' with an internet connection. > > any hint on how to achieve this goal will be very much appreciated, I previously did this by downloading the wig files and converting them to bigWig format. To do so, I used the wigToBigWig tool you can eventually get to form here: http://genomewiki.ucsc.edu/index.php/Kent_source_utilities It is unfortunate that UCSC doesn't host the bigWig files, as conversion from wig to bigWig is hugely memory intensive (last I remember, anyway). (Update: not sure when this happened, but rtracklayer actually wraps this functionality in its wigToBigWig function -- nice! -- so you can convert the wig file to bigWig straight from R (assuming you have a machine with enough horsepower)) After that, you could then use rtracklayer's import functions over the bigWig files to query them using a GRanges object. Look at the Examples section in the ?import.bw man page for some help. HTH, -steve -- Steve Lianoglou Computational Biologist Bioinformatics and Computational Biology Genentech
ADD COMMENT
0
Entering edit mode
On Wed, Oct 23, 2013 at 9:55 AM, Steve Lianoglou <lianoglou.steve@gene.com>wrote: > Hi Robert, > > On Wed, Oct 23, 2013 at 9:03 AM, Robert Castelo <robert.castelo@upf.edu> > wrote: > > dear list, > > > > i have to pretty intensively work with genome-wide phastcons scores and > > instead of repeatedly interrogate them through the internet via the UCSC > > genome browser with 'rtracklayer', i'd prefer to do a bulk download of > the > > *.phastCons46way.wigFix.gz files (about 0.6Gb) at > > > > http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vert ebrate > > > > and then import them into R storing the information in some memory > efficient > > data structure (Rle?) that provides me also an efficient way to query the > > phastcons score at any position of the human genome. > > > > all documentation and messages i've been able to retrieve through google > and > > the BioC list correspond to use cases that involve a small fraction of > the > > genome which can be handled by 'rtracklayer' with an internet connection. > > > > any hint on how to achieve this goal will be very much appreciated, > > I previously did this by downloading the wig files and converting them > to bigWig format. To do so, I used the wigToBigWig tool you can > eventually get to form here: > > http://genomewiki.ucsc.edu/index.php/Kent_source_utilities > > It is unfortunate that UCSC doesn't host the bigWig files, as > conversion from wig to bigWig is hugely memory intensive (last I > remember, anyway). > > (Update: not sure when this happened, but rtracklayer actually wraps > this functionality in its wigToBigWig function -- nice! -- so you can > convert the wig file to bigWig straight from R (assuming you have a > machine with enough horsepower)) > > Of potential importance is that the Kent utilities are licensed for non-commercial use only. That's mostly what drove the implementation of wigToBigWig(). A > After that, you could then use rtracklayer's import functions over the > bigWig files to query them using a GRanges object. Look at the > Examples section in the ?import.bw man page for some help. > > In the new release there's an argument asRle that if TRUE will efficiently return an RleList from the BigWig, instead of a GRanges. If you want to query the RleList for single positions (in a GRanges) and get the result as an atomic vector, it's most efficient to use VariantTools::extractCoverageForPositions. That function should probably live somewhere more general but anyway, it's a fast path for single position RleList=>vector extraction. > HTH, > -steve > > -- > Steve Lianoglou > Computational Biologist > Bioinformatics and Computational Biology > Genentech > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Michael, Steve, thanks a lot for your input, this is the functionality i was looking for and it works like a charm in the current release, so question solved!! yet, I have also tried this out in last development version and the option 'asRle=TRUE' does not work and gives an error related to IRanges. I know the 'devel' version is a moving target but, just in case this was unexpected, I've pasted here below the code and its output and the session information: library(BSgenome.Hsapiens.UCSC.hg19) library(rtracklayer) download.file("http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCon s46way/vertebrate/chrY.phastCons46way.wigFix.gz", "chrY.phastCons46way.wigFix.gz", method="curl") si <- Seqinfo(seqnames=seqnames(Hsapiens), seqlengths=seqlengths(Hsapiens)) wigToBigWig("chrY.phastCons46way.wigFix.gz", seqinfo=si) pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw")) ## OK! head(pc22, n=3) GRanges with 3 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <numeric> [1] chrY [10502, 10502] * | 0.23199999332428 [2] chrY [10503, 10503] * | 0.226999998092651 [3] chrY [10504, 10504] * | 0.218999996781349 --- seqlengths: chrY 59373566 pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle=TRUE) ## ERROR! Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object '.Ranges.coverage' not found traceback() 12: get(name, envir = asNamespace(pkg), inherits = FALSE) 11: IRanges:::.Ranges.coverage 10: (function (r, v, sl) { IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) })(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) 9: mapply(function(r, v, sl) { IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE) 8: .dotargsAsList("Rle", ...) 7: RleList(mapply(function(r, v, sl) { IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE), compress = FALSE) 6: rdToRle(rd) 5: .local(con, format, text, ...) 4: import(con, "BigWig", ...) 3: import(con, "BigWig", ...) 2: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE) 1: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE) sessionInfo() R Under development (unstable) (2013-10-20 r64082) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8 [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 LC_PAPER=en_US.UTF8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.23.1 BSgenome.Hsapiens.UCSC.hg19_1.3.19 BSgenome_1.31.0 [4] Biostrings_2.31.0 GenomicRanges_1.15.1 XVector_0.3.0 [7] IRanges_1.21.2 BiocGenerics_0.9.0 BiocInstaller_1.13.1 [10] vimcom_0.9-9 setwidth_1.0-3 colorout_1.0-1 loaded via a namespace (and not attached): [1] bitops_1.0-6 RCurl_1.95-4.1 Rsamtools_1.15.2 stats4_3.1.0 tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 thanks again for your help!! robert. On 10/23/13 9:43 PM, Michael Lawrence wrote: > > > > On Wed, Oct 23, 2013 at 9:55 AM, Steve Lianoglou > <lianoglou.steve@gene.com <mailto:lianoglou.steve@gene.com="">> wrote: > > Hi Robert, > > On Wed, Oct 23, 2013 at 9:03 AM, Robert Castelo > <robert.castelo@upf.edu <mailto:robert.castelo@upf.edu="">> wrote: > > dear list, > > > > i have to pretty intensively work with genome-wide phastcons > scores and > > instead of repeatedly interrogate them through the internet via > the UCSC > > genome browser with 'rtracklayer', i'd prefer to do a bulk > download of the > > *.phastCons46way.wigFix.gz files (about 0.6Gb) at > > > > > http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/ve rtebrate > > > > and then import them into R storing the information in some > memory efficient > > data structure (Rle?) that provides me also an efficient way to > query the > > phastcons score at any position of the human genome. > > > > all documentation and messages i've been able to retrieve > through google and > > the BioC list correspond to use cases that involve a small > fraction of the > > genome which can be handled by 'rtracklayer' with an internet > connection. > > > > any hint on how to achieve this goal will be very much appreciated, > > I previously did this by downloading the wig files and converting them > to bigWig format. To do so, I used the wigToBigWig tool you can > eventually get to form here: > > http://genomewiki.ucsc.edu/index.php/Kent_source_utilities > > It is unfortunate that UCSC doesn't host the bigWig files, as > conversion from wig to bigWig is hugely memory intensive (last I > remember, anyway). > > (Update: not sure when this happened, but rtracklayer actually wraps > this functionality in its wigToBigWig function -- nice! -- so you can > convert the wig file to bigWig straight from R (assuming you have a > machine with enough horsepower)) > > > Of potential importance is that the Kent utilities are licensed for > non-commercial use only. That's mostly what drove the implementation > of wigToBigWig(). A > > After that, you could then use rtracklayer's import functions over the > bigWig files to query them using a GRanges object. Look at the > Examples section in the ?import.bw <http: import.bw=""> man page for > some help. > > > In the new release there's an argument asRle that if TRUE will > efficiently return an RleList from the BigWig, instead of a GRanges. > > If you want to query the RleList for single positions (in a GRanges) > and get the result as an atomic vector, it's most efficient to use > VariantTools::extractCoverageForPositions. That function should > probably live somewhere more general but anyway, it's a fast path for > single position RleList=>vector extraction. > > HTH, > -steve > > -- > Steve Lianoglou > Computational Biologist > Bioinformatics and Computational Biology > Genentech > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Things are just in flux. IRanges changed out from under me, and I'm in the middle of refactoring to direct GRanges export. On Wed, Oct 23, 2013 at 2:40 PM, Robert Castelo <robert.castelo@upf.edu>wrote: > Michael, Steve, > > thanks a lot for your input, this is the functionality i was looking for > and it works like a charm in the current release, so question solved!! > > yet, I have also tried this out in last development version and the option > 'asRle=TRUE' does not work and gives an error related to IRanges. I know > the 'devel' version is a moving target but, just in case this was > unexpected, I've pasted here below the code and its output and the session > information: > > library(BSgenome.Hsapiens.UCSC.hg19) > library(rtracklayer) > > download.file( > "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/verte brate/chrY.phastCons46way.wigFix.gz"<http: hgdownload.cse.ucsc.edu="" go="" ldenpath="" hg19="" phastcons46way="" vertebrate="" chry.phastcons46way.wigfix.gz=""> > , > "chrY.phastCons46way.wigFix.gz", method="curl") > > si <- Seqinfo(seqnames=seqnames(Hsapiens), seqlengths=seqlengths(Hsapiens)) > wigToBigWig("chrY.phastCons46way.wigFix.gz", seqinfo=si) > > pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw")) ## OK! > head(pc22, n=3) > GRanges with 3 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <numeric> > [1] chrY [10502, 10502] * | 0.23199999332428 > [2] chrY [10503, 10503] * | 0.226999998092651 > [3] chrY [10504, 10504] * | 0.218999996781349 > --- > seqlengths: > chrY > 59373566 > > pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle=TRUE) ## > ERROR! > Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > object '.Ranges.coverage' not found > > traceback() > 12: get(name, envir = asNamespace(pkg), inherits = FALSE) > 11: IRanges:::.Ranges.coverage > 10: (function (r, v, sl) > { > IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) > })(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) > 9: mapply(function(r, v, sl) { > IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) > }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE) > 8: .dotargsAsList("Rle", ...) > 7: RleList(mapply(function(r, v, sl) { > IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) > }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE), compress = > FALSE) > 6: rdToRle(rd) > 5: .local(con, format, text, ...) > 4: import(con, "BigWig", ...) > 3: import(con, "BigWig", ...) > 2: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE) > 1: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE) > > sessionInfo() > R Under development (unstable) (2013-10-20 r64082) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C > LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8 > [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 > LC_PAPER=en_US.UTF8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] rtracklayer_1.23.1 BSgenome.Hsapiens.UCSC.hg19_1.3.19 > BSgenome_1.31.0 > [4] Biostrings_2.31.0 GenomicRanges_1.15.1 > XVector_0.3.0 > [7] IRanges_1.21.2 BiocGenerics_0.9.0 > BiocInstaller_1.13.1 > [10] vimcom_0.9-9 setwidth_1.0-3 > colorout_1.0-1 > > loaded via a namespace (and not attached): > [1] bitops_1.0-6 RCurl_1.95-4.1 Rsamtools_1.15.2 stats4_3.1.0 > tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 > > > thanks again for your help!! > robert. > > > > > On 10/23/13 9:43 PM, Michael Lawrence wrote: > > > > > On Wed, Oct 23, 2013 at 9:55 AM, Steve Lianoglou <lianoglou.steve@gene.com> > wrote: > >> Hi Robert, >> >> On Wed, Oct 23, 2013 at 9:03 AM, Robert Castelo <robert.castelo@upf.edu> >> wrote: >> > dear list, >> > >> > i have to pretty intensively work with genome-wide phastcons scores and >> > instead of repeatedly interrogate them through the internet via the UCSC >> > genome browser with 'rtracklayer', i'd prefer to do a bulk download of >> the >> > *.phastCons46way.wigFix.gz files (about 0.6Gb) at >> > >> > >> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/verte brate >> > >> > and then import them into R storing the information in some memory >> efficient >> > data structure (Rle?) that provides me also an efficient way to query >> the >> > phastcons score at any position of the human genome. >> > >> > all documentation and messages i've been able to retrieve through >> google and >> > the BioC list correspond to use cases that involve a small fraction of >> the >> > genome which can be handled by 'rtracklayer' with an internet >> connection. >> > >> > any hint on how to achieve this goal will be very much appreciated, >> >> I previously did this by downloading the wig files and converting them >> to bigWig format. To do so, I used the wigToBigWig tool you can >> eventually get to form here: >> >> http://genomewiki.ucsc.edu/index.php/Kent_source_utilities >> >> It is unfortunate that UCSC doesn't host the bigWig files, as >> conversion from wig to bigWig is hugely memory intensive (last I >> remember, anyway). >> >> (Update: not sure when this happened, but rtracklayer actually wraps >> this functionality in its wigToBigWig function -- nice! -- so you can >> convert the wig file to bigWig straight from R (assuming you have a >> machine with enough horsepower)) >> >> > Of potential importance is that the Kent utilities are licensed for > non-commercial use only. That's mostly what drove the implementation of > wigToBigWig(). A > > >> After that, you could then use rtracklayer's import functions over the >> bigWig files to query them using a GRanges object. Look at the >> Examples section in the ?import.bw man page for some help. >> >> > In the new release there's an argument asRle that if TRUE will > efficiently return an RleList from the BigWig, instead of a GRanges. > > If you want to query the RleList for single positions (in a GRanges) and > get the result as an atomic vector, it's most efficient to use > VariantTools::extractCoverageForPositions. That function should probably > live somewhere more general but anyway, it's a fast path for single > position RleList=>vector extraction. > > >> HTH, >> -steve >> >> -- >> Steve Lianoglou >> Computational Biologist >> Bioinformatics and Computational Biology >> Genentech >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Fixed in 1.23.2. On Wed, Oct 23, 2013 at 7:20 PM, Michael Lawrence <michafla@gene.com> wrote: > Things are just in flux. IRanges changed out from under me, and I'm in the > middle of refactoring to direct GRanges export. > > > On Wed, Oct 23, 2013 at 2:40 PM, Robert Castelo <robert.castelo@upf.edu>wrote: > >> Michael, Steve, >> >> thanks a lot for your input, this is the functionality i was looking for >> and it works like a charm in the current release, so question solved!! >> >> yet, I have also tried this out in last development version and the >> option 'asRle=TRUE' does not work and gives an error related to IRanges. I >> know the 'devel' version is a moving target but, just in case this was >> unexpected, I've pasted here below the code and its output and the session >> information: >> >> library(BSgenome.Hsapiens.UCSC.hg19) >> library(rtracklayer) >> >> download.file( >> "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vert ebrate/chrY.phastCons46way.wigFix.gz"<http: hgdownload.cse.ucsc.edu="" g="" oldenpath="" hg19="" phastcons46way="" vertebrate="" chry.phastcons46way.wigfix.gz=""> >> , >> "chrY.phastCons46way.wigFix.gz", method="curl") >> >> si <- Seqinfo(seqnames=seqnames(Hsapiens), >> seqlengths=seqlengths(Hsapiens)) >> wigToBigWig("chrY.phastCons46way.wigFix.gz", seqinfo=si) >> >> pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw")) ## OK! >> head(pc22, n=3) >> GRanges with 3 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <numeric> >> [1] chrY [10502, 10502] * | 0.23199999332428 >> [2] chrY [10503, 10503] * | 0.226999998092651 >> [3] chrY [10504, 10504] * | 0.218999996781349 >> --- >> seqlengths: >> chrY >> 59373566 >> >> pc22 <- import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle=TRUE) ## >> ERROR! >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : >> object '.Ranges.coverage' not found >> >> traceback() >> 12: get(name, envir = asNamespace(pkg), inherits = FALSE) >> 11: IRanges:::.Ranges.coverage >> 10: (function (r, v, sl) >> { >> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) >> })(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) >> 9: mapply(function(r, v, sl) { >> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) >> }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE) >> 8: .dotargsAsList("Rle", ...) >> 7: RleList(mapply(function(r, v, sl) { >> IRanges:::.Ranges.coverage(r, width = sl, weight = v$score) >> }, ranges(x), values(x), seqlengths(x), SIMPLIFY = FALSE), compress = >> FALSE) >> 6: rdToRle(rd) >> 5: .local(con, format, text, ...) >> 4: import(con, "BigWig", ...) >> 3: import(con, "BigWig", ...) >> 2: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE) >> 1: import.bw(BigWigFile("chrY.phastCons46way.bw"), asRle = TRUE) >> >> sessionInfo() >> R Under development (unstable) (2013-10-20 r64082) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C >> LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8 >> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 >> LC_PAPER=en_US.UTF8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] rtracklayer_1.23.1 >> BSgenome.Hsapiens.UCSC.hg19_1.3.19 BSgenome_1.31.0 >> [4] Biostrings_2.31.0 >> GenomicRanges_1.15.1 XVector_0.3.0 >> [7] IRanges_1.21.2 >> BiocGenerics_0.9.0 BiocInstaller_1.13.1 >> [10] vimcom_0.9-9 >> setwidth_1.0-3 colorout_1.0-1 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-6 RCurl_1.95-4.1 Rsamtools_1.15.2 stats4_3.1.0 >> tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 >> >> >> thanks again for your help!! >> robert. >> >> >> >> >> On 10/23/13 9:43 PM, Michael Lawrence wrote: >> >> >> >> >> On Wed, Oct 23, 2013 at 9:55 AM, Steve Lianoglou < >> lianoglou.steve@gene.com> wrote: >> >>> Hi Robert, >>> >>> On Wed, Oct 23, 2013 at 9:03 AM, Robert Castelo <robert.castelo@upf.edu> >>> wrote: >>> > dear list, >>> > >>> > i have to pretty intensively work with genome-wide phastcons scores and >>> > instead of repeatedly interrogate them through the internet via the >>> UCSC >>> > genome browser with 'rtracklayer', i'd prefer to do a bulk download of >>> the >>> > *.phastCons46way.wigFix.gz files (about 0.6Gb) at >>> > >>> > >>> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vert ebrate >>> > >>> > and then import them into R storing the information in some memory >>> efficient >>> > data structure (Rle?) that provides me also an efficient way to query >>> the >>> > phastcons score at any position of the human genome. >>> > >>> > all documentation and messages i've been able to retrieve through >>> google and >>> > the BioC list correspond to use cases that involve a small fraction of >>> the >>> > genome which can be handled by 'rtracklayer' with an internet >>> connection. >>> > >>> > any hint on how to achieve this goal will be very much appreciated, >>> >>> I previously did this by downloading the wig files and converting them >>> to bigWig format. To do so, I used the wigToBigWig tool you can >>> eventually get to form here: >>> >>> http://genomewiki.ucsc.edu/index.php/Kent_source_utilities >>> >>> It is unfortunate that UCSC doesn't host the bigWig files, as >>> conversion from wig to bigWig is hugely memory intensive (last I >>> remember, anyway). >>> >>> (Update: not sure when this happened, but rtracklayer actually wraps >>> this functionality in its wigToBigWig function -- nice! -- so you can >>> convert the wig file to bigWig straight from R (assuming you have a >>> machine with enough horsepower)) >>> >>> >> Of potential importance is that the Kent utilities are licensed for >> non-commercial use only. That's mostly what drove the implementation of >> wigToBigWig(). A >> >> >>> After that, you could then use rtracklayer's import functions over the >>> bigWig files to query them using a GRanges object. Look at the >>> Examples section in the ?import.bw man page for some help. >>> >>> >> In the new release there's an argument asRle that if TRUE will >> efficiently return an RleList from the BigWig, instead of a GRanges. >> >> If you want to query the RleList for single positions (in a GRanges) >> and get the result as an atomic vector, it's most efficient to use >> VariantTools::extractCoverageForPositions. That function should probably >> live somewhere more general but anyway, it's a fast path for single >> position RleList=>vector extraction. >> >> >>> HTH, >>> -steve >>> >>> -- >>> Steve Lianoglou >>> Computational Biologist >>> Bioinformatics and Computational Biology >>> Genentech >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> >> > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 847 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6