rtracklayer: small problem
@gustavo-fernandez-bayon-5300
Last seen 8.9 years ago
Spain
Hi everybody.

I have spotted some strange (at least from a newbie's point of view) behaviour in the rtracklayer package. I have set up a small example to show it:

library(rtracklayer)
s <- browserSession()
genome(s) <- 'hg19'
track <- 'wgEncodeBroadHistone'
table.name <- 'wgEncodeBroadHistoneGm12878CtcfStdPk'
q <- ucscTableQuery(s, track=track, table=table.name)

ex1 <- getTable(q)
ex2 <- track(q)
ex3 <- track(q, asRangedData=FALSE)

Then I show the first element of each of the three results (data.frame, RangedData and GRanges, respectively):

> ex1[1,]
  bin chrom chromStart  chromEnd name score strand signalValue pValue qValue
1   3  chr1  150941733 151007265    .   297      .     2.98199     13     -1

> ex2[1,]
UCSC track 'wgEncodeBroadHistoneGm12878CtcfStdPk'
UCSCData with 1 row and 3 value columns across 93 spaces
     space                 ranges |        name     score   strand
  <factor>              <IRanges> | <character> <numeric> <factor>
1     chr1 [150941734, 151007265] |          NA       297        *

> ex3[1]
GRanges with 1 range and 2 metadata columns:
      seqnames                 ranges strand |        name     score
         <Rle>              <IRanges>  <Rle> | <character> <numeric>
  [1]     chr1 [150941734, 151007265]      * |        <NA>       297
  ---
  seqlengths:
        chr1      chr2 ... chrUn_gl000249
   249250621 243199373 ...          38502

I have noticed that the start position of each range is one base higher in the ranges-based objects than in the original table, while the end position is unchanged. I don't know whether this is an error inside the track() function or something I am missing. The difference occurs for every element, not only for the first one:

> all(start(ex3) == ex1$chromStart + 1)
[1] TRUE

My sessionInfo:

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rtracklayer_1.18.2   GenomicRanges_1.10.5 IRanges_1.16.4
[4] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] Biostrings_2.26.2                  bitops_1.0-5
 [3] BSgenome_1.26.1                    BSgenome.Hsapiens.UCSC.hg19_1.3.19
 [5] parallel_2.15.2                    RCurl_1.95-3
 [7] Rsamtools_1.10.2                   stats4_2.15.2
 [9] tcltk_2.15.2                       tools_2.15.2
[11] XML_3.95-0.1                       zlibbioc_1.4.0

Any hint will be much appreciated. It's not a big problem, but it is quite interesting.

Regards,
Gus
BSgenome rtracklayer • 1.1k views
@michael-lawrence-3846
Last seen 3.0 years ago
United States
Hi Gustavo,

UCSC stores its ranges as 0-based, whereas the IRanges containers use a 1-based representation. This is the reason to use the track() accessor when obtaining ranges. The getTable() accessor simply downloads the table and is completely agnostic to the content. The track() accessor, since it is constructing an IRanges-based object, is smart enough to adjust the ranges.

Michael

On Wed, Jan 16, 2013 at 3:47 AM, Gustavo Fernández Bayón wrote:
> I have noticed that the starting position of the range is one base higher
> in the ranges-based objects than in the original table. [...]
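To make the offset concrete, here is a minimal sketch in base R using the first row from the question. It assumes the usual UCSC table convention of 0-based starts with an exclusive end (half-open intervals), which would explain why only the start moves by one while the end stays the same. The variable names (ucsc, start1, end1, width0, width1) are illustrative only and are not part of rtracklayer; the optional GRanges step runs only if GenomicRanges happens to be installed.

```r
# First row of getTable(q) from the question, in UCSC table coordinates
# (assumed here to be 0-based with an exclusive end).
ucsc <- data.frame(chrom = "chr1",
                   chromStart = 150941733L,
                   chromEnd = 151007265L,
                   stringsAsFactors = FALSE)

# Convert to the 1-based, end-inclusive convention used by IRanges:
# shift the start by one, leave the end untouched.
start1 <- ucsc$chromStart + 1L
end1 <- ucsc$chromEnd

# Both conventions describe the same bases, so the widths must agree.
width0 <- ucsc$chromEnd - ucsc$chromStart   # half-open: end - start
width1 <- end1 - start1 + 1L                # closed: end - start + 1
stopifnot(width0 == width1)

cat(sprintf("%s:%d-%d (width %d)\n", ucsc$chrom, start1, end1, width1))

# Optional: build the equivalent GRanges by hand, if the packages are available.
if (requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  gr <- GenomicRanges::GRanges(
    seqnames = ucsc$chrom,
    ranges = IRanges::IRanges(start = start1, end = end1)
  )
  print(gr)
}
```

The converted start matches the value shown for ex2 and ex3 in the question. With real query results there should be no need to do this by hand, since, as described above, track() applies the adjustment when it builds the ranges object.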
Hi Michael.

Thank you very much. Now it makes perfect sense. I was using getTable(), but I think it will be better to change and keep consistency across all of my code.

Thank you again.

Regards,
Gus

On 16/01/13 13:13, Michael Lawrence wrote:
> UCSC stores its ranges as 0-based, whereas the IRanges containers use
> a 1-based representation. [...]