Entering edit mode
Hi Robert,
Sorry that this completely went off my radar but this week someone
kindly sent me a reminder.
So after re-reading your email I agree that shift() should behave
consistently on both ends of the chromosome but I don't like the idea
that it automatically trims the ranges for you. I think we should
shift without trimming, but with a warning that some ranges are going
off limits.
Then it would be very easy for the user to trim those ranges if
s/he wanted, using trim() (there is currently no "trim" method for
GRanges objects but I'll add one). Note that for ranges that are
completely off limits (e.g. start=-20, end=-4), it's not clear
what trim() should do: (a) fail, (b) produce a 0-width range located
on the edge (e.g. start=1, end=0), with or (c) without warning.
This is one of the reasons I think this task is better done
separately.
The other reason to not do the trimming in shift() is to maintain
the nice property that shift(shift(x, N), -N) should always be a no-
op.
When the chromosome is circular, I think we should do the same, except
that no warning should be issued when a range goes off limits.
Does that sound reasonable?
Cheers,
H.
On 09/01/2011 08:23 AM, Ivanek, Robert wrote:
> Hi Michael,
>
> is there any plan to fixed these issues?
>
> Best Regards
>
> Robert
>
> On 09/01/2011 03:37 PM, Michael Lawrence wrote:
>> Nevermind, I misunderstood what you were trying to do.
>>
>> On Thu, Sep 1, 2011 at 6:36 AM, Michael Lawrence <michafla at="" gene.com="">> <mailto:michafla at="" gene.com="">> wrote:
>>
>> As a sort of side-comment, you might be better off doing this with
>> resize(), i.e.,
>>
>> resize(gr, width(gr) - 50, fix = "end")
>>
>> Michael
>>
>> On Thu, Sep 1, 2011 at 5:27 AM, Ivanek, Robert <robert.ivanek at="" fmi.ch="">> <mailto:robert.ivanek at="" fmi.ch="">> wrote:
>>
>> Dear Bioc developers,
>>
>> I am analysing the ChIP-Seq data using the GenomicRanges
>> package. I am usually shifting the reads to the middle of the
>> fragment (i.e. by 50 bp toward 3' end of the fragment according
>> the strand of the read), using the shift function (from
>> IRanges), which works in most cases.
>>
>> shift(gr, shift=50 * as.integer(ifelse(strand(gr)==__"-", -1, 1)))
>>
>> However sometimes the result does not look like as I would expect:
>>
>> 1. If the read is on the minus strand and the start value is
>> smaller than the shift value, it produces an error:
>>
>> ## Error in validObject(x) :
>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain
>> negative values
>>
>> If the read is on the plus strand at the end of the chromosome
>> it returns the size of the chromosome as a new end value. Would
>> be possible to apply similar strategy for the beginning of the
>> chromosome (It would return 1 as a start value)?
>>
>>
>>
>> 2. The shift function does not take into account information
>> about circularity of the chromosome. Is there any plan to
>> include it?
>>
>>
>> Please see below reproducible examples
>>
>> Best Regards
>>
>> Robert
>>
>> --
>> Robert Ivanek
>> Postdoctoral Fellow Schuebeler Group
>> Friedrich Miescher Institute
>> Maulbeerstrasse 66
>> 4058 Basel / Switzerland
>> Office phone: +41 61 697 6100 <tel:%2b41%2061%20697%206100>
>>
>>
>>
>>
>> EXAMPLES:
>>
>>
>> ##
>> ##############################__##############################__###
##############
>>
>> ##
>> library("GenomicRanges")
>> ## Loading required package: IRanges
>> ## Attaching package: ?IRanges?
>> ## The following object(s) are masked from ?package:base?:
>> ## cbind, eval, intersect, Map, mapply, order, paste, pmax,
>> pmax.int <http: pmax.int="">, pmin, pmin.int <http: pmin.int="">,
>> rbind, rep.int <http: rep.int="">, setdiff, table, union
>>
>> ##
>> ##
>> ##############################__##############################__###
##############
>>
>> ##
>> chr1.gr <http: chr1.gr=""> <- GRanges(seqnames=Rle("chr1", 4),
>> ranges=IRanges(start=rep(c(1,__197195432),each=2), width=1),
>> strand=rep(c("+","-"),2),
>> seqlengths=c(chr1=197195432))
>> ##
>>
>> isCircularchr1.gr <http: chr1.gr="">) <- FALSE
>> ##
>>
>> chr1.gr <http: chr1.gr="">
>> ## GRanges with 4 ranges and 0 elementMetadata values:
>> ## seqnames ranges strand
>> ## <rle> <iranges> <rle>
>> ## [1] chr1 [ 1, 1] +
>> ## [2] chr1 [ 1, 1] -
>> ## [3] chr1 [197195432, 197195432] +
>> ## [4] chr1 [197195432, 197195432] -
>> ## ---
>> ## seqlengths:
>> ## chr1
>> ## 197195432
>> ##
>>
>> seqinfochr1.gr <http: chr1.gr="">)
>> ## Seqinfo of length 1
>> ## seqnames seqlengths isCircular
>> ## chr1 197195432 FALSE
>> ##
>>
>> shiftchr1.gr <http: chr1.gr="">, shift=50)
>> ## GRanges with 4 ranges and 0 elementMetadata values:
>> ## seqnames ranges strand
>> ## <rle> <iranges> <rle>
>> ## [1] chr1 [ 51, 51] +
>> ## [2] chr1 [ 51, 51] -
>> ## [3] chr1 [197195432, 197195432] +
>> ## [4] chr1 [197195432, 197195432] -
>> ## ---
>> ## seqlengths:
>> ## chr1
>> ## 197195432
>> ## Warning message:
>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L,
>> 197195482L, :
>> ## trimmed end values to be <= seqlengths
>> ##
>>
>> shiftchr1.gr <http: chr1.gr="">, shift=50 *
>> as.integer(ifelse(strand(chr1.__gr <http: chr1.gr="">)=="-", -1, 1)))
>> ## Error in validObject(x) :
>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain
>> negative values
>> ## In addition: Warning messages:
>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L,
>> 197195482L, :
>> ## trimmed end values to be <= seqlengths
>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L,
>> 197195382L :
>> ## trimmed start values to be positive
>> ##
>> ##
>> ##############################__##############################__###
##############
>>
>> ##
>> chrM.gr <- GRanges(seqnames=Rle("chrM", 4),
>> ranges=IRanges(start=rep(c(1,__16299),each=2), width=1),
>> strand=rep(c("+","-"),2),
>> seqlengths=c(chrM=16299))
>> ##
>>
>> isCircularchrM.gr) <- TRUE
>> ##
>>
>> chrM.gr
>> ## GRanges with 4 ranges and 0 elementMetadata values:
>> ## seqnames ranges strand
>> ## <rle> <iranges> <rle>
>> ## [1] chrM [ 1, 1] +
>> ## [2] chrM [ 1, 1] -
>> ## [3] chrM [16299, 16299] +
>> ## [4] chrM [16299, 16299] -
>> ## ---
>> ## seqlengths:
>> ## chrM
>> ## 16299
>> ##
>>
>> seqinfochrM.gr)
>> ## Seqinfo of length 1
>> ## seqnames seqlengths isCircular
>> ## chrM 16299 TRUE
>> ##
>>
>> shiftchrM.gr, shift=50)
>> ## GRanges with 4 ranges and 0 elementMetadata values:
>> ## seqnames ranges strand
>> ## <rle> <iranges> <rle>
>> ## [1] chrM [ 51, 51] +
>> ## [2] chrM [ 51, 51] -
>> ## [3] chrM [16299, 16299] +
>> ## [4] chrM [16299, 16299] -
>> ## ---
>> ## seqlengths:
>> ## chrM
>> ## 16299
>> ## Warning message:
>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L,
>> 16349L, 16349L :
>> ## trimmed end values to be <= seqlengths
>> ##
>>
>> shiftchrM.gr, shift=50 *
>> as.integer(ifelse(strand(chrM.__gr)=="-", -1, 1)))
>> ## Error in validObject(x) :
>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain
>> negative values
>> ## In addition: Warning messages:
>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L,
>> 16349L, :
>> ## trimmed end values to be <= seqlengths
>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) :
>> ## trimmed start values to be positive
>> ##
>> ##
>> ##
>> ##############################__##############################__###
##############
>>
>> ##
>> sessionInfo()
>> ## R Under development (unstable) (2011-08-29 r56824)
>> ## Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> ## locale:
>> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
>> LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
>> ## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C
>> LC_TELEPHONE=C
>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> ## attached base packages:
>> ## [1] stats graphics grDevices utils datasets methods base
>>
>> ## other attached packages:
>> ## [1] GenomicRanges_1.5.31 IRanges_1.11.26
>> ##
>> ##
>> ##############################__##############################__###
##############
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org="">
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319