shift function (GenomicRanges, IRanges), issue with negative start values and circular DNA
1
0
Entering edit mode
@herve-pages-1542
Last seen 59 minutes ago
Seattle, WA, United States
Hi Robert, Sorry that this completely went off my radar but this week someone kindly sent me a reminder. So after re-reading your email I agree that shift() should behave consistently on both ends of the chromosome but I don't like the idea that it automatically trims the ranges for you. I think we should shift without trimming, but with a warning that some ranges are going off limits. Then it would be very easy for the user to trim those ranges if s/he wanted, using trim() (there is currently no "trim" method for GRanges objects but I'll add one). Note that for ranges that are completely off limits (e.g. start=-20, end=-4), it's not clear what trim() should do: (a) fail, (b) produce a 0-width range located on the edge (e.g. start=1, end=0), with or (c) without warning. This is one of the reasons I think this task is better done separately. The other reason to not do the trimming in shift() is to maintain the nice property that shift(shift(x, N), -N) should always be a no- op. When the chromosome is circular, I think we should do the same, except that no warning should be issued when a range goes off limits. Does that sound reasonable? Cheers, H. On 09/01/2011 08:23 AM, Ivanek, Robert wrote: > Hi Michael, > > is there any plan to fixed these issues? > > Best Regards > > Robert > > On 09/01/2011 03:37 PM, Michael Lawrence wrote: >> Nevermind, I misunderstood what you were trying to do. >> >> On Thu, Sep 1, 2011 at 6:36 AM, Michael Lawrence <michafla at="" gene.com="">> <mailto:michafla at="" gene.com="">> wrote: >> >> As a sort of side-comment, you might be better off doing this with >> resize(), i.e., >> >> resize(gr, width(gr) - 50, fix = "end") >> >> Michael >> >> On Thu, Sep 1, 2011 at 5:27 AM, Ivanek, Robert <robert.ivanek at="" fmi.ch="">> <mailto:robert.ivanek at="" fmi.ch="">> wrote: >> >> Dear Bioc developers, >> >> I am analysing the ChIP-Seq data using the GenomicRanges >> package. I am usually shifting the reads to the middle of the >> fragment (i.e. by 50 bp toward 3' end of the fragment according >> the strand of the read), using the shift function (from >> IRanges), which works in most cases. >> >> shift(gr, shift=50 * as.integer(ifelse(strand(gr)==__"-", -1, 1))) >> >> However sometimes the result does not look like as I would expect: >> >> 1. If the read is on the minus strand and the start value is >> smaller than the shift value, it produces an error: >> >> ## Error in validObject(x) : >> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >> negative values >> >> If the read is on the plus strand at the end of the chromosome >> it returns the size of the chromosome as a new end value. Would >> be possible to apply similar strategy for the beginning of the >> chromosome (It would return 1 as a start value)? >> >> >> >> 2. The shift function does not take into account information >> about circularity of the chromosome. Is there any plan to >> include it? >> >> >> Please see below reproducible examples >> >> Best Regards >> >> Robert >> >> -- >> Robert Ivanek >> Postdoctoral Fellow Schuebeler Group >> Friedrich Miescher Institute >> Maulbeerstrasse 66 >> 4058 Basel / Switzerland >> Office phone: +41 61 697 6100 <tel:%2b41%2061%20697%206100> >> >> >> >> >> EXAMPLES: >> >> >> ## >> ##############################__##############################__### ############## >> >> ## >> library("GenomicRanges") >> ## Loading required package: IRanges >> ## Attaching package: ?IRanges? >> ## The following object(s) are masked from ?package:base?: >> ## cbind, eval, intersect, Map, mapply, order, paste, pmax, >> pmax.int <http: pmax.int="">, pmin, pmin.int <http: pmin.int="">, >> rbind, rep.int <http: rep.int="">, setdiff, table, union >> >> ## >> ## >> ##############################__##############################__### ############## >> >> ## >> chr1.gr <http: chr1.gr=""> <- GRanges(seqnames=Rle("chr1", 4), >> ranges=IRanges(start=rep(c(1,__197195432),each=2), width=1), >> strand=rep(c("+","-"),2), >> seqlengths=c(chr1=197195432)) >> ## >> >> isCircularchr1.gr <http: chr1.gr="">) <- FALSE >> ## >> >> chr1.gr <http: chr1.gr=""> >> ## GRanges with 4 ranges and 0 elementMetadata values: >> ## seqnames ranges strand >> ## <rle> <iranges> <rle> >> ## [1] chr1 [ 1, 1] + >> ## [2] chr1 [ 1, 1] - >> ## [3] chr1 [197195432, 197195432] + >> ## [4] chr1 [197195432, 197195432] - >> ## --- >> ## seqlengths: >> ## chr1 >> ## 197195432 >> ## >> >> seqinfochr1.gr <http: chr1.gr="">) >> ## Seqinfo of length 1 >> ## seqnames seqlengths isCircular >> ## chr1 197195432 FALSE >> ## >> >> shiftchr1.gr <http: chr1.gr="">, shift=50) >> ## GRanges with 4 ranges and 0 elementMetadata values: >> ## seqnames ranges strand >> ## <rle> <iranges> <rle> >> ## [1] chr1 [ 51, 51] + >> ## [2] chr1 [ 51, 51] - >> ## [3] chr1 [197195432, 197195432] + >> ## [4] chr1 [197195432, 197195432] - >> ## --- >> ## seqlengths: >> ## chr1 >> ## 197195432 >> ## Warning message: >> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, >> 197195482L, : >> ## trimmed end values to be <= seqlengths >> ## >> >> shiftchr1.gr <http: chr1.gr="">, shift=50 * >> as.integer(ifelse(strand(chr1.__gr <http: chr1.gr="">)=="-", -1, 1))) >> ## Error in validObject(x) : >> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >> negative values >> ## In addition: Warning messages: >> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, >> 197195482L, : >> ## trimmed end values to be <= seqlengths >> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L, >> 197195382L : >> ## trimmed start values to be positive >> ## >> ## >> ##############################__##############################__### ############## >> >> ## >> chrM.gr <- GRanges(seqnames=Rle("chrM", 4), >> ranges=IRanges(start=rep(c(1,__16299),each=2), width=1), >> strand=rep(c("+","-"),2), >> seqlengths=c(chrM=16299)) >> ## >> >> isCircularchrM.gr) <- TRUE >> ## >> >> chrM.gr >> ## GRanges with 4 ranges and 0 elementMetadata values: >> ## seqnames ranges strand >> ## <rle> <iranges> <rle> >> ## [1] chrM [ 1, 1] + >> ## [2] chrM [ 1, 1] - >> ## [3] chrM [16299, 16299] + >> ## [4] chrM [16299, 16299] - >> ## --- >> ## seqlengths: >> ## chrM >> ## 16299 >> ## >> >> seqinfochrM.gr) >> ## Seqinfo of length 1 >> ## seqnames seqlengths isCircular >> ## chrM 16299 TRUE >> ## >> >> shiftchrM.gr, shift=50) >> ## GRanges with 4 ranges and 0 elementMetadata values: >> ## seqnames ranges strand >> ## <rle> <iranges> <rle> >> ## [1] chrM [ 51, 51] + >> ## [2] chrM [ 51, 51] - >> ## [3] chrM [16299, 16299] + >> ## [4] chrM [16299, 16299] - >> ## --- >> ## seqlengths: >> ## chrM >> ## 16299 >> ## Warning message: >> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, >> 16349L, 16349L : >> ## trimmed end values to be <= seqlengths >> ## >> >> shiftchrM.gr, shift=50 * >> as.integer(ifelse(strand(chrM.__gr)=="-", -1, 1))) >> ## Error in validObject(x) : >> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >> negative values >> ## In addition: Warning messages: >> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, >> 16349L, : >> ## trimmed end values to be <= seqlengths >> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) : >> ## trimmed start values to be positive >> ## >> ## >> ## >> ##############################__##############################__### ############## >> >> ## >> sessionInfo() >> ## R Under development (unstable) (2011-08-29 r56824) >> ## Platform: x86_64-unknown-linux-gnu (64-bit) >> >> ## locale: >> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 >> LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 >> ## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C >> LC_TELEPHONE=C >> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> ## attached base packages: >> ## [1] stats graphics grDevices utils datasets methods base >> >> ## other attached packages: >> ## [1] GenomicRanges_1.5.31 IRanges_1.11.26 >> ## >> ## >> ##############################__##############################__### ############## >> >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
Cancer Cancer • 1.7k views
ADD COMMENT
0
Entering edit mode
@ivanek-robert-3765
Last seen 10.2 years ago
Hi Herv?, On 01/23/2012 01:04 AM, Hervé Pagès wrote: > Hi Robert, > > Sorry that this completely went off my radar but this week someone > kindly sent me a reminder. > > So after re-reading your email I agree that shift() should behave > consistently on both ends of the chromosome but I don't like the idea > that it automatically trims the ranges for you. I think we should > shift without trimming, but with a warning that some ranges are going > off limits. This sounds reasonable. User can decide what to do with them. > Then it would be very easy for the user to trim those ranges if > s/he wanted, using trim() (there is currently no "trim" method for > GRanges objects but I'll add one). Note that for ranges that are > completely off limits (e.g. start=-20, end=-4), it's not clear > what trim() should do: (a) fail, (b) produce a 0-width range located > on the edge (e.g. start=1, end=0), with or (c) without warning. I think, that the GRanges cannot have negative widths, so I would suggest that they would be trimmed to (start=1,end=1) or (start=seqlength, end=seqlength) by default. > This is one of the reasons I think this task is better done separately. > The other reason to not do the trimming in shift() is to maintain > the nice property that shift(shift(x, N), -N) should always be a no- op. > > When the chromosome is circular, I think we should do the same, except > that no warning should be issued when a range goes off limits. Does the GRanges class allow the negative width on circular chromosomes? Cheers Robert > > Does that sound reasonable? > > Cheers, > H. > > > On 09/01/2011 08:23 AM, Ivanek, Robert wrote: >> Hi Michael, >> >> is there any plan to fixed these issues? >> >> Best Regards >> >> Robert >> >> On 09/01/2011 03:37 PM, Michael Lawrence wrote: >>> Nevermind, I misunderstood what you were trying to do. >>> >>> On Thu, Sep 1, 2011 at 6:36 AM, Michael Lawrence <michafla at="" gene.com="">>> <mailto:michafla at="" gene.com="">> wrote: >>> >>> As a sort of side-comment, you might be better off doing this with >>> resize(), i.e., >>> >>> resize(gr, width(gr) - 50, fix = "end") >>> >>> Michael >>> >>> On Thu, Sep 1, 2011 at 5:27 AM, Ivanek, Robert <robert.ivanek at="" fmi.ch="">>> <mailto:robert.ivanek at="" fmi.ch="">> wrote: >>> >>> Dear Bioc developers, >>> >>> I am analysing the ChIP-Seq data using the GenomicRanges >>> package. I am usually shifting the reads to the middle of the >>> fragment (i.e. by 50 bp toward 3' end of the fragment according >>> the strand of the read), using the shift function (from >>> IRanges), which works in most cases. >>> >>> shift(gr, shift=50 * as.integer(ifelse(strand(gr)==__"-", -1, 1))) >>> >>> However sometimes the result does not look like as I would expect: >>> >>> 1. If the read is on the minus strand and the start value is >>> smaller than the shift value, it produces an error: >>> >>> ## Error in validObject(x) : >>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >>> negative values >>> >>> If the read is on the plus strand at the end of the chromosome >>> it returns the size of the chromosome as a new end value. Would >>> be possible to apply similar strategy for the beginning of the >>> chromosome (It would return 1 as a start value)? >>> >>> >>> >>> 2. The shift function does not take into account information >>> about circularity of the chromosome. Is there any plan to >>> include it? >>> >>> >>> Please see below reproducible examples >>> >>> Best Regards >>> >>> Robert >>> >>> -- >>> Robert Ivanek >>> Postdoctoral Fellow Schuebeler Group >>> Friedrich Miescher Institute >>> Maulbeerstrasse 66 >>> 4058 Basel / Switzerland >>> Office phone: +41 61 697 6100 <tel:%2b41%2061%20697%206100> >>> >>> >>> >>> >>> EXAMPLES: >>> >>> >>> ## >>> ##############################__##############################__## ############### >>> >>> >>> ## >>> library("GenomicRanges") >>> ## Loading required package: IRanges >>> ## Attaching package: ?IRanges? >>> ## The following object(s) are masked from ?package:base?: >>> ## cbind, eval, intersect, Map, mapply, order, paste, pmax, >>> pmax.int <http: pmax.int="">, pmin, pmin.int <http: pmin.int="">, >>> rbind, rep.int <http: rep.int="">, setdiff, table, union >>> >>> ## >>> ## >>> ##############################__##############################__## ############### >>> >>> >>> ## >>> chr1.gr <http: chr1.gr=""> <- GRanges(seqnames=Rle("chr1", 4), >>> ranges=IRanges(start=rep(c(1,__197195432),each=2), width=1), >>> strand=rep(c("+","-"),2), >>> seqlengths=c(chr1=197195432)) >>> ## >>> >>> isCircularchr1.gr <http: chr1.gr="">) <- FALSE >>> ## >>> >>> chr1.gr <http: chr1.gr=""> >>> ## GRanges with 4 ranges and 0 elementMetadata values: >>> ## seqnames ranges strand >>> ## <rle> <iranges> <rle> >>> ## [1] chr1 [ 1, 1] + >>> ## [2] chr1 [ 1, 1] - >>> ## [3] chr1 [197195432, 197195432] + >>> ## [4] chr1 [197195432, 197195432] - >>> ## --- >>> ## seqlengths: >>> ## chr1 >>> ## 197195432 >>> ## >>> >>> seqinfochr1.gr <http: chr1.gr="">) >>> ## Seqinfo of length 1 >>> ## seqnames seqlengths isCircular >>> ## chr1 197195432 FALSE >>> ## >>> >>> shiftchr1.gr <http: chr1.gr="">, shift=50) >>> ## GRanges with 4 ranges and 0 elementMetadata values: >>> ## seqnames ranges strand >>> ## <rle> <iranges> <rle> >>> ## [1] chr1 [ 51, 51] + >>> ## [2] chr1 [ 51, 51] - >>> ## [3] chr1 [197195432, 197195432] + >>> ## [4] chr1 [197195432, 197195432] - >>> ## --- >>> ## seqlengths: >>> ## chr1 >>> ## 197195432 >>> ## Warning message: >>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, >>> 197195482L, : >>> ## trimmed end values to be <= seqlengths >>> ## >>> >>> shiftchr1.gr <http: chr1.gr="">, shift=50 * >>> as.integer(ifelse(strand(chr1.__gr <http: chr1.gr="">)=="-", -1, 1))) >>> ## Error in validObject(x) : >>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >>> negative values >>> ## In addition: Warning messages: >>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, >>> 197195482L, : >>> ## trimmed end values to be <= seqlengths >>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L, >>> 197195382L : >>> ## trimmed start values to be positive >>> ## >>> ## >>> ##############################__##############################__## ############### >>> >>> >>> ## >>> chrM.gr <- GRanges(seqnames=Rle("chrM", 4), >>> ranges=IRanges(start=rep(c(1,__16299),each=2), width=1), >>> strand=rep(c("+","-"),2), >>> seqlengths=c(chrM=16299)) >>> ## >>> >>> isCircularchrM.gr) <- TRUE >>> ## >>> >>> chrM.gr >>> ## GRanges with 4 ranges and 0 elementMetadata values: >>> ## seqnames ranges strand >>> ## <rle> <iranges> <rle> >>> ## [1] chrM [ 1, 1] + >>> ## [2] chrM [ 1, 1] - >>> ## [3] chrM [16299, 16299] + >>> ## [4] chrM [16299, 16299] - >>> ## --- >>> ## seqlengths: >>> ## chrM >>> ## 16299 >>> ## >>> >>> seqinfochrM.gr) >>> ## Seqinfo of length 1 >>> ## seqnames seqlengths isCircular >>> ## chrM 16299 TRUE >>> ## >>> >>> shiftchrM.gr, shift=50) >>> ## GRanges with 4 ranges and 0 elementMetadata values: >>> ## seqnames ranges strand >>> ## <rle> <iranges> <rle> >>> ## [1] chrM [ 51, 51] + >>> ## [2] chrM [ 51, 51] - >>> ## [3] chrM [16299, 16299] + >>> ## [4] chrM [16299, 16299] - >>> ## --- >>> ## seqlengths: >>> ## chrM >>> ## 16299 >>> ## Warning message: >>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, >>> 16349L, 16349L : >>> ## trimmed end values to be <= seqlengths >>> ## >>> >>> shiftchrM.gr, shift=50 * >>> as.integer(ifelse(strand(chrM.__gr)=="-", -1, 1))) >>> ## Error in validObject(x) : >>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >>> negative values >>> ## In addition: Warning messages: >>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, >>> 16349L, : >>> ## trimmed end values to be <= seqlengths >>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) : >>> ## trimmed start values to be positive >>> ## >>> ## >>> ## >>> ##############################__##############################__## ############### >>> >>> >>> ## >>> sessionInfo() >>> ## R Under development (unstable) (2011-08-29 r56824) >>> ## Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> ## locale: >>> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 >>> LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 >>> ## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C >>> LC_TELEPHONE=C >>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> ## attached base packages: >>> ## [1] stats graphics grDevices utils datasets methods base >>> >>> ## other attached packages: >>> ## [1] GenomicRanges_1.5.31 IRanges_1.11.26 >>> ## >>> ## >>> ##############################__##############################__## ############### >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> >> > > -- Robert Ivanek Postdoctoral Fellow Schuebeler Group Friedrich Miescher Institute Maulbeerstrasse 66 4058 Basel / Switzerland Office phone: +41 61 697 6100
ADD COMMENT
0
Entering edit mode
Hi Robert, On 01/23/2012 08:13 AM, Ivanek, Robert wrote: > Hi Herv?, > > On 01/23/2012 01:04 AM, Hervé Pagès wrote: >> Hi Robert, >> >> Sorry that this completely went off my radar but this week someone >> kindly sent me a reminder. >> >> So after re-reading your email I agree that shift() should behave >> consistently on both ends of the chromosome but I don't like the idea >> that it automatically trims the ranges for you. I think we should >> shift without trimming, but with a warning that some ranges are going >> off limits. > > This sounds reasonable. User can decide what to do with them. > > >> Then it would be very easy for the user to trim those ranges if >> s/he wanted, using trim() (there is currently no "trim" method for >> GRanges objects but I'll add one). Note that for ranges that are >> completely off limits (e.g. start=-20, end=-4), it's not clear >> what trim() should do: (a) fail, (b) produce a 0-width range located >> on the edge (e.g. start=1, end=0), with or (c) without warning. > > I think, that the GRanges cannot have negative widths, so I would > suggest that they would be trimmed to (start=1,end=1) or > (start=seqlength, end=seqlength) by default. The width of a range must be >= 0. The width is always equal to end - start + 1 so for a 0-width range the end is equal to start - 1. This is the only situation where the end < start. Sounds weird but this way of representing 0-width ranges is just following the general rule in the IRanges/GenomicRanges/GenomicFeatures/ShortRead world that integer ranges are presented to the user as closed intervals. So completely off limit ranges could be trimmed to (start=1,end=0) or (start=seqlength+1, end=seqlength). This is actually what restrict() does when used with keep.all.ranges=TRUE: > x <- GRanges("chr1", IRanges(c(-20, 3, 12), c(-4, 15, 15))) > restrict(x, start=1, end=10, keep.all.ranges=TRUE) GRanges with 3 ranges and 0 elementMetadata values: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [ 1, 0] * [2] chr1 [ 3, 10] * [3] chr1 [11, 10] * --- seqlengths: chr1 NA > >> This is one of the reasons I think this task is better done separately. >> The other reason to not do the trimming in shift() is to maintain >> the nice property that shift(shift(x, N), -N) should always be a no-op. >> >> When the chromosome is circular, I think we should do the same, except >> that no warning should be issued when a range goes off limits. > > Does the GRanges class allow the negative width on circular chromosomes? No, the width is always >= 0, even for ranges defined on a circular chromosome. Allowing negative widths would be like opening a can of worms. Thanks for your feedback, H. > > Cheers > > Robert > > >> >> Does that sound reasonable? >> >> Cheers, >> H. >> >> >> On 09/01/2011 08:23 AM, Ivanek, Robert wrote: >>> Hi Michael, >>> >>> is there any plan to fixed these issues? >>> >>> Best Regards >>> >>> Robert >>> >>> On 09/01/2011 03:37 PM, Michael Lawrence wrote: >>>> Nevermind, I misunderstood what you were trying to do. >>>> >>>> On Thu, Sep 1, 2011 at 6:36 AM, Michael Lawrence <michafla at="" gene.com="">>>> <mailto:michafla at="" gene.com="">> wrote: >>>> >>>> As a sort of side-comment, you might be better off doing this with >>>> resize(), i.e., >>>> >>>> resize(gr, width(gr) - 50, fix = "end") >>>> >>>> Michael >>>> >>>> On Thu, Sep 1, 2011 at 5:27 AM, Ivanek, Robert <robert.ivanek at="" fmi.ch="">>>> <mailto:robert.ivanek at="" fmi.ch="">> wrote: >>>> >>>> Dear Bioc developers, >>>> >>>> I am analysing the ChIP-Seq data using the GenomicRanges >>>> package. I am usually shifting the reads to the middle of the >>>> fragment (i.e. by 50 bp toward 3' end of the fragment according >>>> the strand of the read), using the shift function (from >>>> IRanges), which works in most cases. >>>> >>>> shift(gr, shift=50 * as.integer(ifelse(strand(gr)==__"-", -1, 1))) >>>> >>>> However sometimes the result does not look like as I would expect: >>>> >>>> 1. If the read is on the minus strand and the start value is >>>> smaller than the shift value, it produces an error: >>>> >>>> ## Error in validObject(x) : >>>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >>>> negative values >>>> >>>> If the read is on the plus strand at the end of the chromosome >>>> it returns the size of the chromosome as a new end value. Would >>>> be possible to apply similar strategy for the beginning of the >>>> chromosome (It would return 1 as a start value)? >>>> >>>> >>>> >>>> 2. The shift function does not take into account information >>>> about circularity of the chromosome. Is there any plan to >>>> include it? >>>> >>>> >>>> Please see below reproducible examples >>>> >>>> Best Regards >>>> >>>> Robert >>>> >>>> -- >>>> Robert Ivanek >>>> Postdoctoral Fellow Schuebeler Group >>>> Friedrich Miescher Institute >>>> Maulbeerstrasse 66 >>>> 4058 Basel / Switzerland >>>> Office phone: +41 61 697 6100 <tel:%2b41%2061%20697%206100> >>>> >>>> >>>> >>>> >>>> EXAMPLES: >>>> >>>> >>>> ## >>>> ##############################__##############################__# ################ >>>> >>>> >>>> >>>> ## >>>> library("GenomicRanges") >>>> ## Loading required package: IRanges >>>> ## Attaching package: ?IRanges? >>>> ## The following object(s) are masked from ?package:base?: >>>> ## cbind, eval, intersect, Map, mapply, order, paste, pmax, >>>> pmax.int <http: pmax.int="">, pmin, pmin.int <http: pmin.int="">, >>>> rbind, rep.int <http: rep.int="">, setdiff, table, union >>>> >>>> ## >>>> ## >>>> ##############################__##############################__# ################ >>>> >>>> >>>> >>>> ## >>>> chr1.gr <http: chr1.gr=""> <- GRanges(seqnames=Rle("chr1", 4), >>>> ranges=IRanges(start=rep(c(1,__197195432),each=2), width=1), >>>> strand=rep(c("+","-"),2), >>>> seqlengths=c(chr1=197195432)) >>>> ## >>>> >>>> isCircularchr1.gr <http: chr1.gr="">) <- FALSE >>>> ## >>>> >>>> chr1.gr <http: chr1.gr=""> >>>> ## GRanges with 4 ranges and 0 elementMetadata values: >>>> ## seqnames ranges strand >>>> ## <rle> <iranges> <rle> >>>> ## [1] chr1 [ 1, 1] + >>>> ## [2] chr1 [ 1, 1] - >>>> ## [3] chr1 [197195432, 197195432] + >>>> ## [4] chr1 [197195432, 197195432] - >>>> ## --- >>>> ## seqlengths: >>>> ## chr1 >>>> ## 197195432 >>>> ## >>>> >>>> seqinfochr1.gr <http: chr1.gr="">) >>>> ## Seqinfo of length 1 >>>> ## seqnames seqlengths isCircular >>>> ## chr1 197195432 FALSE >>>> ## >>>> >>>> shiftchr1.gr <http: chr1.gr="">, shift=50) >>>> ## GRanges with 4 ranges and 0 elementMetadata values: >>>> ## seqnames ranges strand >>>> ## <rle> <iranges> <rle> >>>> ## [1] chr1 [ 51, 51] + >>>> ## [2] chr1 [ 51, 51] - >>>> ## [3] chr1 [197195432, 197195432] + >>>> ## [4] chr1 [197195432, 197195432] - >>>> ## --- >>>> ## seqlengths: >>>> ## chr1 >>>> ## 197195432 >>>> ## Warning message: >>>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, >>>> 197195482L, : >>>> ## trimmed end values to be <= seqlengths >>>> ## >>>> >>>> shiftchr1.gr <http: chr1.gr="">, shift=50 * >>>> as.integer(ifelse(strand(chr1.__gr <http: chr1.gr="">)=="-", -1, 1))) >>>> ## Error in validObject(x) : >>>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >>>> negative values >>>> ## In addition: Warning messages: >>>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, >>>> 197195482L, : >>>> ## trimmed end values to be <= seqlengths >>>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L, >>>> 197195382L : >>>> ## trimmed start values to be positive >>>> ## >>>> ## >>>> ##############################__##############################__# ################ >>>> >>>> >>>> >>>> ## >>>> chrM.gr <- GRanges(seqnames=Rle("chrM", 4), >>>> ranges=IRanges(start=rep(c(1,__16299),each=2), width=1), >>>> strand=rep(c("+","-"),2), >>>> seqlengths=c(chrM=16299)) >>>> ## >>>> >>>> isCircularchrM.gr) <- TRUE >>>> ## >>>> >>>> chrM.gr >>>> ## GRanges with 4 ranges and 0 elementMetadata values: >>>> ## seqnames ranges strand >>>> ## <rle> <iranges> <rle> >>>> ## [1] chrM [ 1, 1] + >>>> ## [2] chrM [ 1, 1] - >>>> ## [3] chrM [16299, 16299] + >>>> ## [4] chrM [16299, 16299] - >>>> ## --- >>>> ## seqlengths: >>>> ## chrM >>>> ## 16299 >>>> ## >>>> >>>> seqinfochrM.gr) >>>> ## Seqinfo of length 1 >>>> ## seqnames seqlengths isCircular >>>> ## chrM 16299 TRUE >>>> ## >>>> >>>> shiftchrM.gr, shift=50) >>>> ## GRanges with 4 ranges and 0 elementMetadata values: >>>> ## seqnames ranges strand >>>> ## <rle> <iranges> <rle> >>>> ## [1] chrM [ 51, 51] + >>>> ## [2] chrM [ 51, 51] - >>>> ## [3] chrM [16299, 16299] + >>>> ## [4] chrM [16299, 16299] - >>>> ## --- >>>> ## seqlengths: >>>> ## chrM >>>> ## 16299 >>>> ## Warning message: >>>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, >>>> 16349L, 16349L : >>>> ## trimmed end values to be <= seqlengths >>>> ## >>>> >>>> shiftchrM.gr, shift=50 * >>>> as.integer(ifelse(strand(chrM.__gr)=="-", -1, 1))) >>>> ## Error in validObject(x) : >>>> ## invalid class ?IRanges? object: 'widths(x)' cannot contain >>>> negative values >>>> ## In addition: Warning messages: >>>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, >>>> 16349L, : >>>> ## trimmed end values to be <= seqlengths >>>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) : >>>> ## trimmed start values to be positive >>>> ## >>>> ## >>>> ## >>>> ##############################__##############################__# ################ >>>> >>>> >>>> >>>> ## >>>> sessionInfo() >>>> ## R Under development (unstable) (2011-08-29 r56824) >>>> ## Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> ## locale: >>>> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 >>>> LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 >>>> ## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C >>>> LC_TELEPHONE=C >>>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> ## attached base packages: >>>> ## [1] stats graphics grDevices utils datasets methods base >>>> >>>> ## other attached packages: >>>> ## [1] GenomicRanges_1.5.31 IRanges_1.11.26 >>>> ## >>>> ## >>>> ##############################__##############################__# ################ >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>>> >>> >> >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY

Login before adding your answer.

Traffic: 876 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6