GRanges - reduce() function
3
0
Entering edit mode
Fahim Md ▴ 250
@fahim-md-4018
Last seen 9.6 years ago
Hi In the following example, I am trying to use 'reduce()' function to reduce the genomic intervals and find intervals corresponding to 'Gene'. It is behaving as it is desired. However the output of the reduce just give the intervals (I am losing the 'Gene' metadata). Is there a way to retain 'Gene' metadata. > gr <- GRanges(+ seqnames = Rle(c("chr1", "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + transcript = head(letters, 10),+ Gene = c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + )> grGRanges with 10 ranges and 2 elementMetadata values: seqnames ranges strand | transcript Gene <rle> <iranges> <rle> | <character> <character> [1] chr1 [ 1, 7] - | a xxx [2] chr1 [ 2, 8] - | b xxx [3] chr1 [ 3, 9] - | c xxx [4] chr1 [ 4, 10] + | d yyy [5] chr1 [ 5, 11] + | e yyy [6] chr1 [ 6, 12] + | f yyy [7] chr2 [ 1, 5] - | g zzz [8] chr2 [ 2, 9] - | h zzz [9] chr2 [10, 18] * | i www [10] chr2 [11, 19] * | j www --- seqlengths: chr1 chr2 NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 elementMetadata values: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [ 4, 12] + [2] chr1 [ 1, 9] - [3] chr2 [ 1, 9] - [4] chr2 [10, 19] * --- seqlengths: chr1 chr2 NA NA > Thanks Fahim -- ------------------------------- Fahim Mohammad Bioinforformatics Lab University of Louisville Louisville, KY, USA [[alternative HTML version deleted]]
• 11k views
ADD COMMENT
0
Entering edit mode
@vincent-j-carey-jr-4
Last seen 6 weeks ago
United States
On Thu, Nov 17, 2011 at 6:28 PM, Fahim Mohammad <fahim.md@gmail.com> wrote: > Hi > In the following example, I am trying to use 'reduce()' function to reduce > the genomic intervals and find intervals corresponding to 'Gene'. It is > behaving as it is desired. However the output of the reduce just give the > intervals (I am losing the 'Gene' metadata). > Is there a way to retain 'Gene' metadata. > > > > gr <- GRanges(+ seqnames = Rle(c("chr1", > "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, > 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = > Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + > transcript = head(letters, 10),+ Gene = > c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + > )> grGRanges with 10 ranges and 2 elementMetadata values: > seqnames ranges strand | transcript Gene > <rle> <iranges> <rle> | <character> <character> > [1] chr1 [ 1, 7] - | a xxx > [2] chr1 [ 2, 8] - | b xxx > [3] chr1 [ 3, 9] - | c xxx > [4] chr1 [ 4, 10] + | d yyy > [5] chr1 [ 5, 11] + | e yyy > [6] chr1 [ 6, 12] + | f yyy > [7] chr2 [ 1, 5] - | g zzz > [8] chr2 [ 2, 9] - | h zzz > [9] chr2 [10, 18] * | i www > [10] chr2 [11, 19] * | j www > --- > seqlengths: > chr1 chr2 > NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 > elementMetadata values: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [ 4, 12] + > [2] chr1 [ 1, 9] - > [3] chr2 [ 1, 9] - > [4] chr2 [10, 19] * > --- > seqlengths: > chr1 chr2 > NA NA > This doesn't do exactly what you seek, but it gives a mangled names attributed that you can unmangle and reattach as metadata if you like unlist(reduce(split(gr, elementMetadata(gr)$Gene))) > > > > > > Thanks > > > Fahim > > -- > ------------------------------- > Fahim Mohammad > Bioinforformatics Lab > University of Louisville > Louisville, KY, USA > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Thanks a lot Vincent, Your suggestion helped me a lot, but the problem of mangled names and unsorted ranges remained. Also the 'unlist' function add sufiix in the name. > unlist(reduce(split(gr, elementMetadata(gr)$Gene)))GRanges with 4 ranges and 0 elementMetadata values: seqnames ranges strand <rle> <iranges> <rle> www.1 chr2 [10, 19] * xxx.2 chr1 [ 1, 9] - yyy.3 chr1 [ 4, 12] + zzz.4 chr2 [ 1, 9] - --- seqlengths: chr1 chr2 NA NA So I ended up calling GRanges function twice. I know that this approach may not be the best and there may be other efficient workaround. Thanks again for your help. Here is what I did. > gr <- GRanges(+ seqnames = Rle(c("chr1", "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + transcript = head(letters, 10),+ Gene = c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + )> grGRanges with 10 ranges and 2 elementMetadata values: seqnames ranges strand | transcript Gene <rle> <iranges> <rle> | <character> <character> [1] chr1 [ 1, 7] - | a xxx [2] chr1 [ 2, 8] - | b xxx [3] chr1 [ 3, 9] - | c xxx [4] chr1 [ 4, 10] + | d yyy [5] chr1 [ 5, 11] + | e yyy [6] chr1 [ 6, 12] + | f yyy [7] chr2 [ 1, 5] - | g zzz [8] chr2 [ 2, 9] - | h zzz [9] chr2 [10, 18] * | i www [10] chr2 [11, 19] * | j www --- seqlengths: chr1 chr2 NA NA> reducedRanges = reduce(split(gr, elementMetadata(gr)$Gene)) #reduce the ranges according to 'Gene' # Convert the rr (reduced ranges) to data frame > tmp = as.data.frame(reducedRanges)> tmp element seqnames start end width strand 1 www chr2 10 19 10 * 2 xxx chr1 1 9 9 - 3 yyy chr1 4 12 9 + 4 zzz chr2 1 9 9 -> rr = tmp[order(tmp[,'seqnames'], tmp[,'start']),]; #sort the data frame first on 'seqname' and 'start' location> rr element seqnames start end width strand 2 xxx chr1 1 9 9 - 3 yyy chr1 4 12 9 + 4 zzz chr2 1 9 9 - 1 www chr2 10 19 10 * # And now convert the data frame again into GRanges object. > grFinal <- GRanges(seqnames = Rle(rr[,'seqnames']),+ ranges = IRanges (start =rr[,'start'], end = rr[,'end'] ),+ strand = Rle(rr[,'strand']),+ Gene = rr[,'element']+ )> grFinalGRanges with 4 ranges and 1 elementMetadata value: seqnames ranges strand | Gene <rle> <iranges> <rle> | <character> [1] chr1 [ 1, 9] - | xxx [2] chr1 [ 4, 12] + | yyy [3] chr2 [ 1, 9] - | zzz [4] chr2 [10, 19] * | www --- seqlengths: chr1 chr2 NA NA Regards. Fahim On Thu, Nov 17, 2011 at 9:13 PM, Vincent Carey <stvjc@channing.harvard.edu>wrote: > > > On Thu, Nov 17, 2011 at 6:28 PM, Fahim Mohammad <fahim.md@gmail.com>wrote: > >> Hi >> In the following example, I am trying to use 'reduce()' function to reduce >> the genomic intervals and find intervals corresponding to 'Gene'. It is >> behaving as it is desired. However the output of the reduce just give the >> intervals (I am losing the 'Gene' metadata). >> Is there a way to retain 'Gene' metadata. >> >> >> > gr <- GRanges(+ seqnames = Rle(c("chr1", >> "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, >> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = >> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >> transcript = head(letters, 10),+ Gene = >> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >> )> grGRanges with 10 ranges and 2 elementMetadata values: >> >> seqnames ranges strand | transcript Gene >> <rle> <iranges> <rle> | <character> <character> >> [1] chr1 [ 1, 7] - | a xxx >> [2] chr1 [ 2, 8] - | b xxx >> [3] chr1 [ 3, 9] - | c xxx >> [4] chr1 [ 4, 10] + | d yyy >> [5] chr1 [ 5, 11] + | e yyy >> [6] chr1 [ 6, 12] + | f yyy >> [7] chr2 [ 1, 5] - | g zzz >> [8] chr2 [ 2, 9] - | h zzz >> [9] chr2 [10, 18] * | i www >> [10] chr2 [11, 19] * | j www >> --- >> seqlengths: >> chr1 chr2 >> NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 >> >> elementMetadata values: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [ 4, 12] + >> [2] chr1 [ 1, 9] - >> [3] chr2 [ 1, 9] - >> [4] chr2 [10, 19] * >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> > > This doesn't do exactly what you seek, but it gives a mangled names > attributed that you can unmangle > and reattach as metadata if you like > > unlist(reduce(split(gr, elementMetadata(gr)$Gene))) > > >> >> > >> >> >> Thanks >> >> >> Fahim >> >> -- >> ------------------------------- >> Fahim Mohammad >> Bioinforformatics Lab >> University of Louisville >> Louisville, KY, USA >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- ------------------------------- Fahim Mohammad Bioinforformatics Lab University of Louisville Louisville, KY, USA Ph: +1-502-409-1167 web: http://bioinformatics.louisville.edu/lab/ ------------------------------- I think the reason people are dealing with science less well now than 50 years ago is that it has become so complicated. --James D. Watson<http: www.brainyquote.com="" quotes="" quotes="" j="" jamesdwat="" 388959.html=""> [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Fahim and others, On 11-11-18 07:45 AM, Fahim Mohammad wrote: > Thanks a lot Vincent, > Your suggestion helped me a lot, but the problem of mangled names and > unsorted ranges remained. Also the 'unlist' function add sufiix in the > name. > >> unlist(reduce(split(gr, elementMetadata(gr)$Gene)))GRanges with 4 ranges and 0 elementMetadata values: > seqnames ranges strand > <rle> <iranges> <rle> > www.1 chr2 [10, 19] * > xxx.2 chr1 [ 1, 9] - > yyy.3 chr1 [ 4, 12] + > zzz.4 chr2 [ 1, 9] - > --- > seqlengths: > chr1 chr2 > NA NA Yes this mangling is definitely not helping. We have control over the "unlist" method for GRangesList objects so we could change that if nobody objects. I think it was originally implemented that way for 2 reasons: (1) to mimic what base::unlist() does (even though most people don't like it); (2) also until recently the names of a GRanges object had to be unique. Now that the names of a GRanges object don't have to be unique anymore (since BioC 2.9), maybe we could change the behaviour of unlist(). Does anybody object? > > So I ended up calling GRanges function twice. I know that this approach may > not be the best and there may be other efficient workaround. > Thanks again for your help. > Here is what I did. > > >> gr<- GRanges(+ seqnames = Rle(c("chr1", "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + transcript = head(letters, 10),+ Gene = c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + )> grGRanges with 10 ranges and 2 elementMetadata values: > seqnames ranges strand | transcript Gene > <rle> <iranges> <rle> |<character> <character> > [1] chr1 [ 1, 7] - | a xxx > [2] chr1 [ 2, 8] - | b xxx > [3] chr1 [ 3, 9] - | c xxx > [4] chr1 [ 4, 10] + | d yyy > [5] chr1 [ 5, 11] + | e yyy > [6] chr1 [ 6, 12] + | f yyy > [7] chr2 [ 1, 5] - | g zzz > [8] chr2 [ 2, 9] - | h zzz > [9] chr2 [10, 18] * | i www > [10] chr2 [11, 19] * | j www > --- > seqlengths: > chr1 chr2 > NA NA> reducedRanges = reduce(split(gr, > elementMetadata(gr)$Gene)) #reduce the ranges according to 'Gene' > > # Convert the rr (reduced ranges) to data frame > > >> tmp = as.data.frame(reducedRanges)> tmp element seqnames start end width strand > 1 www chr2 10 19 10 * > 2 xxx chr1 1 9 9 - > 3 yyy chr1 4 12 9 + > 4 zzz chr2 1 9 9 -> rr = > tmp[order(tmp[,'seqnames'], tmp[,'start']),]; #sort the data frame > first on 'seqname' and 'start' location> rr element seqnames start > end width strand > 2 xxx chr1 1 9 9 - > 3 yyy chr1 4 12 9 + > 4 zzz chr2 1 9 9 - > 1 www chr2 10 19 10 * > > # And now convert the data frame again into GRanges object. > > >> grFinal<- GRanges(seqnames = Rle(rr[,'seqnames']),+ ranges = IRanges (start =rr[,'start'], end = rr[,'end'] ),+ strand = Rle(rr[,'strand']),+ Gene = rr[,'element']+ )> grFinalGRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> |<character> > [1] chr1 [ 1, 9] - | xxx > [2] chr1 [ 4, 12] + | yyy > [3] chr2 [ 1, 9] - | zzz > [4] chr2 [10, 19] * | www > --- > seqlengths: > chr1 chr2 > NA NA Alternatively you could do: grl <- reduce(split(gr, elementMetadata(gr)$Gene)) reducedRanges <- unlist(grl, use.names=FALSE) elementMetadata(reducedRanges)$Gene <- rep(names(grl), elementLengths(grl)) > reducedRanges GRanges with 4 ranges and 1 elementMetadata value: seqnames ranges strand | Gene <rle> <iranges> <rle> | <character> [1] chr2 [10, 19] * | www [2] chr1 [ 1, 9] - | xxx [3] chr1 [ 4, 12] + | yyy [4] chr2 [ 1, 9] - | zzz --- seqlengths: chr1 chr2 NA NA Then if you want to restore the original order: oo <- order(match(elementMetadata(reducedRanges)$Gene, unique(elementMetadata(gr)$Gene))) grFinal <- reducedRanges[oo] > grFinal GRanges with 4 ranges and 1 elementMetadata value: seqnames ranges strand | Gene <rle> <iranges> <rle> | <character> [1] chr1 [ 1, 9] - | xxx [2] chr1 [ 4, 12] + | yyy [3] chr2 [ 1, 9] - | zzz [4] chr2 [10, 19] * | www --- seqlengths: chr1 chr2 NA NA Or if you want to order by seqnames and start: oo2 <- order(as.vector(seqnames(reducedRanges)), start(reducedRanges)) > reducedRanges[oo2] GRanges with 4 ranges and 1 elementMetadata value: seqnames ranges strand | Gene <rle> <iranges> <rle> | <character> [1] chr1 [ 1, 9] - | xxx [2] chr1 [ 4, 12] + | yyy [3] chr2 [ 1, 9] - | zzz [4] chr2 [10, 19] * | www --- seqlengths: chr1 chr2 NA NA which in your case happens to give the same result. Cheers, H. > > > > Regards. > Fahim > > On Thu, Nov 17, 2011 at 9:13 PM, Vincent Carey > <stvjc at="" channing.harvard.edu="">wrote: > >> >> >> On Thu, Nov 17, 2011 at 6:28 PM, Fahim Mohammad<fahim.md at="" gmail.com="">wrote: >> >>> Hi >>> In the following example, I am trying to use 'reduce()' function to reduce >>> the genomic intervals and find intervals corresponding to 'Gene'. It is >>> behaving as it is desired. However the output of the reduce just give the >>> intervals (I am losing the 'Gene' metadata). >>> Is there a way to retain 'Gene' metadata. >>> >>> >>>> gr<- GRanges(+ seqnames = Rle(c("chr1", >>> "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, >>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = >>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >>> transcript = head(letters, 10),+ Gene = >>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >>> )> grGRanges with 10 ranges and 2 elementMetadata values: >>> >>> seqnames ranges strand | transcript Gene >>> <rle> <iranges> <rle> |<character> <character> >>> [1] chr1 [ 1, 7] - | a xxx >>> [2] chr1 [ 2, 8] - | b xxx >>> [3] chr1 [ 3, 9] - | c xxx >>> [4] chr1 [ 4, 10] + | d yyy >>> [5] chr1 [ 5, 11] + | e yyy >>> [6] chr1 [ 6, 12] + | f yyy >>> [7] chr2 [ 1, 5] - | g zzz >>> [8] chr2 [ 2, 9] - | h zzz >>> [9] chr2 [10, 18] * | i www >>> [10] chr2 [11, 19] * | j www >>> --- >>> seqlengths: >>> chr1 chr2 >>> NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 >>> >>> elementMetadata values: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] chr1 [ 4, 12] + >>> [2] chr1 [ 1, 9] - >>> [3] chr2 [ 1, 9] - >>> [4] chr2 [10, 19] * >>> --- >>> seqlengths: >>> chr1 chr2 >>> NA NA >>> >> >> This doesn't do exactly what you seek, but it gives a mangled names >> attributed that you can unmangle >> and reattach as metadata if you like >> >> unlist(reduce(split(gr, elementMetadata(gr)$Gene))) >> >> >>> >>>> >>> >>> >>> Thanks >>> >>> >>> Fahim >>> >>> -- >>> ------------------------------- >>> Fahim Mohammad >>> Bioinforformatics Lab >>> University of Louisville >>> Louisville, KY, USA >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Herve, re: does anyone object To be honest, I've been skimming. Are you proposing that the names of the unlisted GRanges don't get suffixed so as to be unique. If so, then, no objections here. Thanks! ~Malcolm > -----Original Message----- > From: bioconductor-bounces at r-project.org [mailto:bioconductor- > bounces at r-project.org] On Behalf Of Hervé Pagès > Sent: Friday, November 18, 2011 1:38 PM > To: Fahim Mohammad > Cc: Vincent Carey; Bioconductor Newsgroup > Subject: Re: [BioC] GRanges - reduce() function > > Hi Fahim and others, > > On 11-11-18 07:45 AM, Fahim Mohammad wrote: > > Thanks a lot Vincent, > > Your suggestion helped me a lot, but the problem of mangled names and > > unsorted ranges remained. Also the 'unlist' function add sufiix in the > > name. > > > >> unlist(reduce(split(gr, elementMetadata(gr)$Gene)))GRanges with 4 > ranges and 0 elementMetadata values: > > seqnames ranges strand > > <rle> <iranges> <rle> > > www.1 chr2 [10, 19] * > > xxx.2 chr1 [ 1, 9] - > > yyy.3 chr1 [ 4, 12] + > > zzz.4 chr2 [ 1, 9] - > > --- > > seqlengths: > > chr1 chr2 > > NA NA > > Yes this mangling is definitely not helping. We have control over the > "unlist" method for GRangesList objects so we could change that if > nobody objects. I think it was originally implemented that way for 2 > reasons: > > (1) to mimic what base::unlist() does (even though most people > don't like it); > > (2) also until recently the names of a GRanges object had to be > unique. > > Now that the names of a GRanges object don't have to be unique anymore > (since BioC 2.9), maybe we could change the behaviour of unlist(). > Does anybody object? > > > > > So I ended up calling GRanges function twice. I know that this approach > may > > not be the best and there may be other efficient workaround. > > Thanks again for your help. > > Here is what I did. > > > > > >> gr<- GRanges(+ seqnames = Rle(c("chr1", "chr2", "chr2"), > c(6, 2,2)), + ranges = IRanges(c(1:6, 1:2, 10:11) , end = c(7:12, 5,9, > 18:19 )), + strand = Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + > transcript = head(letters, 10),+ Gene = c(rep("xxx",3), > rep("yyy",3), rep("zzz",2), rep("www",2)) + )> grGRanges with 10 > ranges and 2 elementMetadata values: > > seqnames ranges strand | transcript Gene > > <rle> <iranges> <rle> |<character> <character> > > [1] chr1 [ 1, 7] - | a xxx > > [2] chr1 [ 2, 8] - | b xxx > > [3] chr1 [ 3, 9] - | c xxx > > [4] chr1 [ 4, 10] + | d yyy > > [5] chr1 [ 5, 11] + | e yyy > > [6] chr1 [ 6, 12] + | f yyy > > [7] chr2 [ 1, 5] - | g zzz > > [8] chr2 [ 2, 9] - | h zzz > > [9] chr2 [10, 18] * | i www > > [10] chr2 [11, 19] * | j www > > --- > > seqlengths: > > chr1 chr2 > > NA NA> reducedRanges = reduce(split(gr, > > elementMetadata(gr)$Gene)) #reduce the ranges according to 'Gene' > > > > # Convert the rr (reduced ranges) to data frame > > > > > >> tmp = as.data.frame(reducedRanges)> tmp element seqnames start > end width strand > > 1 www chr2 10 19 10 * > > 2 xxx chr1 1 9 9 - > > 3 yyy chr1 4 12 9 + > > 4 zzz chr2 1 9 9 -> rr = > > tmp[order(tmp[,'seqnames'], tmp[,'start']),]; #sort the data frame > > first on 'seqname' and 'start' location> rr element seqnames start > > end width strand > > 2 xxx chr1 1 9 9 - > > 3 yyy chr1 4 12 9 + > > 4 zzz chr2 1 9 9 - > > 1 www chr2 10 19 10 * > > > > # And now convert the data frame again into GRanges object. > > > > > >> grFinal<- GRanges(seqnames = Rle(rr[,'seqnames']),+ > ranges = IRanges (start =rr[,'start'], end = rr[,'end'] ),+ strand = > Rle(rr[,'strand']),+ Gene = rr[,'element']+ )> > grFinalGRanges with 4 ranges and 1 elementMetadata value: > > seqnames ranges strand | Gene > > <rle> <iranges> <rle> |<character> > > [1] chr1 [ 1, 9] - | xxx > > [2] chr1 [ 4, 12] + | yyy > > [3] chr2 [ 1, 9] - | zzz > > [4] chr2 [10, 19] * | www > > --- > > seqlengths: > > chr1 chr2 > > NA NA > > Alternatively you could do: > > grl <- reduce(split(gr, elementMetadata(gr)$Gene)) > reducedRanges <- unlist(grl, use.names=FALSE) > elementMetadata(reducedRanges)$Gene <- rep(names(grl), > elementLengths(grl)) > > > reducedRanges > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr2 [10, 19] * | www > [2] chr1 [ 1, 9] - | xxx > [3] chr1 [ 4, 12] + | yyy > [4] chr2 [ 1, 9] - | zzz > --- > seqlengths: > chr1 chr2 > NA NA > > Then if you want to restore the original order: > > oo <- order(match(elementMetadata(reducedRanges)$Gene, > unique(elementMetadata(gr)$Gene))) > grFinal <- reducedRanges[oo] > > > grFinal > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr1 [ 1, 9] - | xxx > [2] chr1 [ 4, 12] + | yyy > [3] chr2 [ 1, 9] - | zzz > [4] chr2 [10, 19] * | www > --- > seqlengths: > chr1 chr2 > NA NA > > Or if you want to order by seqnames and start: > > oo2 <- order(as.vector(seqnames(reducedRanges)), start(reducedRanges)) > > > reducedRanges[oo2] > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr1 [ 1, 9] - | xxx > [2] chr1 [ 4, 12] + | yyy > [3] chr2 [ 1, 9] - | zzz > [4] chr2 [10, 19] * | www > --- > seqlengths: > chr1 chr2 > NA NA > > which in your case happens to give the same result. > > Cheers, > H. > > > > > > > > > Regards. > > Fahim > > > > On Thu, Nov 17, 2011 at 9:13 PM, Vincent Carey > > <stvjc at="" channing.harvard.edu="">wrote: > > > >> > >> > >> On Thu, Nov 17, 2011 at 6:28 PM, Fahim > Mohammad<fahim.md at="" gmail.com="">wrote: > >> > >>> Hi > >>> In the following example, I am trying to use 'reduce()' function to reduce > >>> the genomic intervals and find intervals corresponding to 'Gene'. It is > >>> behaving as it is desired. However the output of the reduce just give the > >>> intervals (I am losing the 'Gene' metadata). > >>> Is there a way to retain 'Gene' metadata. > >>> > >>> > >>>> gr<- GRanges(+ seqnames = Rle(c("chr1", > >>> "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, > >>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = > >>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + > >>> transcript = head(letters, 10),+ Gene = > >>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + > >>> )> grGRanges with 10 ranges and 2 elementMetadata values: > >>> > >>> seqnames ranges strand | transcript Gene > >>> <rle> <iranges> <rle> |<character> <character> > >>> [1] chr1 [ 1, 7] - | a xxx > >>> [2] chr1 [ 2, 8] - | b xxx > >>> [3] chr1 [ 3, 9] - | c xxx > >>> [4] chr1 [ 4, 10] + | d yyy > >>> [5] chr1 [ 5, 11] + | e yyy > >>> [6] chr1 [ 6, 12] + | f yyy > >>> [7] chr2 [ 1, 5] - | g zzz > >>> [8] chr2 [ 2, 9] - | h zzz > >>> [9] chr2 [10, 18] * | i www > >>> [10] chr2 [11, 19] * | j www > >>> --- > >>> seqlengths: > >>> chr1 chr2 > >>> NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 > >>> > >>> elementMetadata values: > >>> seqnames ranges strand > >>> <rle> <iranges> <rle> > >>> [1] chr1 [ 4, 12] + > >>> [2] chr1 [ 1, 9] - > >>> [3] chr2 [ 1, 9] - > >>> [4] chr2 [10, 19] * > >>> --- > >>> seqlengths: > >>> chr1 chr2 > >>> NA NA > >>> > >> > >> This doesn't do exactly what you seek, but it gives a mangled names > >> attributed that you can unmangle > >> and reattach as metadata if you like > >> > >> unlist(reduce(split(gr, elementMetadata(gr)$Gene))) > >> > >> > >>> > >>>> > >>> > >>> > >>> Thanks > >>> > >>> > >>> Fahim > >>> > >>> -- > >>> ------------------------------- > >>> Fahim Mohammad > >>> Bioinforformatics Lab > >>> University of Louisville > >>> Louisville, KY, USA > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor at r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>> > >> > >> > > > > > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Hi Malcolm, On 11-11-18 12:13 PM, Cook, Malcolm wrote: > Herve, > > re: does anyone object > > To be honest, I've been skimming. > > Are you proposing that the names of the unlisted GRanges don't get suffixed so as to be unique. Correct. No more mangling, so yes, they would be repeated. It's not ideal because if you subset by the gene name then you only get the first occurence. But the current situation is even worse: you don't get anything! > > If so, then, no objections here. Good. H. > > Thanks! > > ~Malcolm > > >> -----Original Message----- >> From: bioconductor-bounces at r-project.org [mailto:bioconductor- >> bounces at r-project.org] On Behalf Of Hervé Pagès >> Sent: Friday, November 18, 2011 1:38 PM >> To: Fahim Mohammad >> Cc: Vincent Carey; Bioconductor Newsgroup >> Subject: Re: [BioC] GRanges - reduce() function >> >> Hi Fahim and others, >> >> On 11-11-18 07:45 AM, Fahim Mohammad wrote: >>> Thanks a lot Vincent, >>> Your suggestion helped me a lot, but the problem of mangled names and >>> unsorted ranges remained. Also the 'unlist' function add sufiix in the >>> name. >>> >>>> unlist(reduce(split(gr, elementMetadata(gr)$Gene)))GRanges with 4 >> ranges and 0 elementMetadata values: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> www.1 chr2 [10, 19] * >>> xxx.2 chr1 [ 1, 9] - >>> yyy.3 chr1 [ 4, 12] + >>> zzz.4 chr2 [ 1, 9] - >>> --- >>> seqlengths: >>> chr1 chr2 >>> NA NA >> >> Yes this mangling is definitely not helping. We have control over the >> "unlist" method for GRangesList objects so we could change that if >> nobody objects. I think it was originally implemented that way for 2 >> reasons: >> >> (1) to mimic what base::unlist() does (even though most people >> don't like it); >> >> (2) also until recently the names of a GRanges object had to be >> unique. >> >> Now that the names of a GRanges object don't have to be unique anymore >> (since BioC 2.9), maybe we could change the behaviour of unlist(). >> Does anybody object? >> >>> >>> So I ended up calling GRanges function twice. I know that this approach >> may >>> not be the best and there may be other efficient workaround. >>> Thanks again for your help. >>> Here is what I did. >>> >>> >>>> gr<- GRanges(+ seqnames = Rle(c("chr1", "chr2", "chr2"), >> c(6, 2,2)), + ranges = IRanges(c(1:6, 1:2, 10:11) , end = c(7:12, 5,9, >> 18:19 )), + strand = Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >> transcript = head(letters, 10),+ Gene = c(rep("xxx",3), >> rep("yyy",3), rep("zzz",2), rep("www",2)) + )> grGRanges with 10 >> ranges and 2 elementMetadata values: >>> seqnames ranges strand | transcript Gene >>> <rle> <iranges> <rle> |<character> <character> >>> [1] chr1 [ 1, 7] - | a xxx >>> [2] chr1 [ 2, 8] - | b xxx >>> [3] chr1 [ 3, 9] - | c xxx >>> [4] chr1 [ 4, 10] + | d yyy >>> [5] chr1 [ 5, 11] + | e yyy >>> [6] chr1 [ 6, 12] + | f yyy >>> [7] chr2 [ 1, 5] - | g zzz >>> [8] chr2 [ 2, 9] - | h zzz >>> [9] chr2 [10, 18] * | i www >>> [10] chr2 [11, 19] * | j www >>> --- >>> seqlengths: >>> chr1 chr2 >>> NA NA> reducedRanges = reduce(split(gr, >>> elementMetadata(gr)$Gene)) #reduce the ranges according to 'Gene' >>> >>> # Convert the rr (reduced ranges) to data frame >>> >>> >>>> tmp = as.data.frame(reducedRanges)> tmp element seqnames start >> end width strand >>> 1 www chr2 10 19 10 * >>> 2 xxx chr1 1 9 9 - >>> 3 yyy chr1 4 12 9 + >>> 4 zzz chr2 1 9 9 -> rr = >>> tmp[order(tmp[,'seqnames'], tmp[,'start']),]; #sort the data frame >>> first on 'seqname' and 'start' location> rr element seqnames start >>> end width strand >>> 2 xxx chr1 1 9 9 - >>> 3 yyy chr1 4 12 9 + >>> 4 zzz chr2 1 9 9 - >>> 1 www chr2 10 19 10 * >>> >>> # And now convert the data frame again into GRanges object. >>> >>> >>>> grFinal<- GRanges(seqnames = Rle(rr[,'seqnames']),+ >> ranges = IRanges (start =rr[,'start'], end = rr[,'end'] ),+ strand = >> Rle(rr[,'strand']),+ Gene = rr[,'element']+ )> >> grFinalGRanges with 4 ranges and 1 elementMetadata value: >>> seqnames ranges strand | Gene >>> <rle> <iranges> <rle> |<character> >>> [1] chr1 [ 1, 9] - | xxx >>> [2] chr1 [ 4, 12] + | yyy >>> [3] chr2 [ 1, 9] - | zzz >>> [4] chr2 [10, 19] * | www >>> --- >>> seqlengths: >>> chr1 chr2 >>> NA NA >> >> Alternatively you could do: >> >> grl<- reduce(split(gr, elementMetadata(gr)$Gene)) >> reducedRanges<- unlist(grl, use.names=FALSE) >> elementMetadata(reducedRanges)$Gene<- rep(names(grl), >> elementLengths(grl)) >> >> > reducedRanges >> GRanges with 4 ranges and 1 elementMetadata value: >> seqnames ranges strand | Gene >> <rle> <iranges> <rle> |<character> >> [1] chr2 [10, 19] * | www >> [2] chr1 [ 1, 9] - | xxx >> [3] chr1 [ 4, 12] + | yyy >> [4] chr2 [ 1, 9] - | zzz >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> >> Then if you want to restore the original order: >> >> oo<- order(match(elementMetadata(reducedRanges)$Gene, >> unique(elementMetadata(gr)$Gene))) >> grFinal<- reducedRanges[oo] >> >> > grFinal >> GRanges with 4 ranges and 1 elementMetadata value: >> seqnames ranges strand | Gene >> <rle> <iranges> <rle> |<character> >> [1] chr1 [ 1, 9] - | xxx >> [2] chr1 [ 4, 12] + | yyy >> [3] chr2 [ 1, 9] - | zzz >> [4] chr2 [10, 19] * | www >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> >> Or if you want to order by seqnames and start: >> >> oo2<- order(as.vector(seqnames(reducedRanges)), start(reducedRanges)) >> >> > reducedRanges[oo2] >> GRanges with 4 ranges and 1 elementMetadata value: >> seqnames ranges strand | Gene >> <rle> <iranges> <rle> |<character> >> [1] chr1 [ 1, 9] - | xxx >> [2] chr1 [ 4, 12] + | yyy >> [3] chr2 [ 1, 9] - | zzz >> [4] chr2 [10, 19] * | www >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> >> which in your case happens to give the same result. >> >> Cheers, >> H. >> >>> >>> >>> >>> Regards. >>> Fahim >>> >>> On Thu, Nov 17, 2011 at 9:13 PM, Vincent Carey >>> <stvjc at="" channing.harvard.edu="">wrote: >>> >>>> >>>> >>>> On Thu, Nov 17, 2011 at 6:28 PM, Fahim >> Mohammad<fahim.md at="" gmail.com="">wrote: >>>> >>>>> Hi >>>>> In the following example, I am trying to use 'reduce()' function to reduce >>>>> the genomic intervals and find intervals corresponding to 'Gene'. It is >>>>> behaving as it is desired. However the output of the reduce just give the >>>>> intervals (I am losing the 'Gene' metadata). >>>>> Is there a way to retain 'Gene' metadata. >>>>> >>>>> >>>>>> gr<- GRanges(+ seqnames = Rle(c("chr1", >>>>> "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, >>>>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = >>>>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >>>>> transcript = head(letters, 10),+ Gene = >>>>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >>>>> )> grGRanges with 10 ranges and 2 elementMetadata values: >>>>> >>>>> seqnames ranges strand | transcript Gene >>>>> <rle> <iranges> <rle> |<character> <character> >>>>> [1] chr1 [ 1, 7] - | a xxx >>>>> [2] chr1 [ 2, 8] - | b xxx >>>>> [3] chr1 [ 3, 9] - | c xxx >>>>> [4] chr1 [ 4, 10] + | d yyy >>>>> [5] chr1 [ 5, 11] + | e yyy >>>>> [6] chr1 [ 6, 12] + | f yyy >>>>> [7] chr2 [ 1, 5] - | g zzz >>>>> [8] chr2 [ 2, 9] - | h zzz >>>>> [9] chr2 [10, 18] * | i www >>>>> [10] chr2 [11, 19] * | j www >>>>> --- >>>>> seqlengths: >>>>> chr1 chr2 >>>>> NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 >>>>> >>>>> elementMetadata values: >>>>> seqnames ranges strand >>>>> <rle> <iranges> <rle> >>>>> [1] chr1 [ 4, 12] + >>>>> [2] chr1 [ 1, 9] - >>>>> [3] chr2 [ 1, 9] - >>>>> [4] chr2 [10, 19] * >>>>> --- >>>>> seqlengths: >>>>> chr1 chr2 >>>>> NA NA >>>>> >>>> >>>> This doesn't do exactly what you seek, but it gives a mangled names >>>> attributed that you can unmangle >>>> and reattach as metadata if you like >>>> >>>> unlist(reduce(split(gr, elementMetadata(gr)$Gene))) >>>> >>>> >>>>> >>>>>> >>>>> >>>>> >>>>> Thanks >>>>> >>>>> >>>>> Fahim >>>>> >>>>> -- >>>>> ------------------------------- >>>>> Fahim Mohammad >>>>> Bioinforformatics Lab >>>>> University of Louisville >>>>> Louisville, KY, USA >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>>> >>> >>> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Hi Herves Thanks a lot for your reply. I also feel that adding suffix in the output of unlist() function is not useful. Fahim 2011/11/18 Hervé Pagès <hpages@fhcrc.org> > Hi Fahim and others, > > > On 11-11-18 07:45 AM, Fahim Mohammad wrote: > >> Thanks a lot Vincent, >> Your suggestion helped me a lot, but the problem of mangled names and >> unsorted ranges remained. Also the 'unlist' function add sufiix in the >> name. >> >> unlist(reduce(split(gr, elementMetadata(gr)$Gene)))**GRanges with 4 >>> ranges and 0 elementMetadata values: >>> >> seqnames ranges strand >> <rle> <iranges> <rle> >> www.1 chr2 [10, 19] * >> xxx.2 chr1 [ 1, 9] - >> yyy.3 chr1 [ 4, 12] + >> zzz.4 chr2 [ 1, 9] - >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> > > Yes this mangling is definitely not helping. We have control over the > "unlist" method for GRangesList objects so we could change that if > nobody objects. I think it was originally implemented that way for 2 > reasons: > > (1) to mimic what base::unlist() does (even though most people > don't like it); > > (2) also until recently the names of a GRanges object had to be > unique. > > Now that the names of a GRanges object don't have to be unique anymore > (since BioC 2.9), maybe we could change the behaviour of unlist(). > Does anybody object? > > >> So I ended up calling GRanges function twice. I know that this approach >> may >> not be the best and there may be other efficient workaround. >> Thanks again for your help. >> Here is what I did. >> >> >> gr<- GRanges(+ seqnames = Rle(c("chr1", >>> "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, >>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = >>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >>> transcript = head(letters, 10),+ Gene = >>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >>> )> grGRanges with 10 ranges and 2 elementMetadata values: >>> >> seqnames ranges strand | transcript Gene >> <rle> <iranges> <rle> |<character> <character> >> [1] chr1 [ 1, 7] - | a xxx >> [2] chr1 [ 2, 8] - | b xxx >> [3] chr1 [ 3, 9] - | c xxx >> [4] chr1 [ 4, 10] + | d yyy >> [5] chr1 [ 5, 11] + | e yyy >> [6] chr1 [ 6, 12] + | f yyy >> [7] chr2 [ 1, 5] - | g zzz >> [8] chr2 [ 2, 9] - | h zzz >> [9] chr2 [10, 18] * | i www >> [10] chr2 [11, 19] * | j www >> --- >> seqlengths: >> chr1 chr2 >> NA NA> reducedRanges = reduce(split(gr, >> >> elementMetadata(gr)$Gene)) #reduce the ranges according to 'Gene' >> >> # Convert the rr (reduced ranges) to data frame >> >> >> tmp = as.data.frame(reducedRanges)> tmp element seqnames start >>> end width strand >>> >> 1 www chr2 10 19 10 * >> 2 xxx chr1 1 9 9 - >> 3 yyy chr1 4 12 9 + >> 4 zzz chr2 1 9 9 -> rr = >> tmp[order(tmp[,'seqnames'], tmp[,'start']),]; #sort the data frame >> first on 'seqname' and 'start' location> rr element seqnames start >> end width strand >> 2 xxx chr1 1 9 9 - >> 3 yyy chr1 4 12 9 + >> 4 zzz chr2 1 9 9 - >> 1 www chr2 10 19 10 * >> >> # And now convert the data frame again into GRanges object. >> >> >> grFinal<- GRanges(seqnames = Rle(rr[,'seqnames']),+ >>> ranges = IRanges (start =rr[,'start'], end = rr[,'end'] ),+ >>> strand = Rle(rr[,'strand']),+ >>> Gene = rr[,'element']+ )> grFinalGRanges >>> with 4 ranges and 1 elementMetadata value: >>> >> seqnames ranges strand | Gene >> <rle> <iranges> <rle> |<character> >> [1] chr1 [ 1, 9] - | xxx >> [2] chr1 [ 4, 12] + | yyy >> [3] chr2 [ 1, 9] - | zzz >> [4] chr2 [10, 19] * | www >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> > > Alternatively you could do: > > grl <- reduce(split(gr, elementMetadata(gr)$Gene)) > reducedRanges <- unlist(grl, use.names=FALSE) > elementMetadata(reducedRanges)**$Gene <- rep(names(grl), > elementLengths(grl)) > > > reducedRanges > > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr2 [10, 19] * | www > [2] chr1 [ 1, 9] - | xxx > [3] chr1 [ 4, 12] + | yyy > [4] chr2 [ 1, 9] - | zzz > > --- > seqlengths: > chr1 chr2 > NA NA > > Then if you want to restore the original order: > > oo <- order(match(elementMetadata(**reducedRanges)$Gene, > unique(elementMetadata(gr)$**Gene))) > grFinal <- reducedRanges[oo] > > > > grFinal > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr1 [ 1, 9] - | xxx > [2] chr1 [ 4, 12] + | yyy > [3] chr2 [ 1, 9] - | zzz > [4] chr2 [10, 19] * | www > --- > seqlengths: > chr1 chr2 > NA NA > > Or if you want to order by seqnames and start: > > oo2 <- order(as.vector(seqnames(**reducedRanges)), start(reducedRanges)) > > > reducedRanges[oo2] > > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr1 [ 1, 9] - | xxx > [2] chr1 [ 4, 12] + | yyy > [3] chr2 [ 1, 9] - | zzz > [4] chr2 [10, 19] * | www > --- > seqlengths: > chr1 chr2 > NA NA > > which in your case happens to give the same result. > > Cheers, > H. > > > >> >> >> Regards. >> Fahim >> >> On Thu, Nov 17, 2011 at 9:13 PM, Vincent Carey >> <stvjc@channing.harvard.edu>**wrote: >> >> >>> >>> On Thu, Nov 17, 2011 at 6:28 PM, Fahim Mohammad<fahim.md@gmail.com>** >>> wrote: >>> >>> Hi >>>> In the following example, I am trying to use 'reduce()' function to >>>> reduce >>>> the genomic intervals and find intervals corresponding to 'Gene'. It is >>>> behaving as it is desired. However the output of the reduce just give >>>> the >>>> intervals (I am losing the 'Gene' metadata). >>>> Is there a way to retain 'Gene' metadata. >>>> >>>> >>>> gr<- GRanges(+ seqnames = Rle(c("chr1", >>>>> >>>> "chr2", "chr2"), c(6, 2,2)), + ranges = >>>> IRanges(c(1:6, >>>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand >>>> = >>>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >>>> transcript = head(letters, 10),+ Gene = >>>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >>>> )> grGRanges with 10 ranges and 2 elementMetadata values: >>>> >>>> seqnames ranges strand | transcript Gene >>>> <rle> <iranges> <rle> |<character> <character> >>>> [1] chr1 [ 1, 7] - | a xxx >>>> [2] chr1 [ 2, 8] - | b xxx >>>> [3] chr1 [ 3, 9] - | c xxx >>>> [4] chr1 [ 4, 10] + | d yyy >>>> [5] chr1 [ 5, 11] + | e yyy >>>> [6] chr1 [ 6, 12] + | f yyy >>>> [7] chr2 [ 1, 5] - | g zzz >>>> [8] chr2 [ 2, 9] - | h zzz >>>> [9] chr2 [10, 18] * | i www >>>> [10] chr2 [11, 19] * | j www >>>> --- >>>> seqlengths: >>>> chr1 chr2 >>>> NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 >>>> >>>> elementMetadata values: >>>> seqnames ranges strand >>>> <rle> <iranges> <rle> >>>> [1] chr1 [ 4, 12] + >>>> [2] chr1 [ 1, 9] - >>>> [3] chr2 [ 1, 9] - >>>> [4] chr2 [10, 19] * >>>> --- >>>> seqlengths: >>>> chr1 chr2 >>>> NA NA >>>> >>>> >>> This doesn't do exactly what you seek, but it gives a mangled names >>> attributed that you can unmangle >>> and reattach as metadata if you like >>> >>> unlist(reduce(split(gr, elementMetadata(gr)$Gene))) >>> >>> >>> >>>> >>>>> >>>> >>>> Thanks >>>> >>>> >>>> Fahim >>>> >>>> -- >>>> ------------------------------**- >>>> Fahim Mohammad >>>> Bioinforformatics Lab >>>> University of Louisville >>>> Louisville, KY, USA >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> >>> >> >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > -- ------------------------------- Fahim Mohammad Bioinforformatics Lab University of Louisville Louisville, KY, USA Ph: +1-502-409-1167 web: http://bioinformatics.louisville.edu/lab/ ------------------------------- I think the reason people are dealing with science less well now than 50 years ago is that it has become so complicated. --James D. Watson<http: www.brainyquote.com="" quotes="" quotes="" j="" jamesdwat="" 388959.html=""> [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
2011/11/18 Hervé Pagès <hpages@fhcrc.org> > Hi Fahim and others, > > > On 11-11-18 07:45 AM, Fahim Mohammad wrote: > >> Thanks a lot Vincent, >> Your suggestion helped me a lot, but the problem of mangled names and >> unsorted ranges remained. Also the 'unlist' function add sufiix in the >> name. >> >> unlist(reduce(split(gr, elementMetadata(gr)$Gene)))**GRanges with 4 >>> ranges and 0 elementMetadata values: >>> >> seqnames ranges strand >> <rle> <iranges> <rle> >> www.1 chr2 [10, 19] * >> xxx.2 chr1 [ 1, 9] - >> yyy.3 chr1 [ 4, 12] + >> zzz.4 chr2 [ 1, 9] - >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> > > Yes this mangling is definitely not helping. We have control over the > "unlist" method for GRangesList objects so we could change that if > nobody objects. I think it was originally implemented that way for 2 > reasons: > > (1) to mimic what base::unlist() does (even though most people > don't like it); > > (2) also until recently the names of a GRanges object had to be > unique. > > I think the annoyance comes from #2, not #1. Even if the unlistData does not have names, it is forced to have them, and thus we always have the indices on the end. Can we please delete this method? setMethod("unlist", "GRangesList", function(x, recursive = TRUE, use.names = TRUE) { if (use.names && is.null(names(x@unlistData))) names(x@unlistData) <- seq_len(length(x@unlistData)) callNextMethod() } ) Now that the names of a GRanges object don't have to be unique anymore > (since BioC 2.9), maybe we could change the behaviour of unlist(). > Does anybody object? > > > >> So I ended up calling GRanges function twice. I know that this approach >> may >> not be the best and there may be other efficient workaround. >> Thanks again for your help. >> Here is what I did. >> >> >> gr<- GRanges(+ seqnames = Rle(c("chr1", >>> "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, >>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = >>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >>> transcript = head(letters, 10),+ Gene = >>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >>> )> grGRanges with 10 ranges and 2 elementMetadata values: >>> >> seqnames ranges strand | transcript Gene >> <rle> <iranges> <rle> |<character> <character> >> [1] chr1 [ 1, 7] - | a xxx >> [2] chr1 [ 2, 8] - | b xxx >> [3] chr1 [ 3, 9] - | c xxx >> [4] chr1 [ 4, 10] + | d yyy >> [5] chr1 [ 5, 11] + | e yyy >> [6] chr1 [ 6, 12] + | f yyy >> [7] chr2 [ 1, 5] - | g zzz >> [8] chr2 [ 2, 9] - | h zzz >> [9] chr2 [10, 18] * | i www >> [10] chr2 [11, 19] * | j www >> --- >> seqlengths: >> chr1 chr2 >> NA NA> reducedRanges = reduce(split(gr, >> elementMetadata(gr)$Gene)) #reduce the ranges according to 'Gene' >> >> # Convert the rr (reduced ranges) to data frame >> >> >> tmp = as.data.frame(reducedRanges)> tmp element seqnames start >>> end width strand >>> >> 1 www chr2 10 19 10 * >> 2 xxx chr1 1 9 9 - >> 3 yyy chr1 4 12 9 + >> 4 zzz chr2 1 9 9 -> rr = >> tmp[order(tmp[,'seqnames'], tmp[,'start']),]; #sort the data frame >> first on 'seqname' and 'start' location> rr element seqnames start >> end width strand >> 2 xxx chr1 1 9 9 - >> 3 yyy chr1 4 12 9 + >> 4 zzz chr2 1 9 9 - >> 1 www chr2 10 19 10 * >> >> # And now convert the data frame again into GRanges object. >> >> >> grFinal<- GRanges(seqnames = Rle(rr[,'seqnames']),+ >>> ranges = IRanges (start =rr[,'start'], end = rr[,'end'] ),+ >>> strand = Rle(rr[,'strand']),+ >>> Gene = rr[,'element']+ )> grFinalGRanges >>> with 4 ranges and 1 elementMetadata value: >>> >> seqnames ranges strand | Gene >> <rle> <iranges> <rle> |<character> >> [1] chr1 [ 1, 9] - | xxx >> [2] chr1 [ 4, 12] + | yyy >> [3] chr2 [ 1, 9] - | zzz >> [4] chr2 [10, 19] * | www >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> > > Alternatively you could do: > > grl <- reduce(split(gr, elementMetadata(gr)$Gene)) > reducedRanges <- unlist(grl, use.names=FALSE) > elementMetadata(reducedRanges)**$Gene <- rep(names(grl), > elementLengths(grl)) > > > reducedRanges > > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr2 [10, 19] * | www > [2] chr1 [ 1, 9] - | xxx > [3] chr1 [ 4, 12] + | yyy > [4] chr2 [ 1, 9] - | zzz > > --- > seqlengths: > chr1 chr2 > NA NA > > Then if you want to restore the original order: > > oo <- order(match(elementMetadata(**reducedRanges)$Gene, > unique(elementMetadata(gr)$**Gene))) > grFinal <- reducedRanges[oo] > > > grFinal > > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr1 [ 1, 9] - | xxx > [2] chr1 [ 4, 12] + | yyy > [3] chr2 [ 1, 9] - | zzz > [4] chr2 [10, 19] * | www > --- > seqlengths: > chr1 chr2 > NA NA > > Or if you want to order by seqnames and start: > > oo2 <- order(as.vector(seqnames(**reducedRanges)), start(reducedRanges)) > > > reducedRanges[oo2] > > GRanges with 4 ranges and 1 elementMetadata value: > seqnames ranges strand | Gene > <rle> <iranges> <rle> | <character> > [1] chr1 [ 1, 9] - | xxx > [2] chr1 [ 4, 12] + | yyy > [3] chr2 [ 1, 9] - | zzz > [4] chr2 [10, 19] * | www > --- > seqlengths: > chr1 chr2 > NA NA > > which in your case happens to give the same result. > > Cheers, > H. > > > >> >> >> Regards. >> Fahim >> >> On Thu, Nov 17, 2011 at 9:13 PM, Vincent Carey >> <stvjc@channing.harvard.edu>**wrote: >> >> >>> >>> On Thu, Nov 17, 2011 at 6:28 PM, Fahim Mohammad<fahim.md@gmail.com>** >>> wrote: >>> >>> Hi >>>> In the following example, I am trying to use 'reduce()' function to >>>> reduce >>>> the genomic intervals and find intervals corresponding to 'Gene'. It is >>>> behaving as it is desired. However the output of the reduce just give >>>> the >>>> intervals (I am losing the 'Gene' metadata). >>>> Is there a way to retain 'Gene' metadata. >>>> >>>> >>>> gr<- GRanges(+ seqnames = Rle(c("chr1", >>>>> >>>> "chr2", "chr2"), c(6, 2,2)), + ranges = >>>> IRanges(c(1:6, >>>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand >>>> = >>>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >>>> transcript = head(letters, 10),+ Gene = >>>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >>>> )> grGRanges with 10 ranges and 2 elementMetadata values: >>>> >>>> seqnames ranges strand | transcript Gene >>>> <rle> <iranges> <rle> |<character> <character> >>>> [1] chr1 [ 1, 7] - | a xxx >>>> [2] chr1 [ 2, 8] - | b xxx >>>> [3] chr1 [ 3, 9] - | c xxx >>>> [4] chr1 [ 4, 10] + | d yyy >>>> [5] chr1 [ 5, 11] + | e yyy >>>> [6] chr1 [ 6, 12] + | f yyy >>>> [7] chr2 [ 1, 5] - | g zzz >>>> [8] chr2 [ 2, 9] - | h zzz >>>> [9] chr2 [10, 18] * | i www >>>> [10] chr2 [11, 19] * | j www >>>> --- >>>> seqlengths: >>>> chr1 chr2 >>>> NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 >>>> >>>> elementMetadata values: >>>> seqnames ranges strand >>>> <rle> <iranges> <rle> >>>> [1] chr1 [ 4, 12] + >>>> [2] chr1 [ 1, 9] - >>>> [3] chr2 [ 1, 9] - >>>> [4] chr2 [10, 19] * >>>> --- >>>> seqlengths: >>>> chr1 chr2 >>>> NA NA >>>> >>>> >>> This doesn't do exactly what you seek, but it gives a mangled names >>> attributed that you can unmangle >>> and reattach as metadata if you like >>> >>> unlist(reduce(split(gr, elementMetadata(gr)$Gene))) >>> >>> >>> >>>> >>>>> >>>> >>>> Thanks >>>> >>>> >>>> Fahim >>>> >>>> -- >>>> ------------------------------**- >>>> Fahim Mohammad >>>> Bioinforformatics Lab >>>> University of Louisville >>>> Louisville, KY, USA >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> >>> >> >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
I went ahead and removed that unlist() method. Michael 2011/11/23 Michael Lawrence <michafla@gene.com> > > > 2011/11/18 Hervé Pagès <hpages@fhcrc.org> > >> Hi Fahim and others, >> >> >> On 11-11-18 07:45 AM, Fahim Mohammad wrote: >> >>> Thanks a lot Vincent, >>> Your suggestion helped me a lot, but the problem of mangled names and >>> unsorted ranges remained. Also the 'unlist' function add sufiix in the >>> name. >>> >>> unlist(reduce(split(gr, elementMetadata(gr)$Gene)))**GRanges with 4 >>>> ranges and 0 elementMetadata values: >>>> >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> www.1 chr2 [10, 19] * >>> xxx.2 chr1 [ 1, 9] - >>> yyy.3 chr1 [ 4, 12] + >>> zzz.4 chr2 [ 1, 9] - >>> --- >>> seqlengths: >>> chr1 chr2 >>> NA NA >>> >> >> Yes this mangling is definitely not helping. We have control over the >> "unlist" method for GRangesList objects so we could change that if >> nobody objects. I think it was originally implemented that way for 2 >> reasons: >> >> (1) to mimic what base::unlist() does (even though most people >> don't like it); >> >> (2) also until recently the names of a GRanges object had to be >> unique. >> >> > I think the annoyance comes from #2, not #1. Even if the unlistData does > not have names, it is forced to have them, and thus we always have the > indices on the end. Can we please delete this method? > > setMethod("unlist", "GRangesList", > function(x, recursive = TRUE, use.names = TRUE) > { > if (use.names && is.null(names(x@unlistData))) > names(x@unlistData) <- seq_len(length(x@unlistData)) > callNextMethod() > } > ) > > Now that the names of a GRanges object don't have to be unique anymore >> (since BioC 2.9), maybe we could change the behaviour of unlist(). >> Does anybody object? >> >> >> >>> So I ended up calling GRanges function twice. I know that this approach >>> may >>> not be the best and there may be other efficient workaround. >>> Thanks again for your help. >>> Here is what I did. >>> >>> >>> gr<- GRanges(+ seqnames = Rle(c("chr1", >>>> "chr2", "chr2"), c(6, 2,2)), + ranges = IRanges(c(1:6, >>>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + strand = >>>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >>>> transcript = head(letters, 10),+ Gene = >>>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >>>> )> grGRanges with 10 ranges and 2 elementMetadata values: >>>> >>> seqnames ranges strand | transcript Gene >>> <rle> <iranges> <rle> |<character> <character> >>> [1] chr1 [ 1, 7] - | a xxx >>> [2] chr1 [ 2, 8] - | b xxx >>> [3] chr1 [ 3, 9] - | c xxx >>> [4] chr1 [ 4, 10] + | d yyy >>> [5] chr1 [ 5, 11] + | e yyy >>> [6] chr1 [ 6, 12] + | f yyy >>> [7] chr2 [ 1, 5] - | g zzz >>> [8] chr2 [ 2, 9] - | h zzz >>> [9] chr2 [10, 18] * | i www >>> [10] chr2 [11, 19] * | j www >>> --- >>> seqlengths: >>> chr1 chr2 >>> NA NA> reducedRanges = reduce(split(gr, >>> elementMetadata(gr)$Gene)) #reduce the ranges according to 'Gene' >>> >>> # Convert the rr (reduced ranges) to data frame >>> >>> >>> tmp = as.data.frame(reducedRanges)> tmp element seqnames start >>>> end width strand >>>> >>> 1 www chr2 10 19 10 * >>> 2 xxx chr1 1 9 9 - >>> 3 yyy chr1 4 12 9 + >>> 4 zzz chr2 1 9 9 -> rr = >>> tmp[order(tmp[,'seqnames'], tmp[,'start']),]; #sort the data frame >>> first on 'seqname' and 'start' location> rr element seqnames start >>> end width strand >>> 2 xxx chr1 1 9 9 - >>> 3 yyy chr1 4 12 9 + >>> 4 zzz chr2 1 9 9 - >>> 1 www chr2 10 19 10 * >>> >>> # And now convert the data frame again into GRanges object. >>> >>> >>> grFinal<- GRanges(seqnames = Rle(rr[,'seqnames']),+ >>>> ranges = IRanges (start =rr[,'start'], end = rr[,'end'] ),+ >>>> strand = Rle(rr[,'strand']),+ >>>> Gene = rr[,'element']+ )> grFinalGRanges >>>> with 4 ranges and 1 elementMetadata value: >>>> >>> seqnames ranges strand | Gene >>> <rle> <iranges> <rle> |<character> >>> [1] chr1 [ 1, 9] - | xxx >>> [2] chr1 [ 4, 12] + | yyy >>> [3] chr2 [ 1, 9] - | zzz >>> [4] chr2 [10, 19] * | www >>> --- >>> seqlengths: >>> chr1 chr2 >>> NA NA >>> >> >> Alternatively you could do: >> >> grl <- reduce(split(gr, elementMetadata(gr)$Gene)) >> reducedRanges <- unlist(grl, use.names=FALSE) >> elementMetadata(reducedRanges)**$Gene <- rep(names(grl), >> elementLengths(grl)) >> >> > reducedRanges >> >> GRanges with 4 ranges and 1 elementMetadata value: >> seqnames ranges strand | Gene >> <rle> <iranges> <rle> | <character> >> [1] chr2 [10, 19] * | www >> [2] chr1 [ 1, 9] - | xxx >> [3] chr1 [ 4, 12] + | yyy >> [4] chr2 [ 1, 9] - | zzz >> >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> >> Then if you want to restore the original order: >> >> oo <- order(match(elementMetadata(**reducedRanges)$Gene, >> unique(elementMetadata(gr)$**Gene))) >> grFinal <- reducedRanges[oo] >> >> > grFinal >> >> GRanges with 4 ranges and 1 elementMetadata value: >> seqnames ranges strand | Gene >> <rle> <iranges> <rle> | <character> >> [1] chr1 [ 1, 9] - | xxx >> [2] chr1 [ 4, 12] + | yyy >> [3] chr2 [ 1, 9] - | zzz >> [4] chr2 [10, 19] * | www >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> >> Or if you want to order by seqnames and start: >> >> oo2 <- order(as.vector(seqnames(**reducedRanges)), start(reducedRanges)) >> >> > reducedRanges[oo2] >> >> GRanges with 4 ranges and 1 elementMetadata value: >> seqnames ranges strand | Gene >> <rle> <iranges> <rle> | <character> >> [1] chr1 [ 1, 9] - | xxx >> [2] chr1 [ 4, 12] + | yyy >> [3] chr2 [ 1, 9] - | zzz >> [4] chr2 [10, 19] * | www >> --- >> seqlengths: >> chr1 chr2 >> NA NA >> >> which in your case happens to give the same result. >> >> Cheers, >> H. >> >> >> >>> >>> >>> Regards. >>> Fahim >>> >>> On Thu, Nov 17, 2011 at 9:13 PM, Vincent Carey >>> <stvjc@channing.harvard.edu>**wrote: >>> >>> >>>> >>>> On Thu, Nov 17, 2011 at 6:28 PM, Fahim Mohammad<fahim.md@gmail.com>** >>>> wrote: >>>> >>>> Hi >>>>> In the following example, I am trying to use 'reduce()' function to >>>>> reduce >>>>> the genomic intervals and find intervals corresponding to 'Gene'. It >>>>> is >>>>> behaving as it is desired. However the output of the reduce just give >>>>> the >>>>> intervals (I am losing the 'Gene' metadata). >>>>> Is there a way to retain 'Gene' metadata. >>>>> >>>>> >>>>> gr<- GRanges(+ seqnames = Rle(c("chr1", >>>>>> >>>>> "chr2", "chr2"), c(6, 2,2)), + ranges = >>>>> IRanges(c(1:6, >>>>> 1:2, 10:11) , end = c(7:12, 5,9, 18:19 )), + >>>>> strand = >>>>> Rle(strand(c("-", "+", "-", "*")),c(3, 3, 2,2)), + >>>>> transcript = head(letters, 10),+ Gene = >>>>> c(rep("xxx",3), rep("yyy",3), rep("zzz",2), rep("www",2)) + >>>>> )> grGRanges with 10 ranges and 2 elementMetadata values: >>>>> >>>>> seqnames ranges strand | transcript Gene >>>>> <rle> <iranges> <rle> |<character> <character> >>>>> [1] chr1 [ 1, 7] - | a xxx >>>>> [2] chr1 [ 2, 8] - | b xxx >>>>> [3] chr1 [ 3, 9] - | c xxx >>>>> [4] chr1 [ 4, 10] + | d yyy >>>>> [5] chr1 [ 5, 11] + | e yyy >>>>> [6] chr1 [ 6, 12] + | f yyy >>>>> [7] chr2 [ 1, 5] - | g zzz >>>>> [8] chr2 [ 2, 9] - | h zzz >>>>> [9] chr2 [10, 18] * | i www >>>>> [10] chr2 [11, 19] * | j www >>>>> --- >>>>> seqlengths: >>>>> chr1 chr2 >>>>> NA NA> x = reduce(gr)> xGRanges with 4 ranges and 0 >>>>> >>>>> elementMetadata values: >>>>> seqnames ranges strand >>>>> <rle> <iranges> <rle> >>>>> [1] chr1 [ 4, 12] + >>>>> [2] chr1 [ 1, 9] - >>>>> [3] chr2 [ 1, 9] - >>>>> [4] chr2 [10, 19] * >>>>> --- >>>>> seqlengths: >>>>> chr1 chr2 >>>>> NA NA >>>>> >>>>> >>>> This doesn't do exactly what you seek, but it gives a mangled names >>>> attributed that you can unmangle >>>> and reattach as metadata if you like >>>> >>>> unlist(reduce(split(gr, elementMetadata(gr)$Gene))) >>>> >>>> >>>> >>>>> >>>>>> >>>>> >>>>> Thanks >>>>> >>>>> >>>>> Fahim >>>>> >>>>> -- >>>>> ------------------------------**- >>>>> Fahim Mohammad >>>>> Bioinforformatics Lab >>>>> University of Louisville >>>>> Louisville, KY, USA >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor@r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.**science.biology.informatics.**cond uctor<http: news.gmane.org="" gmane.science.biology.informatics.conducto="" r=""> >>>>> >>>>> >>>> >>>> >>> >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages@fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Jason Ross ▴ 20
@jason-ross-4967
Last seen 9.6 years ago
Hi Fahim, I am also frustrated by this. The meta-data also vanishes when using findOverlaps(). I'm thinking of writing some wrapper functions to place the meta-data back into the Granges object. Regards, Jason.
ADD COMMENT
0
Entering edit mode
On 11/17/2011 05:57 PM, Jason Ross wrote: > Hi Fahim, > > I am also frustrated by this. The meta-data also vanishes when using > findOverlaps(). I'm thinking of writing some wrapper functions to place the > meta-data back into the Granges object. Hi Jason et al., The problem in 'reduce' is that the elementMetadata columns need to be 'reduce'd too, and there is no universal way to do that -- for 'transcripts' in Fahim's example, maybe it's just collapsing entries into a CharacterList, whereas for "Gene" it's split-by-reduced-range and 'unique'. For numeric values one might sum or mean or max or .... Can you be more specific about findOverlaps? It's not really clear which data you'd like to have propagated. For Fahim's question, I arrived at values(r)[["Gene"]] <- tapply(values(gr)[["Gene"]], match(gr, r), unique) which I think is quite robust, but I'd recommend checking carefully on complicated data. Martin > > Regards, > Jason. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD REPLY
0
Entering edit mode
On Fri, Nov 18, 2011 at 9:12 AM, Martin Morgan <mtmorgan at="" fhcrc.org=""> wrote: > On 11/17/2011 05:57 PM, Jason Ross wrote: >> >> Hi Fahim, >> >> I am also frustrated by this. The meta-data also vanishes when using >> findOverlaps(). I'm thinking of writing some wrapper functions to place >> the >> meta-data back into the Granges object. > > Hi Jason et al., > > The problem in 'reduce' is that the elementMetadata columns need to be > 'reduce'd too, and there is no universal way to do that -- for 'transcripts' > in Fahim's example, maybe it's just collapsing entries into a CharacterList, > whereas for "Gene" it's split-by-reduced-range and 'unique'. For numeric > values one might sum or mean or max or .... > > Can you be more specific about findOverlaps? It's not really clear which > data you'd like to have propagated. > > For Fahim's question, I arrived at > > values(r)[["Gene"]] <- > ? ?tapply(values(gr)[["Gene"]], match(gr, r), unique) > > which I think is quite robust, but I'd recommend checking carefully on > complicated data. When I am faced with distangling a reduce() call I always use findOverlaps, like grRed = reduce(gr) findOverlaps(gr, grRed) This may seem weird (kind of doing the computation twice), but in my experience it is quite fast. Kasper
ADD REPLY
0
Entering edit mode
Jason Ross ▴ 20
@jason-ross-4967
Last seen 9.6 years ago
Martin Morgan <mtmorgan at="" ...=""> writes: > > On 11/17/2011 05:57 PM, Jason Ross wrote: > > Hi Fahim, > > > > I am also frustrated by this. The meta-data also vanishes when using > > findOverlaps(). I'm thinking of writing some wrapper functions to place the > > meta-data back into the Granges object. > > Hi Jason et al., > > The problem in 'reduce' is that the elementMetadata columns need to be > 'reduce'd too, and there is no universal way to do that -- for > 'transcripts' in Fahim's example, maybe it's just collapsing entries > into a CharacterList, whereas for "Gene" it's split-by-reduced-range and > 'unique'. For numeric values one might sum or mean or max or .... > > Can you be more specific about findOverlaps? It's not really clear which > data you'd like to have propagated. > > For Fahim's question, I arrived at > > values(r)[["Gene"]] <- > tapply(values(gr)[["Gene"]], match(gr, r), unique) > > which I think is quite robust, but I'd recommend checking carefully on > complicated data. > > Martin > Hi Martin, I tend to use GenomicRanges objects a lot for annotating features so I want R merge or SQL join like functionality. I was joining data to annotations using mySQL but found the indices broke with range joins. I considered BEDtools but didn't like the constraints of only using BED/GFF and the shell. I switched to using GenomicRanges and findOverlaps as I liked the very efficient interval tree approach. I usually wrap the output of findOverlaps into a function emulating a left or inner join from two data frames. This process is handled natively, but rather inelegantly in BEDtools. GRanges is more powerful but doesn't offer boolean switches on union/intersect, etc or wrapper functions that keep the metadata. I appreciate that their is no universal way to disentangle metadata when aggregating but it would be nice to have some of the options available in the union/intersection/reduce functions, or in wrapper functions. At the moment I roll my own. Regardless, I find GenomicRanges, etc to be very useful and powerful and it's my preferred strategy in dealing with genomic data. Cheers, Jason. At first I created GRanges objects from the dataframes
ADD COMMENT
0
Entering edit mode
It would be cool to have more methods on the RangesMatching structure that could do these types of operations. RangesMatching does not contain the original objects though; those would need to be provided to the methods. Contributions are welcome and encouraged. Michael On Tue, Nov 29, 2011 at 5:44 PM, Jason Ross <jason.ross@csiro.au> wrote: > Martin Morgan <mtmorgan@...> writes: > > > > > On 11/17/2011 05:57 PM, Jason Ross wrote: > > > Hi Fahim, > > > > > > I am also frustrated by this. The meta-data also vanishes when using > > > findOverlaps(). I'm thinking of writing some wrapper functions to > place the > > > meta-data back into the Granges object. > > > > Hi Jason et al., > > > > The problem in 'reduce' is that the elementMetadata columns need to be > > 'reduce'd too, and there is no universal way to do that -- for > > 'transcripts' in Fahim's example, maybe it's just collapsing entries > > into a CharacterList, whereas for "Gene" it's split-by-reduced- range and > > 'unique'. For numeric values one might sum or mean or max or .... > > > > Can you be more specific about findOverlaps? It's not really clear which > > data you'd like to have propagated. > > > > For Fahim's question, I arrived at > > > > values(r)[["Gene"]] <- > > tapply(values(gr)[["Gene"]], match(gr, r), unique) > > > > which I think is quite robust, but I'd recommend checking carefully on > > complicated data. > > > > Martin > > > > Hi Martin, > > I tend to use GenomicRanges objects a lot for annotating features so I > want R > merge or SQL join like functionality. I was joining data to annotations > using > mySQL but found the indices broke with range joins. I considered BEDtools > but > didn't like the constraints of only using BED/GFF and the shell. I > switched to > using GenomicRanges and findOverlaps as I liked the very efficient > interval tree > approach. I usually wrap the output of findOverlaps into a function > emulating a > left or inner join from two data frames. This process is handled natively, > but > rather inelegantly in BEDtools. GRanges is more powerful but doesn't offer > boolean switches on union/intersect, etc or wrapper functions that keep the > metadata. > > I appreciate that their is no universal way to disentangle metadata when > aggregating but it would be nice to have some of the options available in > the > union/intersection/reduce functions, or in wrapper functions. At the > moment I > roll my own. > > Regardless, I find GenomicRanges, etc to be very useful and powerful and > it's my > preferred strategy in dealing with genomic data. > > Cheers, > Jason. > > > > > > > > > > > > At first I created GRanges objects from the dataframes > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 986 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6