Question: subset GRanges object via ElementMetadata
6.8 years ago by
Germany
Hermann Norpois170 wrote:
Hello,

I am looking for a method to subset a GRanges object by the values of a metadata (elementMetadata) column, for instance over == 2.

How does it work?

Thanks
Hermann

> test.gr
GRanges with 6 ranges and 3 metadata columns:
      seqnames           ranges strand |  edensity     epeak      over
         <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
  [1]     chr1 [713844, 714487]      * |      1000       256         1
  [2]     chr1 [762136, 763199]      * |      1000       771         2
  [3]     chr1 [780124, 780289]      * |       519        74         0
  [4]     chr1 [780533, 780677]      * |       516        68         0
  [5]     chr1 [781104, 781387]      * |       601       140         0
  [6]     chr1 [793830, 794396]      * |       610       290         0
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
      NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

> dput(test.gr)
new("GRanges"
    , seqnames = new("Rle"
    , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
"chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
"chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
"chr8", "chr9", "chrX", "chrY"), class = "factor")
    , lengths = 6L
    , elementMetadata = NULL
    , metadata = list()
)
    , ranges = new("IRanges"
    , start = c(713844L, 762136L, 780124L, 780533L, 781104L, 793830L)
    , width = c(644L, 1064L, 166L, 145L, 284L, 567L)
    , NAMES = NULL
    , elementType = "integer"
    , elementMetadata = NULL
    , metadata = list()
)
    , strand = new("Rle"
    , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
    , lengths = 6L
    , elementMetadata = NULL
    , metadata = list()
)
    , elementMetadata = new("DataFrame"
    , rownames = NULL
    , nrows = 6L
    , listData = structure(list(edensity = c(1000L, 1000L, 519L, 516L, 601L, 610L
), epeak = c(256L, 771L, 74L, 68L, 140L, 290L), over = c(1L,
2L, 0L, 0L, 0L, 0L)), .Names = c("edensity", "epeak", "over"))
    , elementType = "ANY"
    , elementMetadata = NULL
    , metadata = list()
)
    , seqinfo = new("Seqinfo"
    , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
"chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
"chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
"chrX", "chrY")
    , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_)
    , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
    , genome = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
)
)
    , metadata = list()
)
modified 6.8 years ago by Arnaud Amzallag • written 6.8 years ago by Hermann Norpois
Answer: subset GRanges object via ElementMetadata
6.8 years ago by
Tim Triche4.2k
United States
Tim Triche4.2k wrote:
The shorthand method would be

GR[ GR$over == 2 ]

and in your example,

R> test.gr
GRanges with 6 ranges and 3 metadata columns:
      seqnames           ranges strand |  edensity     epeak      over
         <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
  [1]     chr1 [713844, 714487]      * |      1000       256         1
  [2]     chr1 [762136, 763199]      * |      1000       771         2
  [3]     chr1 [780124, 780289]      * |       519        74         0
  [4]     chr1 [780533, 780677]      * |       516        68         0
  [5]     chr1 [781104, 781387]      * |       601       140         0
  [6]     chr1 [793830, 794396]      * |       610       290         0
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
      NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

R> test.gr[ test.gr$over == 2 ]
GRanges with 1 range and 3 metadata columns:
      seqnames           ranges strand |  edensity     epeak      over
         <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
  [1]     chr1 [762136, 763199]      * |      1000       771         2
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
      NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

written 6.8 years ago by Tim Triche
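For anyone who wants to try the bracket form without pasting the dput output, here is a minimal sketch that rebuilds the six ranges with the GRanges() constructor, using the values from the question. The variable names keep, hit, and strong are only illustrative and do not come from the thread.

```r
library(GenomicRanges)

# Rebuild the example object; starts, widths, and metadata values are
# copied from the dput output in the question.
test.gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(
    start = c(713844L, 762136L, 780124L, 780533L, 781104L, 793830L),
    width = c(644L, 1064L, 166L, 145L, 284L, 567L)
  ),
  strand   = "*",
  edensity = c(1000L, 1000L, 519L, 516L, 601L, 610L),
  epeak    = c(256L, 771L, 74L, 68L, 140L, 290L),
  over     = c(1L, 2L, 0L, 0L, 0L, 0L)
)

# test.gr$over is shorthand for mcols(test.gr)$over; the comparison
# yields a logical vector with one entry per range.
keep <- test.gr$over == 2
hit  <- test.gr[keep]

# Conditions on several metadata columns combine like any logical index.
strong <- test.gr[test.gr$over >= 1 & test.gr$epeak > 200]
```

If a metadata column may contain NA, wrapping the condition in which(), as in test.gr[which(test.gr$over == 2)], keeps only the ranges where the comparison is TRUE.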
Btw, I hacked together a subset() method for GenomicRanges yesterday. It respects the metadata columns. Someone could probably come up with some reason why that violates the conceptual foundations of something, but I find it useful.

So you could do:

subset(gr, over == 2)

Will commit shortly.

Michael

written 6.8 years ago by Michael Lawrence
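A small sketch of the subset() form next to the bracket form, assuming a Bioconductor installation recent enough to include the method Michael describes. The three-range object gr and the names by_subset, by_bracket, and both are made up for illustration.

```r
library(GenomicRanges)

# A cut-down version of the example object (first three ranges only).
gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(713844L, 762136L, 780124L),
                      width = c(644L, 1064L, 166L)),
  edensity = c(1000L, 1000L, 519L),
  epeak    = c(256L, 771L, 74L),
  over     = c(1L, 2L, 0L)
)

# Inside subset(), metadata columns can be named directly.
by_subset  <- subset(gr, over == 2)

# The equivalent bracket form has to spell out gr$ each time.
by_bracket <- gr[gr$over == 2]

# Several conditions in one call.
both <- subset(gr, over >= 1 & epeak > 200)
```

The two forms should select the same ranges here; subset() mainly saves typing when a filter touches several metadata columns.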
Yeah! My `love-of-all-things-semantically-impure`-ing self has been wanting this one for a long time :-)

http://thread.gmane.org/gmane.comp.lang.r.sequencing/1239

Thanks!
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

written 6.8 years ago by Steve Lianoglou
Hi Michael,

Sounds good to me. Hopefully you set the method on Vector objects, rather than just GenomicRanges objects.

Thanks,
H.

written 6.8 years ago by Hervé Pagès
Hi Hervé,

That's what I ended up doing, actually. One question that came up though is whether we want to support 2D subsetting of all (or at least most) Vector objects, in the same manner as GRanges. I think it would work, how about you?

Michael

written 6.7 years ago by Michael Lawrence
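Since the method is said to be defined on Vector rather than only on GenomicRanges, it should also apply to other Vector-derived objects that carry metadata columns. The sketch below tries this on a plain IRanges object; the column name score and the object names ir and high are invented for illustration, and the behaviour is assumed from the description in this thread rather than checked against a particular release.

```r
library(IRanges)

# Three toy ranges with one metadata column attached via mcols().
ir <- IRanges(start = c(1L, 10L, 20L), width = c(5L, 5L, 5L))
mcols(ir) <- DataFrame(score = c(1L, 5L, 3L))

# The same subset() call style as for GRanges, applied to an IRanges.
high <- subset(ir, score >= 3)
```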
Hi Michael,

If by 2D subsetting you're referring to gr[i,j], I'm opposed to it. I think it's a mistake to try to put the 2D *low-level* API on top of objects that are conceptually not 2D objects. The current situation, where 2D subsetting already works on both GRanges and GRangesList objects but does different things, is messy and tells me that we shouldn't have provided this in the first place. Sounds like the gr$foo story again. Hopefully gr$foo will remain a one-time exception.

I think subset() is already giving you something similar to the 2D subsetting, right?

H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
written 6.7 years ago by Hervé Pagès
The [i,meta_j] syntax is probably not very useful (at least in my experience), so I wouldn't favor it to the extent of $meta_name. For now, subset(select=) is probably sufficient.

Michael

On Mon, Feb 25, 2013 at 12:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:

> Hi Michael,
>
> On 02/23/2013 04:50 AM, Michael Lawrence wrote:
>
>> Hi Hervé,
>>
>> That's what I ended up doing, actually. One question that came up though
>> is whether we want to support 2D subsetting of all (or at least most)
>> Vector objects, in the same manner as GRanges. I think it would work,
>> how about you?
>
> If by 2D subsetting you're referring to gr[i,j], I'm opposed to it.
> I think it's a mistake to try to put the 2D *low-level* API on top of
> objects that are conceptually not 2D objects. The current situation
> where we have 2D subsetting already work on both GRanges and
> GRangesList objects but do different things is messy and tells me
> that we shouldn't have provided this in the first place.
>
> Sounds like the gr$foo story again. Hopefully gr$foo will remain a 1
> time exception.
>
> I think subset() is already giving you something similar to the 2D
> subsetting right?
>
> H.
>
>> Michael
>>
>> On Fri, Feb 22, 2013 at 5:33 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Hi Michael,
>>
>> On 02/22/2013 12:56 PM, Michael Lawrence wrote:
>>
>> Btw, I hacked together a subset() method for GenomicRanges yesterday. It
>> respects the metadata columns. Someone could probably come up with some
>> reason why that violates the conceptual foundations of something, but I
>> find it useful.
>>
>> So you could do:
>> subset(gr, over == 2)
>>
>> Sounds good to me. Hopefully you set the method on Vector objects,
>> rather than just GenomicRanges objects.
>>
>> Thanks,
>> H.
>>
>> Will commit shortly.
>>
>> Michael
>>
>> On Fri, Feb 22, 2013 at 10:10 AM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
>>
>> the shorthand method would be
>>
>> GR[ GR$over == 2 ]
>>
>> and in your example,
>>
>> R> test.gr[ test.gr$over == 2 ]
>> GRanges with 1 range and 3 metadata columns:
>>       seqnames           ranges strand |  edensity     epeak      over
>>          <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>>   [1]     chr1 [762136, 763199]      * |      1000       771         2
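For readers who want to try the two approaches discussed in this reply, here is a minimal sketch, assuming a current Bioconductor installation with GenomicRanges. The object name gr and the hand-built constructor call are illustrative stand-ins for the poster's test.gr (only chr1 is declared as a seqlevel here, unlike the original object), and the subset() method is assumed to behave as described in the thread.

```r
# Sketch only: rebuilds the six example ranges by hand rather than from the
# dput() output, so the seqinfo is simpler than in the original test.gr.
library(GenomicRanges)

gr <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(
    start = c(713844L, 762136L, 780124L, 780533L, 781104L, 793830L),
    width = c(644L, 1064L, 166L, 145L, 284L, 567L)
  ),
  strand = "*",
  edensity = c(1000L, 1000L, 519L, 516L, 601L, 610L),
  epeak = c(256L, 771L, 74L, 68L, 140L, 290L),
  over = c(1L, 2L, 0L, 0L, 0L, 0L)
)

# Bracket shorthand: build a logical vector from the metadata column.
by_bracket <- gr[gr$over == 2]

# subset(): the condition is evaluated with metadata columns in scope,
# so the column can be named directly.
by_subset <- subset(gr, over == 2)

print(by_bracket)
print(by_subset)
```

Both calls should return the single range starting at 762136. The bracket form needs the object name twice, which is the main practical difference when the object has a long name.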
written 6.7 years ago by Michael Lawrence
Answer: subset GRanges object via ElementMetadata
6.8 years ago by Arnaud Amzallag:
test.gr[values(test.gr)$over %in% 2]

works.

test.gr[values(test.gr)$over == 2] works too if over does not contain NAs.

Arnaud
written 6.8 years ago by Arnaud Amzallag
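The difference between %in% and == in Arnaud's answer above comes down to how each treats missing values. The following is a small sketch, assuming GenomicRanges is installed; the three-range object and the NA placed in its over column are made up for illustration, and mcols() is used as the accessor in place of values().

```r
# Sketch only: a shortened, hypothetical version of the example object
# with one missing value in the 'over' metadata column.
library(GenomicRanges)

gr <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(
    start = c(713844L, 762136L, 780124L),
    width = c(644L, 1064L, 166L)
  ),
  over = c(1L, 2L, NA)
)

# '==' propagates the NA into the logical vector.
eq_mask <- mcols(gr)$over == 2

# '%in%' is built on match(), so a missing value simply does not match.
in_mask <- mcols(gr)$over %in% 2

# An NA-free logical vector is always a valid subscript.
hits <- gr[in_mask]

print(eq_mask)
print(in_mask)
print(hits)
```

Because in_mask never contains NA, the %in% form is the safer default when a metadata column may have missing values.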
That's odd... I added an NA and sure enough, it fails:

R> test.gr[ test.gr$over == 2 ]
Error in IRanges:::normalizeSingleBracketSubscript(i, x) :
  subscript contains NAs

But which() works fine:

R> test.gr[ which(test.gr$over == 2) ]
GRanges with 1 range and 3 metadata columns:
      seqnames           ranges strand |  edensity     epeak      over
         <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
  [1]     chr1 [762136, 763199]      * |      1000       771         2
  ---

I wonder if this is an easy fix, too?

--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
written 6.8 years ago by Tim Triche
On 02/22/2013 02:35 PM, Tim Triche, Jr. wrote:
> That's odd... I added an NA and sure enough, it fails:
>
> R> test.gr[ test.gr$over == 2 ]
> Error in IRanges:::normalizeSingleBracketSubscript(i, x) :
>   subscript contains NAs
>
> But which() works fine:
>
> R> test.gr[ which(test.gr$over == 2) ]
>
> I wonder if this is an easy fix, too?

In base R, subscripting with NA leads to

> x = 1:5
> x[NA]
[1] NA NA NA NA NA

which makes a weird sense (recycling a length 1 NA) but I/GRanges don't support the notion of NA-ranges. So not implemented by design and hence not fixable is probably the answer.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
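The base R behaviour described in this reply, and the reason which() sidesteps the problem, can be checked without any Bioconductor packages. This is a small sketch using plain vectors; the over vector with an NA is made up for illustration.

```r
# Base R only: how NA subscripts behave on an ordinary vector.
x <- 1:5

# A bare NA is logical, so it is recycled to the length of x
# and every position comes back as NA.
na_logical <- x[NA]

# An integer NA is a single index, so only one NA comes back.
na_integer <- x[NA_integer_]

# which() keeps only the positions that are TRUE, dropping FALSE and NA,
# so the resulting integer subscript never contains NA.
over <- c(1L, 2L, NA, 0L)
mask <- over == 2
idx <- which(mask)
kept <- over[idx]

print(na_logical)
print(na_integer)
print(idx)
print(kept)
```

Since a range object has no natural "NA range" to return, wrapping the comparison in which() (or using %in%) is the usual way to make the subscript NA-free before subsetting.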
written 6.8 years ago by Martin Morgan