Selecting elements in a GRanges object by element metadata
@michael-muratet-3076
Greetings,

I would like to select elements from a GRanges object by testing values in the metadata columns. This seems to work OK:

    x.gr[which(elementMetadata(x.gr)$fdr<0.05)]

So does this, although there's nothing in the documentation about the [] operator accepting logical values:

    fosl2.th17.gr[elementMetadata(fosl2.th17.gr)$fdr<0.05]

The problem arises when I try to select from a GRanges object where the metadata columns have NAs:

    > tss.annot.gr[na.omit(elementMetadata(tss.annot.gr)$GENE>0),"GENE"]
    GRanges with 4028 ranges and 1 elementMetadata col:
             seqnames                 ranges strand |         GENE
                <Rle>              <IRanges>  <Rle> |    <numeric>
         [1]     chr1 [  3659579,   3662079]      - |         <NA>
         [2]     chr1 [  4847394,   4849894]      + |            0
         [3]     chr1 [ 10025979,  10028479]      - |         <NA>
         [4]     chr1 [ 17085879,  17088379]      - |         <NA>
         [5]     chr1 [ 21067298,  21069798]      - |         <NA>
         [6]     chr1 [ 21949662,  21952162]      - |            0
         [7]     chr1 [ 23388014,  23390514]      - |         <NA>
         [8]     chr1 [ 23768264,  23770764]      + |         <NA>
         [9]     chr1 [ 23927128,  23929628]      - |         <NA>
         ...      ...                    ...    ... ...          ...
      [4020]     chr2 [126607180, 126609680]      - |            0
      [4021]     chr2 [127345106, 127347606]      - |            0
      [4022]     chr2 [129195132, 129197632]      + | -1.223140339
      [4023]     chr2 [129194856, 129197356]      - | -1.628782357
      [4024]     chr2 [129360338, 129362838]      - | -1.475535653
      [4025]     chr2 [129837609, 129840109]      + |            0
      [4026]     chr2 [129948520, 129951020]      + |            0
      [4027]     chr2 [140213446, 140215946]      - |            0
      [4028]     chr2 [148267271, 148269771]      - | -1.564551101

The values returned violate the condition. It won't work at all without na.omit.

I can coerce the GRanges object to a data.frame, do the selection, and create a new GRanges object, but I'm hoping there is a way to do it directly.

Am I using the syntax correctly? Is there something peculiar about a DataFrame vs. a data.frame that's getting in the way?

Thanks,

Mike

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806
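A minimal sketch of the two forms that work in the question, assuming the GenomicRanges package is installed and using a toy object in place of x.gr. It uses mcols(), which in current Bioconductor releases is the usual accessor for the metadata columns that elementMetadata() returns; the fdr column and its values are invented for illustration. When the condition contains no NAs, a logical subscript and the positions from which() select the same ranges.

```r
library(GenomicRanges)

# Toy GRanges with a hypothetical 'fdr' metadata column (no NAs)
gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(100, 200, 300, 400), width = 50),
              fdr = c(0.01, 0.20, 0.03, 0.50))

keep <- mcols(gr)$fdr < 0.05    # logical vector, one entry per range
by_logical <- gr[keep]          # logical subscript
by_index   <- gr[which(keep)]   # integer positions from which()

by_logical
```

Both calls keep the first and third ranges here; the difference only shows up once the logical vector contains NA.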
@kasper-daniel-hansen-2979
What you are reporting is true for any (well, there may be exceptions, I guess) subsetting. Try, for example, with a standard matrix. The solution is to add which(). Contrast

    > x = c(1,2,NA)
    > x == 2
    [1] FALSE  TRUE    NA
    > which(x == 2)
    [1] 2

Kasper
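Kasper's suggestion to try a standard matrix can be sketched in base R; the matrix m and its columns are hypothetical. An NA in a logical row index does not drop the row; it produces a row of NAs, while which() returns only the positions where the test is TRUE.

```r
# Matrix analogue of Kasper's example: one NA in the column being tested
m <- cbind(id = 1:3, score = c(1, 2, NA))

hit  <- m[, "score"] == 2               # FALSE TRUE NA
bad  <- m[hit, , drop = FALSE]          # the NA index yields an all-NA row
good <- m[which(hit), , drop = FALSE]   # which() keeps only the TRUE position

bad
good
```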
On Jul 11, 2012, at 10:21 AM, Kasper Daniel Hansen wrote:

> The solution is to add which().

Thanks, I should have tried that before. This syntax works:

    > tss.annot.gr[na.omit(which(elementMetadata(tss.annot.gr)$GENE>0)),"GENE"]
    GRanges with 3446 ranges and 1 elementMetadata col:
            seqnames               ranges strand |        GENE
               <Rle>            <IRanges>  <Rle> |   <numeric>
        [1]     chr1 [ 4773791,  4776291]      - | 1.063973966
        [2]     chr1 [ 5007460,  5009960]      - | 1.668134677
        [3]     chr1 [16092486, 16094986]      - | 1.748685661
        [4]     chr1 [36737931, 36740431]      - | 1.465666717
        [5]     chr1 [38052053, 38054553]      - | 1.750940655
        [6]     chr1 [38054354, 38056854]      + | 1.677518675
        [7]     chr1 [39592146, 39594646]      + | 0.696900841
        [8]     chr1 [40380974, 40383474]      + | 0.777552281
        [9]     chr1 [40738056, 40740556]      + | 0.511665769

Mike
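A toy version of the working call, assuming GenomicRanges is installed; gr stands in for tss.annot.gr and the GENE values are invented. Because which() never returns NA, wrapping it in na.omit() has no effect, and an explicit !is.na() test gives an equivalent logical subscript with no NAs in it.

```r
library(GenomicRanges)

# Toy stand-in for tss.annot.gr: a 'GENE' score column containing NAs
gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(100, 200, 300, 400, 500), width = 50),
              GENE = c(NA, 0, NA, 1.5, -1.2))

idx  <- which(mcols(gr)$GENE > 0)   # integer positions; NA never appears
hits <- gr[idx, "GENE"]             # j selects metadata columns, as in the thread

# na.omit() has nothing to remove from a which() result
stopifnot(identical(na.omit(idx), idx))

# Logical alternative: FALSE & NA is FALSE, so the subscript has no NAs
alt <- gr[!is.na(mcols(gr)$GENE) & mcols(gr)$GENE > 0]

hits
```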
@michael-lawrence-3846
On Wed, Jul 11, 2012 at 8:09 AM, Michael Muratet wrote:

> Am I using the syntax correctly? Is there something peculiar about a
> DataFrame vs a data.frame that's getting in the way?
It sounds like you are not very familiar with how R works. Logical subsetting selects every element i for which v[i] is TRUE. If you call na.omit() on the logical vector, you change its length (its geometry) by dropping elements, and you lose the parallel correspondence between the logical vector and the GRanges. If you had an integer vector, dropping elements would be just fine. But as Kasper points out, you don't need to call na.omit() on a which() result, because which() only reports the indexes that are TRUE.

I just wanted to make sure that you understood the reasons behind the behavior. Apologies if it was already obvious.
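The loss of parallel correspondence can be seen with a plain numeric vector (values invented to echo the GENE column). This sketch relies on base R recycling a logical index that is shorter than the vector; it does not claim that a GRanges treats a short logical subscript in exactly the same way.

```r
gene <- c(NA, 0, NA, 1.5, -1.2)   # toy values echoing the GENE column
cond <- gene > 0                  # NA FALSE NA TRUE FALSE (length 5)

short      <- na.omit(cond)       # FALSE TRUE FALSE (length 3): positions lost
misaligned <- gene[short]         # recycled as F T F F T, so picks 0 and -1.2
correct    <- gene[which(cond)]   # only position 4, i.e. 1.5

misaligned
correct
```

The misaligned result contains 0 and a negative value, the same kind of rows that appeared in the original output even though the condition was GENE > 0.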