Question

[GenomicRanges] subsetByOverlaps to keep info from both GRanges objects?

2

Entering edit mode

enricoferrero ▴ 660

@enricoferrero-6037

Last seen 2.4 years ago

Switzerland

Hi, I have two GRanges objects, the first one is a list of SNPs, the second one are DNase hypersensitivity sites: ########## library(GenomicRanges) ... > snp GRanges with 192 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <integer> rs000001 chr1 [ 37967779, 37967780] + | 0 rs000002 chr1 [165967416, 165967417] - | 0 rs000003 chr1 [218860069, 218860070] - | 0 rs000004 chr1 [ 17306673, 17306674] - | 0 rs000005 chr1 [ 41293414, 41293415] + | 0 ... ... ... ... ... ... rs000188 chr8 [ 97522507, 97522508] - | 0 rs000189 chr8 [ 15532582, 15532583] + | 0 rs000190 chr8 [ 72270031, 72270032] + | 0 rs000191 chr9 [126511086, 126511087] + | 0 rs000192 chr9 [ 98231008, 98231009] + | 0 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 ... chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 NA NA NA NA NA NA NA NA NA ... NA NA NA NA NA NA NA NA NA > dnase GRanges with 145038 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <integer> [1] chr1 [ 10120, 10270] * | 0 [2] chr1 [237700, 237850] * | 0 [3] chr1 [521440, 521590] * | 0 [4] chr1 [565560, 565710] * | 0 [5] chr1 [565860, 566010] * | 0 ... ... ... ... ... ... [145034] chrX [154543640, 154543790] * | 0 [145035] chrX [154560420, 154560570] * | 0 [145036] chrX [154563960, 154564110] * | 0 [145037] chrX [154842100, 154842250] * | 0 [145038] chrX [154862200, 154862350] * | 0 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX chrY NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ########## I can use subsetByOverlaps() in both directions to compute the overlap between them and return a GRanges object: ########## > subsetByOverlaps(dnase, snp) GRanges with 5 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <integer> [1] chr1 [ 17306560, 17306710] * | 0 [2] chr2 [169869820, 169869970] * | 0 [3] chr4 [145506440, 145506590] * | 0 [4] chr5 [ 15014080, 15014230] * | 0 [5] chr5 [ 15117400, 15117550] * | 0 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX chrY NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA > subsetByOverlaps(snp, dnase) GRanges with 6 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <integer> rs2235746 chr1 [ 17306671, 17306672] - | 0 rs4157777 chr2 [169869904, 169869904] - | 0 rs6858330 chr4 [145506558, 145506559] + | 0 rs13146741 chr4 [145506453, 145506454] + | 0 rs32847 chr5 [ 15117438, 15117439] + | 0 rs7341842 chr5 [ 15014184, 15014185] + | 0 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ########## The first GRanges objects stores the DNase genomic locations overlapping with the SNPs, while the second one contains the SNPs IDs (as GRanges names) and genomic locations overlapping with the DNase dataset. Now, what I actually need is a GRanges object that stores the SNPs IDs and the DNase genomic locations. Is this possible? Thank you. Best, -- Enrico Ferrero PhD Student Department of Genetics Cambridge Systems Biology Centre University of Cambridge

• 13k views

ADD COMMENT • link updated 10.7 years ago by Valerie Obenchain ★ 6.8k • written 10.7 years ago by enricoferrero ▴ 660

score 1 · Answer 1 · 2013-08-20

1

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.3 years ago

United States

Hi Enrico, Here is a toy example of two GRanges, one with snp locations and the other with dnase. gr1 <- GRanges("chr1", IRanges(1:5, width=5, names=paste0("rsid:", letters[1:5])), score=1:5) > gr1 GRanges with 5 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <integer> rsid:a chr1 [1, 5] * | 1 rsid:b chr1 [2, 6] * | 2 rsid:c chr1 [3, 7] * | 3 rsid:d chr1 [4, 8] * | 4 rsid:e chr1 [5, 9] * | 5 --- seqlengths: chr1 NA gr2 <- GRanges("chr1", IRanges(8:12, width=5, names=paste0("dnase:", letters[1:5])), score=10:14) > gr2 GRanges with 5 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <integer> dnase:a chr1 [ 8, 12] * | 10 dnase:b chr1 [ 9, 13] * | 11 dnase:c chr1 [10, 14] * | 12 dnase:d chr1 [11, 15] * | 13 dnase:e chr1 [12, 16] * | 14 --- seqlengths: chr1 NA ranges <- subsetByOverlaps(gr2, gr1) > ranges GRanges with 2 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <integer> dnase:a chr1 [8, 12] * | 10 dnase:b chr1 [9, 13] * | 11 --- seqlengths: chr1 NA The function called by subsetByOverlaps is findOverlaps (described on same man page as ?subsetByOverlaps). The output of findOverlaps is a 'Hits' object indicating which of the query and subject overlap. hits <- findOverlaps(gr2, gr1) > hits Hits of length 3 queryLength: 5 subjectLength: 5 queryHits subjectHits <integer> <integer> 1 1 4 2 1 5 3 2 5 We want meta information from gr1 (snps). In the call to findOverlaps gr1 was the subject so we use the 'subjectHits' indices to subset the snp GRanges. idx <- unique(subjectHits(hits)) values <- DataFrame(rsid=names(gr1)[idx], snpscore=gr1$score[idx]) Add the metadata to the dnase ranges. mcols(ranges) <- c(mcols(ranges), values) > ranges GRanges with 2 ranges and 3 metadata columns: seqnames ranges strand | score rsid snpscore <rle> <iranges> <rle> | <integer> <character> <integer> dnase:a chr1 [8, 12] * | 10 rsid:d 4 dnase:b chr1 [9, 13] * | 11 rsid:e 5 --- seqlengths: chr1 NA Valerie On 08/20/2013 03:51 AM, Enrico Ferrero wrote: > Hi, > > I have two GRanges objects, the first one is a list of SNPs, the > second one are DNase hypersensitivity sites: > > ########## > library(GenomicRanges) > ... > >> snp > GRanges with 192 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <integer> > rs000001 chr1 [ 37967779, 37967780] + | 0 > rs000002 chr1 [165967416, 165967417] - | 0 > rs000003 chr1 [218860069, 218860070] - | 0 > rs000004 chr1 [ 17306673, 17306674] - | 0 > rs000005 chr1 [ 41293414, 41293415] + | 0 > ... ... ... ... ... ... > rs000188 chr8 [ 97522507, 97522508] - | 0 > rs000189 chr8 [ 15532582, 15532583] + | 0 > rs000190 chr8 [ 72270031, 72270032] + | 0 > rs000191 chr9 [126511086, 126511087] + | 0 > rs000192 chr9 [ 98231008, 98231009] + | 0 > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 ... chr21 > chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 > NA NA NA NA NA NA NA NA NA ... NA > NA NA NA NA NA NA NA NA > >> dnase > GRanges with 145038 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <integer> > [1] chr1 [ 10120, 10270] * | 0 > [2] chr1 [237700, 237850] * | 0 > [3] chr1 [521440, 521590] * | 0 > [4] chr1 [565560, 565710] * | 0 > [5] chr1 [565860, 566010] * | 0 > ... ... ... ... ... ... > [145034] chrX [154543640, 154543790] * | 0 > [145035] chrX [154560420, 154560570] * | 0 > [145036] chrX [154563960, 154564110] * | 0 > [145037] chrX [154842100, 154842250] * | 0 > [145038] chrX [154862200, 154862350] * | 0 > > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 > chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX > chrY > NA NA NA NA NA NA NA NA NA NA NA > NA NA NA NA NA NA NA NA NA NA NA NA > NA > ########## > > I can use subsetByOverlaps() in both directions to compute the overlap > between them and return a GRanges object: > > ########## >> subsetByOverlaps(dnase, snp) > GRanges with 5 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <integer> > [1] chr1 [ 17306560, 17306710] * | 0 > [2] chr2 [169869820, 169869970] * | 0 > [3] chr4 [145506440, 145506590] * | 0 > [4] chr5 [ 15014080, 15014230] * | 0 > [5] chr5 [ 15117400, 15117550] * | 0 > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 > chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX > chrY > NA NA NA NA NA NA NA NA NA NA NA > NA NA NA NA NA NA NA NA NA NA NA NA > NA > >> subsetByOverlaps(snp, dnase) > GRanges with 6 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <integer> > rs2235746 chr1 [ 17306671, 17306672] - | 0 > rs4157777 chr2 [169869904, 169869904] - | 0 > rs6858330 chr4 [145506558, 145506559] + | 0 > rs13146741 chr4 [145506453, 145506454] + | 0 > rs32847 chr5 [ 15117438, 15117439] + | 0 > rs7341842 chr5 [ 15014184, 15014185] + | 0 > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 > chr2 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 > NA NA NA NA NA NA NA NA NA NA NA > NA NA NA NA NA NA NA NA NA NA > ########## > > The first GRanges objects stores the DNase genomic locations > overlapping with the SNPs, while the second one contains the SNPs IDs > (as GRanges names) and genomic locations overlapping with the DNase > dataset. > > Now, what I actually need is a GRanges object that stores the SNPs IDs > and the DNase genomic locations. Is this possible? > > Thank you. > Best, > >

ADD COMMENT • link 10.7 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

Hi Valerie, Thanks very much for your reply. You propose an interesting approach, which unfortunately doesn't work for my specific case. In short, the problem is that multiple SNPs (single nucleotides) can map to a single DNase HS site (often very large regions). I have edited your example a bit so that it's easier to reproduce the issue: ########## > gr1 <- GRanges("chr1", IRanges(c(5, 10, 15), c(5, 10, 15), names=paste0("rsid:", letters[1:3])), score=rep(0,3)) > gr1 GRanges with 3 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <numeric> rsid:a chr1 [ 5, 5] * | 0 rsid:b chr1 [10, 10] * | 0 rsid:c chr1 [15, 15] * | 0 --- seqlengths: chr1 NA > gr2 <- GRanges("chr1", IRanges(c(4,6,45), c(8,20,50), names=paste0("dnase:", letters[1:3])), score=rep(0,3)) > gr2 GRanges with 3 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <numeric> dnase:a chr1 [ 4, 8] * | 0 dnase:b chr1 [ 6, 20] * | 0 dnase:c chr1 [45, 50] * | 0 --- seqlengths: chr1 NA > ranges <- subsetByOverlaps(gr2,gr1) > ranges GRanges with 2 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <numeric> dnase:a chr1 [4, 8] * | 0 dnase:b chr1 [6, 20] * | 0 --- seqlengths: chr1 NA > revRanges <- subsetByOverlaps(gr1,gr2) > revRanges GRanges with 3 ranges and 1 metadata column: seqnames ranges strand | score <rle> <iranges> <rle> | <numeric> rsid:a chr1 [ 5, 5] * | 0 rsid:b chr1 [10, 10] * | 0 rsid:c chr1 [15, 15] * | 0 --- seqlengths: chr1 NA > hits <- findOverlaps(gr2, gr1) > idx <- unique(subjectHits(hits)) > values <- DataFrame(rsid=names(gr1)[idx]) > mcols(ranges) <- c(mcols(ranges), values) > ranges GRanges with 2 ranges and 2 metadata columns: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 3 ########## As you can see, rsid:a is in the dnase:a region, but both rsid:b and rsid:c hit the dnase:b region. As a result, when you try to add the SNPs IDs from gr1 as metacolumns of ranges, you create a GRanges object with different number of rows. Is there any workaround for this? At the moment I'm creating two GRanges objects for each comparison, one keeping the SNPs IDs and coordinates, and the other storing the DNase regions hit by the SNPs, but this is less than ideal and produces a lot of unnecessary output (R objects and exported BED files). Ideally, I'd like to have a GRanges object where each row is a DNase HS site hit by a specific SNP. Thank you. Best, On 20 August 2013 17:57, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > Hi Enrico, > > Here is a toy example of two GRanges, one with snp locations and the other > with dnase. > > gr1 <- GRanges("chr1", IRanges(1:5, width=5, > names=paste0("rsid:", letters[1:5])), score=1:5) >> gr1 > > GRanges with 5 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <integer> > rsid:a chr1 [1, 5] * | 1 > rsid:b chr1 [2, 6] * | 2 > rsid:c chr1 [3, 7] * | 3 > rsid:d chr1 [4, 8] * | 4 > rsid:e chr1 [5, 9] * | 5 > --- > seqlengths: > chr1 > NA > > gr2 <- GRanges("chr1", IRanges(8:12, width=5, > names=paste0("dnase:", letters[1:5])), score=10:14) >> gr2 > > GRanges with 5 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <integer> > dnase:a chr1 [ 8, 12] * | 10 > dnase:b chr1 [ 9, 13] * | 11 > dnase:c chr1 [10, 14] * | 12 > dnase:d chr1 [11, 15] * | 13 > dnase:e chr1 [12, 16] * | 14 > --- > seqlengths: > chr1 > NA > > > ranges <- subsetByOverlaps(gr2, gr1) >> ranges > GRanges with 2 ranges and 1 metadata column: > > seqnames ranges strand | score > <rle> <iranges> <rle> | <integer> > dnase:a chr1 [8, 12] * | 10 > dnase:b chr1 [9, 13] * | 11 > --- > seqlengths: > chr1 > NA > > The function called by subsetByOverlaps is findOverlaps (described on same > man page as ?subsetByOverlaps). The output of findOverlaps is a 'Hits' > object indicating which of the query and subject overlap. > > hits <- findOverlaps(gr2, gr1) >> hits > Hits of length 3 > queryLength: 5 > subjectLength: 5 > queryHits subjectHits > <integer> <integer> > 1 1 4 > 2 1 5 > 3 2 5 > > We want meta information from gr1 (snps). In the call to findOverlaps gr1 > was the subject so we use the 'subjectHits' indices to subset the snp > GRanges. > > idx <- unique(subjectHits(hits)) > values <- DataFrame(rsid=names(gr1)[idx], snpscore=gr1$score[idx]) > > Add the metadata to the dnase ranges. > > mcols(ranges) <- c(mcols(ranges), values) >> ranges > GRanges with 2 ranges and 3 metadata columns: > seqnames ranges strand | score rsid snpscore > <rle> <iranges> <rle> | <integer> <character> <integer> > dnase:a chr1 [8, 12] * | 10 rsid:d 4 > dnase:b chr1 [9, 13] * | 11 rsid:e 5 > --- > seqlengths: > chr1 > NA > > > Valerie > > > On 08/20/2013 03:51 AM, Enrico Ferrero wrote: >> >> Hi, >> >> I have two GRanges objects, the first one is a list of SNPs, the >> second one are DNase hypersensitivity sites: >> >> ########## >> library(GenomicRanges) >> ... >> >>> snp >> >> GRanges with 192 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <integer> >> rs000001 chr1 [ 37967779, 37967780] + | 0 >> rs000002 chr1 [165967416, 165967417] - | 0 >> rs000003 chr1 [218860069, 218860070] - | 0 >> rs000004 chr1 [ 17306673, 17306674] - | 0 >> rs000005 chr1 [ 41293414, 41293415] + | 0 >> ... ... ... ... ... ... >> rs000188 chr8 [ 97522507, 97522508] - | 0 >> rs000189 chr8 [ 15532582, 15532583] + | 0 >> rs000190 chr8 [ 72270031, 72270032] + | 0 >> rs000191 chr9 [126511086, 126511087] + | 0 >> rs000192 chr9 [ 98231008, 98231009] + | 0 >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 ... chr21 >> chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 >> NA NA NA NA NA NA NA NA NA ... NA >> NA NA NA NA NA NA NA NA >> >>> dnase >> >> GRanges with 145038 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <integer> >> [1] chr1 [ 10120, 10270] * | 0 >> [2] chr1 [237700, 237850] * | 0 >> [3] chr1 [521440, 521590] * | 0 >> [4] chr1 [565560, 565710] * | 0 >> [5] chr1 [565860, 566010] * | 0 >> ... ... ... ... ... ... >> [145034] chrX [154543640, 154543790] * | 0 >> [145035] chrX [154560420, 154560570] * | 0 >> [145036] chrX [154563960, 154564110] * | 0 >> [145037] chrX [154842100, 154842250] * | 0 >> [145038] chrX [154862200, 154862350] * | 0 >> >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >> chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX >> chrY >> NA NA NA NA NA NA NA NA NA NA NA >> NA NA NA NA NA NA NA NA NA NA NA NA >> NA >> ########## >> >> I can use subsetByOverlaps() in both directions to compute the overlap >> between them and return a GRanges object: >> >> ########## >>> >>> subsetByOverlaps(dnase, snp) >> >> GRanges with 5 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <integer> >> [1] chr1 [ 17306560, 17306710] * | 0 >> [2] chr2 [169869820, 169869970] * | 0 >> [3] chr4 [145506440, 145506590] * | 0 >> [4] chr5 [ 15014080, 15014230] * | 0 >> [5] chr5 [ 15117400, 15117550] * | 0 >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >> chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX >> chrY >> NA NA NA NA NA NA NA NA NA NA NA >> NA NA NA NA NA NA NA NA NA NA NA NA >> NA >> >>> subsetByOverlaps(snp, dnase) >> >> GRanges with 6 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <integer> >> rs2235746 chr1 [ 17306671, 17306672] - | 0 >> rs4157777 chr2 [169869904, 169869904] - | 0 >> rs6858330 chr4 [145506558, 145506559] + | 0 >> rs13146741 chr4 [145506453, 145506454] + | 0 >> rs32847 chr5 [ 15117438, 15117439] + | 0 >> rs7341842 chr5 [ 15014184, 15014185] + | 0 >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >> chr2 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 >> NA NA NA NA NA NA NA NA NA NA NA >> NA NA NA NA NA NA NA NA NA NA >> ########## >> >> The first GRanges objects stores the DNase genomic locations >> overlapping with the SNPs, while the second one contains the SNPs IDs >> (as GRanges names) and genomic locations overlapping with the DNase >> dataset. >> >> Now, what I actually need is a GRanges object that stores the SNPs IDs >> and the DNase genomic locations. Is this possible? >> >> Thank you. >> Best, >> >> > -- Enrico Ferrero Department of Genetics Cambridge Systems Biology Centre University of Cambridge

ADD REPLY • link 10.7 years ago enricoferrero ▴ 660

0

Entering edit mode

To handle multiple mappings you can use a CharacterList. > gr1 <- GRanges("chr1", IRanges(c(5, 10, 15), c(5, 10, 15), names=paste0("rsid:", letters[1:3])), score=1:3) > gr2 <- GRanges("chr1", IRanges(c(4,6,45), c(8,20,50), names=paste0("dnase:", letters[1:3])), score=4:6) > ranges <- subsetByOverlaps(gr2,gr1) > hits <- findOverlaps(gr2, gr1) > rsid <- CharacterList(split(names(gr1)[subjectHits(hits)], queryHits(hits))) > snpscore <- CharacterList(split(gr1$score[subjectHits(hits)], queryHits(hits))) > mcols(ranges) <- DataFrame(mcols(ranges), rsid, snpscore) > ranges > GRanges with 2 ranges and 3 metadata columns: > seqnames ranges strand | score rsid snpscore > <rle> <iranges> <rle> | <integer> <characterlist> <characterlist> > dnase:a chr1 [4, 8] * | 4 rsid:a 1 > dnase:b chr1 [6, 20] * | 5 rsid:b,rsid:c 2,3 Valerie On 08/21/2013 04:10 AM, Enrico Ferrero wrote: > Hi Valerie, > > Thanks very much for your reply. > You propose an interesting approach, which unfortunately doesn't work > for my specific case. > > In short, the problem is that multiple SNPs (single nucleotides) can > map to a single DNase HS site (often very large regions). > I have edited your example a bit so that it's easier to reproduce the issue: > > ########## >> gr1 <- GRanges("chr1", IRanges(c(5, 10, 15), c(5, 10, 15), names=paste0("rsid:", letters[1:3])), score=rep(0,3)) >> gr1 > GRanges with 3 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <numeric> > rsid:a chr1 [ 5, 5] * | 0 > rsid:b chr1 [10, 10] * | 0 > rsid:c chr1 [15, 15] * | 0 > --- > seqlengths: > chr1 > NA > >> gr2 <- GRanges("chr1", IRanges(c(4,6,45), c(8,20,50), names=paste0("dnase:", letters[1:3])), score=rep(0,3)) >> gr2 > GRanges with 3 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <numeric> > dnase:a chr1 [ 4, 8] * | 0 > dnase:b chr1 [ 6, 20] * | 0 > dnase:c chr1 [45, 50] * | 0 > --- > seqlengths: > chr1 > NA > >> ranges <- subsetByOverlaps(gr2,gr1) >> ranges > GRanges with 2 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <numeric> > dnase:a chr1 [4, 8] * | 0 > dnase:b chr1 [6, 20] * | 0 > --- > seqlengths: > chr1 > NA > >> revRanges <- subsetByOverlaps(gr1,gr2) >> revRanges > GRanges with 3 ranges and 1 metadata column: > seqnames ranges strand | score > <rle> <iranges> <rle> | <numeric> > rsid:a chr1 [ 5, 5] * | 0 > rsid:b chr1 [10, 10] * | 0 > rsid:c chr1 [15, 15] * | 0 > --- > seqlengths: > chr1 > NA > >> hits <- findOverlaps(gr2, gr1) >> idx <- unique(subjectHits(hits)) >> values <- DataFrame(rsid=names(gr1)[idx]) >> mcols(ranges) <- c(mcols(ranges), values) >> ranges > GRanges with 2 ranges and 2 metadata columns: > Error in (function (..., row.names = NULL, check.rows = FALSE, > check.names = TRUE, : > arguments imply differing number of rows: 2, 3 > ########## > > As you can see, rsid:a is in the dnase:a region, but both rsid:b and > rsid:c hit the dnase:b region. As a result, when you try to add the > SNPs IDs from gr1 as metacolumns of ranges, you create a GRanges > object with different number of rows. > > Is there any workaround for this? > > At the moment I'm creating two GRanges objects for each comparison, > one keeping the SNPs IDs and coordinates, and the other storing the > DNase regions hit by the SNPs, but this is less than ideal and > produces a lot of unnecessary output (R objects and exported BED > files). Ideally, I'd like to have a GRanges object where each row is a > DNase HS site hit by a specific SNP. > > Thank you. > Best, > > On 20 August 2013 17:57, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: >> Hi Enrico, >> >> Here is a toy example of two GRanges, one with snp locations and the other >> with dnase. >> >> gr1 <- GRanges("chr1", IRanges(1:5, width=5, >> names=paste0("rsid:", letters[1:5])), score=1:5) >>> gr1 >> >> GRanges with 5 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <integer> >> rsid:a chr1 [1, 5] * | 1 >> rsid:b chr1 [2, 6] * | 2 >> rsid:c chr1 [3, 7] * | 3 >> rsid:d chr1 [4, 8] * | 4 >> rsid:e chr1 [5, 9] * | 5 >> --- >> seqlengths: >> chr1 >> NA >> >> gr2 <- GRanges("chr1", IRanges(8:12, width=5, >> names=paste0("dnase:", letters[1:5])), score=10:14) >>> gr2 >> >> GRanges with 5 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <integer> >> dnase:a chr1 [ 8, 12] * | 10 >> dnase:b chr1 [ 9, 13] * | 11 >> dnase:c chr1 [10, 14] * | 12 >> dnase:d chr1 [11, 15] * | 13 >> dnase:e chr1 [12, 16] * | 14 >> --- >> seqlengths: >> chr1 >> NA >> >> >> ranges <- subsetByOverlaps(gr2, gr1) >>> ranges >> GRanges with 2 ranges and 1 metadata column: >> >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <integer> >> dnase:a chr1 [8, 12] * | 10 >> dnase:b chr1 [9, 13] * | 11 >> --- >> seqlengths: >> chr1 >> NA >> >> The function called by subsetByOverlaps is findOverlaps (described on same >> man page as ?subsetByOverlaps). The output of findOverlaps is a 'Hits' >> object indicating which of the query and subject overlap. >> >> hits <- findOverlaps(gr2, gr1) >>> hits >> Hits of length 3 >> queryLength: 5 >> subjectLength: 5 >> queryHits subjectHits >> <integer> <integer> >> 1 1 4 >> 2 1 5 >> 3 2 5 >> >> We want meta information from gr1 (snps). In the call to findOverlaps gr1 >> was the subject so we use the 'subjectHits' indices to subset the snp >> GRanges. >> >> idx <- unique(subjectHits(hits)) >> values <- DataFrame(rsid=names(gr1)[idx], snpscore=gr1$score[idx]) >> >> Add the metadata to the dnase ranges. >> >> mcols(ranges) <- c(mcols(ranges), values) >>> ranges >> GRanges with 2 ranges and 3 metadata columns: >> seqnames ranges strand | score rsid snpscore >> <rle> <iranges> <rle> | <integer> <character> <integer> >> dnase:a chr1 [8, 12] * | 10 rsid:d 4 >> dnase:b chr1 [9, 13] * | 11 rsid:e 5 >> --- >> seqlengths: >> chr1 >> NA >> >> >> Valerie >> >> >> On 08/20/2013 03:51 AM, Enrico Ferrero wrote: >>> >>> Hi, >>> >>> I have two GRanges objects, the first one is a list of SNPs, the >>> second one are DNase hypersensitivity sites: >>> >>> ########## >>> library(GenomicRanges) >>> ... >>> >>>> snp >>> >>> GRanges with 192 ranges and 1 metadata column: >>> seqnames ranges strand | score >>> <rle> <iranges> <rle> | <integer> >>> rs000001 chr1 [ 37967779, 37967780] + | 0 >>> rs000002 chr1 [165967416, 165967417] - | 0 >>> rs000003 chr1 [218860069, 218860070] - | 0 >>> rs000004 chr1 [ 17306673, 17306674] - | 0 >>> rs000005 chr1 [ 41293414, 41293415] + | 0 >>> ... ... ... ... ... ... >>> rs000188 chr8 [ 97522507, 97522508] - | 0 >>> rs000189 chr8 [ 15532582, 15532583] + | 0 >>> rs000190 chr8 [ 72270031, 72270032] + | 0 >>> rs000191 chr9 [126511086, 126511087] + | 0 >>> rs000192 chr9 [ 98231008, 98231009] + | 0 >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 ... chr21 >>> chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 >>> NA NA NA NA NA NA NA NA NA ... NA >>> NA NA NA NA NA NA NA NA >>> >>>> dnase >>> >>> GRanges with 145038 ranges and 1 metadata column: >>> seqnames ranges strand | score >>> <rle> <iranges> <rle> | <integer> >>> [1] chr1 [ 10120, 10270] * | 0 >>> [2] chr1 [237700, 237850] * | 0 >>> [3] chr1 [521440, 521590] * | 0 >>> [4] chr1 [565560, 565710] * | 0 >>> [5] chr1 [565860, 566010] * | 0 >>> ... ... ... ... ... ... >>> [145034] chrX [154543640, 154543790] * | 0 >>> [145035] chrX [154560420, 154560570] * | 0 >>> [145036] chrX [154563960, 154564110] * | 0 >>> [145037] chrX [154842100, 154842250] * | 0 >>> [145038] chrX [154862200, 154862350] * | 0 >>> >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >>> chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX >>> chrY >>> NA NA NA NA NA NA NA NA NA NA NA >>> NA NA NA NA NA NA NA NA NA NA NA NA >>> NA >>> ########## >>> >>> I can use subsetByOverlaps() in both directions to compute the overlap >>> between them and return a GRanges object: >>> >>> ########## >>>> >>>> subsetByOverlaps(dnase, snp) >>> >>> GRanges with 5 ranges and 1 metadata column: >>> seqnames ranges strand | score >>> <rle> <iranges> <rle> | <integer> >>> [1] chr1 [ 17306560, 17306710] * | 0 >>> [2] chr2 [169869820, 169869970] * | 0 >>> [3] chr4 [145506440, 145506590] * | 0 >>> [4] chr5 [ 15014080, 15014230] * | 0 >>> [5] chr5 [ 15117400, 15117550] * | 0 >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >>> chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX >>> chrY >>> NA NA NA NA NA NA NA NA NA NA NA >>> NA NA NA NA NA NA NA NA NA NA NA NA >>> NA >>> >>>> subsetByOverlaps(snp, dnase) >>> >>> GRanges with 6 ranges and 1 metadata column: >>> seqnames ranges strand | score >>> <rle> <iranges> <rle> | <integer> >>> rs2235746 chr1 [ 17306671, 17306672] - | 0 >>> rs4157777 chr2 [169869904, 169869904] - | 0 >>> rs6858330 chr4 [145506558, 145506559] + | 0 >>> rs13146741 chr4 [145506453, 145506454] + | 0 >>> rs32847 chr5 [ 15117438, 15117439] + | 0 >>> rs7341842 chr5 [ 15014184, 15014185] + | 0 >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >>> chr2 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 >>> NA NA NA NA NA NA NA NA NA NA NA >>> NA NA NA NA NA NA NA NA NA NA >>> ########## >>> >>> The first GRanges objects stores the DNase genomic locations >>> overlapping with the SNPs, while the second one contains the SNPs IDs >>> (as GRanges names) and genomic locations overlapping with the DNase >>> dataset. >>> >>> Now, what I actually need is a GRanges object that stores the SNPs IDs >>> and the DNase genomic locations. Is this possible? >>> >>> Thank you. >>> Best, >>> >>> >> > > >

ADD REPLY • link 10.7 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

Hi Valerie, I just tested your code with toy data and can't try it with real data until next week, but it seems to be doing exactly what I need. That call to CharacterList(split()) does some serious magic! Thanks a lot! I would have never figured this out myself. Best, On 21 August 2013 16:39, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > To handle multiple mappings you can use a CharacterList. > >> gr1 <- GRanges("chr1", IRanges(c(5, 10, 15), c(5, 10, 15), >> names=paste0("rsid:", letters[1:3])), score=1:3) >> gr2 <- GRanges("chr1", IRanges(c(4,6,45), c(8,20,50), >> names=paste0("dnase:", letters[1:3])), score=4:6) >> ranges <- subsetByOverlaps(gr2,gr1) >> hits <- findOverlaps(gr2, gr1) >> rsid <- CharacterList(split(names(gr1)[subjectHits(hits)], >> queryHits(hits))) >> snpscore <- CharacterList(split(gr1$score[subjectHits(hits)], >> queryHits(hits))) >> mcols(ranges) <- DataFrame(mcols(ranges), rsid, snpscore) > > >> ranges >> GRanges with 2 ranges and 3 metadata columns: >> seqnames ranges strand | score rsid >> snpscore >> <rle> <iranges> <rle> | <integer> <characterlist> >> <characterlist> >> dnase:a chr1 [4, 8] * | 4 rsid:a >> 1 >> dnase:b chr1 [6, 20] * | 5 rsid:b,rsid:c >> 2,3 > > > > Valerie > > > > > On 08/21/2013 04:10 AM, Enrico Ferrero wrote: >> >> Hi Valerie, >> >> Thanks very much for your reply. >> You propose an interesting approach, which unfortunately doesn't work >> for my specific case. >> >> In short, the problem is that multiple SNPs (single nucleotides) can >> map to a single DNase HS site (often very large regions). >> I have edited your example a bit so that it's easier to reproduce the >> issue: >> >> ########## >>> >>> gr1 <- GRanges("chr1", IRanges(c(5, 10, 15), c(5, 10, 15), >>> names=paste0("rsid:", letters[1:3])), score=rep(0,3)) >>> gr1 >> >> GRanges with 3 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <numeric> >> rsid:a chr1 [ 5, 5] * | 0 >> rsid:b chr1 [10, 10] * | 0 >> rsid:c chr1 [15, 15] * | 0 >> --- >> seqlengths: >> chr1 >> NA >> >>> gr2 <- GRanges("chr1", IRanges(c(4,6,45), c(8,20,50), >>> names=paste0("dnase:", letters[1:3])), score=rep(0,3)) >>> gr2 >> >> GRanges with 3 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <numeric> >> dnase:a chr1 [ 4, 8] * | 0 >> dnase:b chr1 [ 6, 20] * | 0 >> dnase:c chr1 [45, 50] * | 0 >> --- >> seqlengths: >> chr1 >> NA >> >>> ranges <- subsetByOverlaps(gr2,gr1) >>> ranges >> >> GRanges with 2 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <numeric> >> dnase:a chr1 [4, 8] * | 0 >> dnase:b chr1 [6, 20] * | 0 >> --- >> seqlengths: >> chr1 >> NA >> >>> revRanges <- subsetByOverlaps(gr1,gr2) >>> revRanges >> >> GRanges with 3 ranges and 1 metadata column: >> seqnames ranges strand | score >> <rle> <iranges> <rle> | <numeric> >> rsid:a chr1 [ 5, 5] * | 0 >> rsid:b chr1 [10, 10] * | 0 >> rsid:c chr1 [15, 15] * | 0 >> --- >> seqlengths: >> chr1 >> NA >> >>> hits <- findOverlaps(gr2, gr1) >>> idx <- unique(subjectHits(hits)) >>> values <- DataFrame(rsid=names(gr1)[idx]) >>> mcols(ranges) <- c(mcols(ranges), values) >>> ranges >> >> GRanges with 2 ranges and 2 metadata columns: >> Error in (function (..., row.names = NULL, check.rows = FALSE, >> check.names = TRUE, : >> arguments imply differing number of rows: 2, 3 >> ########## >> >> As you can see, rsid:a is in the dnase:a region, but both rsid:b and >> rsid:c hit the dnase:b region. As a result, when you try to add the >> SNPs IDs from gr1 as metacolumns of ranges, you create a GRanges >> object with different number of rows. >> >> Is there any workaround for this? >> >> At the moment I'm creating two GRanges objects for each comparison, >> one keeping the SNPs IDs and coordinates, and the other storing the >> DNase regions hit by the SNPs, but this is less than ideal and >> produces a lot of unnecessary output (R objects and exported BED >> files). Ideally, I'd like to have a GRanges object where each row is a >> DNase HS site hit by a specific SNP. >> >> Thank you. >> Best, >> >> On 20 August 2013 17:57, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: >>> >>> Hi Enrico, >>> >>> Here is a toy example of two GRanges, one with snp locations and the >>> other >>> with dnase. >>> >>> gr1 <- GRanges("chr1", IRanges(1:5, width=5, >>> names=paste0("rsid:", letters[1:5])), score=1:5) >>>> >>>> gr1 >>> >>> >>> GRanges with 5 ranges and 1 metadata column: >>> seqnames ranges strand | score >>> <rle> <iranges> <rle> | <integer> >>> rsid:a chr1 [1, 5] * | 1 >>> rsid:b chr1 [2, 6] * | 2 >>> rsid:c chr1 [3, 7] * | 3 >>> rsid:d chr1 [4, 8] * | 4 >>> rsid:e chr1 [5, 9] * | 5 >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>> gr2 <- GRanges("chr1", IRanges(8:12, width=5, >>> names=paste0("dnase:", letters[1:5])), score=10:14) >>>> >>>> gr2 >>> >>> >>> GRanges with 5 ranges and 1 metadata column: >>> seqnames ranges strand | score >>> <rle> <iranges> <rle> | <integer> >>> dnase:a chr1 [ 8, 12] * | 10 >>> dnase:b chr1 [ 9, 13] * | 11 >>> dnase:c chr1 [10, 14] * | 12 >>> dnase:d chr1 [11, 15] * | 13 >>> dnase:e chr1 [12, 16] * | 14 >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>> >>> ranges <- subsetByOverlaps(gr2, gr1) >>>> >>>> ranges >>> >>> GRanges with 2 ranges and 1 metadata column: >>> >>> seqnames ranges strand | score >>> <rle> <iranges> <rle> | <integer> >>> dnase:a chr1 [8, 12] * | 10 >>> dnase:b chr1 [9, 13] * | 11 >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>> The function called by subsetByOverlaps is findOverlaps (described on >>> same >>> man page as ?subsetByOverlaps). The output of findOverlaps is a 'Hits' >>> object indicating which of the query and subject overlap. >>> >>> hits <- findOverlaps(gr2, gr1) >>>> >>>> hits >>> >>> Hits of length 3 >>> queryLength: 5 >>> subjectLength: 5 >>> queryHits subjectHits >>> <integer> <integer> >>> 1 1 4 >>> 2 1 5 >>> 3 2 5 >>> >>> We want meta information from gr1 (snps). In the call to findOverlaps gr1 >>> was the subject so we use the 'subjectHits' indices to subset the snp >>> GRanges. >>> >>> idx <- unique(subjectHits(hits)) >>> values <- DataFrame(rsid=names(gr1)[idx], snpscore=gr1$score[idx]) >>> >>> Add the metadata to the dnase ranges. >>> >>> mcols(ranges) <- c(mcols(ranges), values) >>>> >>>> ranges >>> >>> GRanges with 2 ranges and 3 metadata columns: >>> seqnames ranges strand | score rsid snpscore >>> <rle> <iranges> <rle> | <integer> <character> <integer> >>> dnase:a chr1 [8, 12] * | 10 rsid:d 4 >>> dnase:b chr1 [9, 13] * | 11 rsid:e 5 >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>> >>> Valerie >>> >>> >>> On 08/20/2013 03:51 AM, Enrico Ferrero wrote: >>>> >>>> >>>> Hi, >>>> >>>> I have two GRanges objects, the first one is a list of SNPs, the >>>> second one are DNase hypersensitivity sites: >>>> >>>> ########## >>>> library(GenomicRanges) >>>> ... >>>> >>>>> snp >>>> >>>> >>>> GRanges with 192 ranges and 1 metadata column: >>>> seqnames ranges strand | score >>>> <rle> <iranges> <rle> | <integer> >>>> rs000001 chr1 [ 37967779, 37967780] + | 0 >>>> rs000002 chr1 [165967416, 165967417] - | 0 >>>> rs000003 chr1 [218860069, 218860070] - | 0 >>>> rs000004 chr1 [ 17306673, 17306674] - | 0 >>>> rs000005 chr1 [ 41293414, 41293415] + | 0 >>>> ... ... ... ... ... ... >>>> rs000188 chr8 [ 97522507, 97522508] - | 0 >>>> rs000189 chr8 [ 15532582, 15532583] + | 0 >>>> rs000190 chr8 [ 72270031, 72270032] + | 0 >>>> rs000191 chr9 [126511086, 126511087] + | 0 >>>> rs000192 chr9 [ 98231008, 98231009] + | 0 >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 ... chr21 >>>> chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 >>>> NA NA NA NA NA NA NA NA NA ... NA >>>> NA NA NA NA NA NA NA NA >>>> >>>>> dnase >>>> >>>> >>>> GRanges with 145038 ranges and 1 metadata column: >>>> seqnames ranges strand | score >>>> <rle> <iranges> <rle> | <integer> >>>> [1] chr1 [ 10120, 10270] * | 0 >>>> [2] chr1 [237700, 237850] * | 0 >>>> [3] chr1 [521440, 521590] * | 0 >>>> [4] chr1 [565560, 565710] * | 0 >>>> [5] chr1 [565860, 566010] * | 0 >>>> ... ... ... ... ... ... >>>> [145034] chrX [154543640, 154543790] * | 0 >>>> [145035] chrX [154560420, 154560570] * | 0 >>>> [145036] chrX [154563960, 154564110] * | 0 >>>> [145037] chrX [154842100, 154842250] * | 0 >>>> [145038] chrX [154862200, 154862350] * | 0 >>>> >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >>>> chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX >>>> chrY >>>> NA NA NA NA NA NA NA NA NA NA NA >>>> NA NA NA NA NA NA NA NA NA NA NA NA >>>> NA >>>> ########## >>>> >>>> I can use subsetByOverlaps() in both directions to compute the overlap >>>> between them and return a GRanges object: >>>> >>>> ########## >>>>> >>>>> >>>>> subsetByOverlaps(dnase, snp) >>>> >>>> >>>> GRanges with 5 ranges and 1 metadata column: >>>> seqnames ranges strand | score >>>> <rle> <iranges> <rle> | <integer> >>>> [1] chr1 [ 17306560, 17306710] * | 0 >>>> [2] chr2 [169869820, 169869970] * | 0 >>>> [3] chr4 [145506440, 145506590] * | 0 >>>> [4] chr5 [ 15014080, 15014230] * | 0 >>>> [5] chr5 [ 15117400, 15117550] * | 0 >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >>>> chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX >>>> chrY >>>> NA NA NA NA NA NA NA NA NA NA NA >>>> NA NA NA NA NA NA NA NA NA NA NA NA >>>> NA >>>> >>>>> subsetByOverlaps(snp, dnase) >>>> >>>> >>>> GRanges with 6 ranges and 1 metadata column: >>>> seqnames ranges strand | score >>>> <rle> <iranges> <rle> | <integer> >>>> rs2235746 chr1 [ 17306671, 17306672] - | 0 >>>> rs4157777 chr2 [169869904, 169869904] - | 0 >>>> rs6858330 chr4 [145506558, 145506559] + | 0 >>>> rs13146741 chr4 [145506453, 145506454] + | 0 >>>> rs32847 chr5 [ 15117438, 15117439] + | 0 >>>> rs7341842 chr5 [ 15014184, 15014185] + | 0 >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 >>>> chr2 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 >>>> NA NA NA NA NA NA NA NA NA NA NA >>>> NA NA NA NA NA NA NA NA NA NA >>>> ########## >>>> >>>> The first GRanges objects stores the DNase genomic locations >>>> overlapping with the SNPs, while the second one contains the SNPs IDs >>>> (as GRanges names) and genomic locations overlapping with the DNase >>>> dataset. >>>> >>>> Now, what I actually need is a GRanges object that stores the SNPs IDs >>>> and the DNase genomic locations. Is this possible? >>>> >>>> Thank you. >>>> Best, >>>> >>>> >>> >> >> >> > -- Enrico Ferrero Department of Genetics Cambridge Systems Biology Centre University of Cambridge

ADD REPLY • link 10.7 years ago enricoferrero ▴ 660