overlap regions between two GRanges (or GRangesList)
@herve-pages-1542
Seattle, WA, United States
Hi Zhao,

I'm going to try to rephrase what you want to do: seems like you want to keep the ranges in 'g' that have an overlap with any of the ranges in 'g2':

  > g
  GRanges with 4 ranges and 0 metadata columns:
        seqnames    ranges strand
           <Rle> <IRanges>  <Rle>
    [1]     chr1  [ 1, 12]      -
    [2]     chr2  [ 2, 15]      +
    [3]     chr2  [ 3, 20]      +
    [4]     chr3  [10, 14]      -
    ---
    seqlengths:
     chr1 chr2 chr3
       NA   NA   NA

  > g2
  GRanges with 2 ranges and 0 metadata columns:
        seqnames    ranges strand
           <Rle> <IRanges>  <Rle>
    [1]     chr1   [1,  8]      -
    [2]     chr2   [2, 30]      +
    ---
    seqlengths:
     chr1 chr2
       NA   NA

  > subsetByOverlaps(g, g2)
  GRanges with 3 ranges and 0 metadata columns:
        seqnames    ranges strand
           <Rle> <IRanges>  <Rle>
    [1]     chr1   [1, 12]      -
    [2]     chr2   [2, 15]      +
    [3]     chr2   [3, 20]      +
    ---
    seqlengths:
     chr1 chr2 chr3
       NA   NA   NA

However, in the desired output you show below, you omitted the 1st range (the range on chr1). So it's not clear to me what you are really after. Did you omit it because it's not *within* the overlapping range in 'g2'? If that's the case, then:

  > subsetByOverlaps(g, g2, type="within")
  GRanges with 2 ranges and 0 metadata columns:
        seqnames    ranges strand
           <Rle> <IRanges>  <Rle>
    [1]     chr2   [2, 15]      +
    [2]     chr2   [3, 20]      +
    ---
    seqlengths:
     chr1 chr2 chr3
       NA   NA   NA

HTH,
H.

On 03/13/2014 10:12 PM, Zhao, Shanrong [JRDUS] wrote:
> Seems intersect can do what I want, but not quietly exact.
>
> I would like to output two records (for b,c) below instead of unite them
> into a single record [2, 20] when call intersect(g,g2).
>
> [2, 15]
> [3, 20]
>
> Thanks,
> Shanrong
>
> P.S
>
> > g
> GRanges with 4 ranges and 2 metadata columns:
>     seqnames    ranges strand |     score                GC
>        <Rle> <IRanges>  <Rle> | <integer>         <numeric>
>   a     chr1  [ 1, 12]      - |         1                 1
>   b     chr2  [ 2, 15]      + |         2 0.888888888888889
>   c     chr2  [ 3, 20]      + |         3 0.777777777777778
>   j     chr3  [10, 14]      - |        10                 0
>   ---
>   seqlengths:
>    chr1 chr2 chr3
>      NA   NA   NA
>
> > g2
> GRanges with 2 ranges and 2 metadata columns:
>     seqnames    ranges strand |     score                GC
>        <Rle> <IRanges>  <Rle> | <integer>         <numeric>
>   a     chr1   [1,  8]      - |         1                 1
>   b     chr2   [2, 30]      + |         2 0.888888888888889
>   ---
>   seqlengths:
>    chr1 chr2 chr3
>      NA   NA   NA
>
> > intersect(g,g2)
> GRanges with 2 ranges and 0 metadata columns:
>       seqnames    ranges strand
>          <Rle> <IRanges>  <Rle>
>   [1]     chr1   [1,  8]      -
>   [2]     chr2   [2, 20]      +
>   ---
>   seqlengths:
>    chr1 chr2 chr3
>      NA   NA   NA
>
> From: Zhao, Shanrong [JRDUS]
> Sent: Thursday, March 13, 2014 9:40 PM
> To: 'hpages at fhcrc.org'
> Subject: overlap regions between two GRanges (or GRangesList)
>
> Dear Dr. Pages,
>
> I am exploring Bioconductor packages to analyze DNA methylation data.
> One question I don't know how to solve it. I have two GRanges (OR
> GRangesList), Now I want to identify the common (overlapping)
> regions, not just overlapping or not--- gc <- overlapRegions(ga,gb)
>
> The other question I have: I am interested in all cytosines in
> promoters regions, I have already had promoter in GRange object. What is
> the most efficient way to identify the total number of Cs? I plan to
> extract all DNA sequences corresponding to promoters, and then call
> letterFrequency (by set letters="C").
>
> Best regards,
> Shanrong

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
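For readers who want to reproduce the comparison above, here is a minimal sketch that rebuilds 'g' and 'g2' from the coordinates printed in this post and runs the three calls discussed. The variable names any_hit, within_hit and merged are only illustrative, and aligning the seqlevels of 'g2' with those of 'g' is an assumption made here to keep the two objects consistent.

```r
library(GenomicRanges)

# Rebuild the two objects from the coordinates shown in the thread.
g <- GRanges(
  seqnames = c("chr1", "chr2", "chr2", "chr3"),
  ranges   = IRanges(start = c(1, 2, 3, 10), end = c(12, 15, 20, 14)),
  strand   = c("-", "+", "+", "-")
)
g2 <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges   = IRanges(start = c(1, 2), end = c(8, 30)),
  strand   = c("-", "+")
)
# Assumption: give both objects the same seqlevels (chr1, chr2, chr3).
seqlevels(g2) <- seqlevels(g)

# Ranges of 'g' that overlap anything in 'g2'; the ranges are returned unchanged.
any_hit <- subsetByOverlaps(g, g2)

# Only the ranges of 'g' that lie entirely inside a range of 'g2'.
within_hit <- subsetByOverlaps(g, g2, type = "within")

# Set-style intersection: the overlapping pieces on chr2 are merged into one range.
merged <- intersect(g, g2)

any_hit
within_hit
merged
```

The merged object is where the two chr2 records collapse into a single [2, 20] range, which is the behaviour the original question wanted to avoid.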
@herve-pages-1542
On 03/14/2014 03:31 PM, Zhao, Shanrong [JRDUS] wrote:
> Thank you. subsetByOverlaps is not what I want. I just want to keep the
> ranges in 'g' that overlaps with g2, but only keep *"overlapping"
> regions* (instead of the original ranges in 'g1').

Thanks for clarifying. But that still doesn't explain why you want to *completely* get rid of the 1st range in 'g' (the range on chr1).

Note that in the general case, when you replace the original range with the overlapping region, you might need more than 1 range for that. That's because the original range in 'g' can overlap with more than 1 range in 'g2'. Assuming any given range in 'g' overlaps with at most 1 range in 'g2':

  > ov <- findOverlaps(g, g2)
  > pintersect(g[queryHits(ov)], g2[subjectHits(ov)])
  GRanges with 3 ranges and 0 metadata columns:
        seqnames    ranges strand
           <Rle> <IRanges>  <Rle>
    [1]     chr1   [1,  8]      -
    [2]     chr2   [2, 15]      +
    [3]     chr2   [3, 20]      +
    ---
    seqlengths:
     chr1 chr2 chr3
       NA   NA   NA

Now you would need to explain why you don't want to see the range on chr1 in that result...

Thanks,
H.
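The findOverlaps + pintersect recipe from this reply can be run end to end as the sketch below. It rebuilds 'g' and 'g2' from the coordinates printed in the thread; the name pieces is illustrative, and aligning the seqlevels of the two objects is an assumption made here for tidiness.

```r
library(GenomicRanges)

g <- GRanges(
  seqnames = c("chr1", "chr2", "chr2", "chr3"),
  ranges   = IRanges(start = c(1, 2, 3, 10), end = c(12, 15, 20, 14)),
  strand   = c("-", "+", "+", "-")
)
g2 <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges   = IRanges(start = c(1, 2), end = c(8, 30)),
  strand   = c("-", "+")
)
# Assumption: use the same seqlevels on both objects.
seqlevels(g2) <- seqlevels(g)

# Step 1: one row per overlapping (query, subject) pair.
ov <- findOverlaps(g, g2)

# Step 2: parallel intersection, i.e. the i-th range of the first argument
# is intersected with the i-th range of the second argument.
pieces <- pintersect(g[queryHits(ov)], g2[subjectHits(ov)])
pieces
```

Unlike intersect(), nothing is merged afterwards, so the two chr2 records stay separate. Recent GenomicRanges versions may also attach a logical 'hit' metadata column to the result of pintersect; that column can be ignored here.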
@herve-pages-1542
Please use the "Reply All" button so this discussion remains on the mailing list.

On 03/14/2014 04:04 PM, Zhao, Shanrong [JRDUS] wrote:
> The range in chr1 should be kept.
>
> > intersect(g,g2)
> GRanges with 2 ranges and 0 metadata columns:
>       seqnames    ranges strand
>          <Rle> <IRanges>  <Rle>
>   [1]     chr1   [1,  8]      -
>   [2]     chr2   [2, 20]      +
>   ---
>   seqlengths:
>    chr1 chr2 chr3
>      NA   NA   NA
>
> My real question boils down to "chr2 [2,20]". Instead of reducing them, I would rather output "chr2 [2,15]" and "chr2 [3,20]".

The solution I sent you earlier (findOverlaps + pintersect) keeps the range on chr1 and outputs "chr2 [2,15]" and "chr2 [3,20]".

> You bring up a very good point: the original range might overlap with more than 1 range. This question seems more complicated than I thought. But for my practical situation, I cannot find a better solution (better than intersect function can offer)

Hard to provide more suggestions from here without knowing why the findOverlaps + pintersect solution doesn't work for you.

I'll respond to your other off-list question about counting the total number of Cs in your promoters in a separate email.

Cheers,
H.
@herve-pages-1542
On 03/13/2014 09:40 PM, Zhao, Shanrong [JRDUS] wrote:
> Dear Dr. Pages,
>
> I am exploring Bioconductor packages to analyze DNA methylation data.
> One question I don't know how to solve it. I have two GRanges (OR
> GRangesList), Now I want to identify the common (overlapping)
> regions, not just overlapping or not--- gc <- overlapRegions(ga,gb)
>
> The other question I have: I am interested in all cytosines in
> promoters regions, I have already had promoter in GRange object. What is
> the most efficient way to identify the total number of Cs? I plan to
> extract all DNA sequences corresponding to promoters, and then call
> letterFrequency (by set letters="C").

Yes:

  letterFrequency(promoters, "C", collapse=TRUE)

or:

  alphabetFrequency(promoters, baseOnly=TRUE, collapse=TRUE)[["C"]]

Both should be fast.

H.

> Best regards,
> Shanrong
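As a small illustration of the two calls above, the sketch below counts Cs in a hand-made DNAStringSet. The object promoter_seqs and its three toy sequences are hypothetical stand-ins for the extracted promoter sequences; in a real analysis they would presumably come from a genome object (for example via getSeq() on a BSgenome with the promoter GRanges), which is an assumption and not shown here.

```r
library(Biostrings)

# Hypothetical promoter sequences, one entry per promoter.
promoter_seqs <- DNAStringSet(c(p1 = "ACCGTC", p2 = "CCCTAG", p3 = "GGGTTA"))

# Total number of Cs across all sequences (collapse = TRUE sums over the set).
total_C <- letterFrequency(promoter_seqs, "C", collapse = TRUE)

# Same total, taken from the full base-count table.
total_C_alt <- alphabetFrequency(promoter_seqs, baseOnly = TRUE,
                                 collapse = TRUE)[["C"]]

# Without collapse, the result is a matrix with one row per sequence.
per_promoter <- letterFrequency(promoter_seqs, "C")

total_C
total_C_alt
per_promoter
```

Both calls count Cs on the sequences as given, so whether cytosines on the opposite strand should also be counted depends on how the promoter sequences were extracted.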
@shanrong-zhao-5713
Thank you. subsetByOverlaps is not what I want. I just want to keep the ranges in 'g' that overlaps with g2, but only keep "overlapping" regions (instead of the original ranges in 'g1').

Thanks again,
Shanrong
Zhao,

sounds like pintersect is your friend.

Two steps:

1) use findOverlaps to find indices into g and g2
2) use pintersect to find the overlapping segments between g and g2

~ malcolm_cook at stowers.org
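The two steps can also cover the general case raised earlier in the thread, where one range in 'g' overlaps more than one range in the other object. The sketch below is only an illustration: g3 is a hypothetical subject made up for this example (it does not appear in the thread), and the query/subject metadata columns are one possible way to keep track of where each piece came from.

```r
library(GenomicRanges)

# 'g' as printed in the thread.
g <- GRanges(
  seqnames = c("chr1", "chr2", "chr2", "chr3"),
  ranges   = IRanges(start = c(1, 2, 3, 10), end = c(12, 15, 20, 14)),
  strand   = c("-", "+", "+", "-")
)

# Hypothetical subject with two separate ranges on chr2, so that the third
# range of 'g' (chr2 [3, 20]) overlaps both of them.
g3 <- GRanges(
  seqnames = c("chr2", "chr2"),
  ranges   = IRanges(start = c(2, 18), end = c(5, 30)),
  strand   = c("+", "+")
)
# Assumption: use the same seqlevels on both objects.
seqlevels(g3) <- seqlevels(g)

# Step 1: indices into 'g' (query) and 'g3' (subject), one row per overlap.
ov <- findOverlaps(g, g3)

# Step 2: overlapping segment for each (query, subject) pair.
pieces <- pintersect(g[queryHits(ov)], g3[subjectHits(ov)])

# Record which original ranges produced each piece.
pieces$query   <- queryHits(ov)
pieces$subject <- subjectHits(ov)
pieces
```

Here the third range of 'g' contributes two pieces, [3, 5] and [18, 20], which is why the result can hold more ranges than 'g' has overlapping records.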