How do I subset a GRanges object based on chromosome (and approximate region)?
Asked by deepue

I have a GRanges object, data_GR, from which I would like to extract all the regions on a specific chromosome (e.g., chr21). How can I do this without knowing the regions of interest in advance?

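library(GenomicRanges)  # provides makeGRangesFromDataFrame()

set.seed(123)
# random BED-like data.frame with columns chr, start, end (nc = 0: no value columns)
data_bed = circlize::generateRandomBed(nr = 1000, nc = 0)
data_GR = makeGRangesFromDataFrame(data_bed)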

GRanges object with 1005 ranges and 0 metadata columns:
         seqnames            ranges strand
            <Rle>         <IRanges>  <Rle>
     [1]     chr1   7634457-9204434      *
     [2]     chr1  9853594-10435028      *
     [3]     chr1 10862809-12716970      *
     [4]     chr1 13814692-18272526      *
     [5]     chr1 19243285-20683999      *
     ...      ...               ...    ...
  [1001]     chrY 46296843-48478084      *
  [1002]     chrY 48551532-51056391      *
  [1003]     chrY 52266848-53042784      *
  [1004]     chrY 57968441-58556744      *
  [1005]     chrY 58660263-59131689      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

Is it also possible to extract all the regions that fall within a given range?

904  chr21    182543   2542946
905  chr21   5976730   7429360
906  chr21  14592916  14657056
907  chr21  19808058  21397649
908  chr21  21820886  22077901
909  chr21  22561006  23005888
910  chr21  25473663  26160273
911  chr21  26693456  28326067
912  chr21  30501245  34710361
913  chr21  35698126  36052399
914  chr21  36701826  38995722
915  chr21  40122532  40673153
916  chr21  41211634  41248211
917  chr21  41644225  43391767
918  chr21  44023336  44630830
919  chr21  47539670  48127414

For example, these are the chr21 regions that fall fully within the range 20,000,000 to 30,000,000:

908  chr21  21820886  22077901
909  chr21  22561006  23005888
910  chr21  25473663  26160273
911  chr21  26693456  28326067

Answer by merv

plyranges

The plyranges package provides a convenient dplyr-style syntax for this. Here are some example filtering operations:

# BiocManager::install('plyranges')
library(plyranges)

# all 'chr21' ranges
data_GR %>% 
  filter(seqnames == 'chr21')

# filter by one region (stringent, i.e., fully contained in region)
data_GR %>% 
  filter(seqnames == 'chr21', start >= 2e7L, end <= 3e7L)

# filter by one region (permissive, i.e., any overlap with region)
data_GR %>% 
  filter_by_overlaps(as('chr21:20000000-30000000', 'GRanges'))

# filter by multiple regions
data_GR %>% 
  filter_by_overlaps(as(c('chr1:1-10000000', 'chr21:20000000-30000000'), 'GRanges'))

The last two examples effectively do what Kevin suggested: they create a GRanges object for the region(s) you want to filter on. filter_by_overlaps() also accepts the same optional maxgap and minoverlap arguments as findOverlaps().
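For comparison, the first two plyranges filters can also be written with plain GenomicRanges accessors. This is a minimal sketch, assuming GenomicRanges is installed; toy_gr is a small hypothetical object (a few coordinates from the question plus one made-up chr1 range) standing in for data_GR.

```r
# Minimal sketch: base GenomicRanges equivalents of the plyranges filters.
# Assumes GenomicRanges is installed; toy_gr is a hypothetical stand-in for data_GR.
library(GenomicRanges)

toy_gr <- GRanges(
  seqnames = c("chr1", "chr21", "chr21", "chr21"),
  ranges   = IRanges(start = c(100, 19808058, 21820886, 30501245),
                     end   = c(200, 21397649, 22077901, 34710361))
)

# Logical mask for chr21 (as.character() turns the factor-Rle into a plain vector)
is_chr21 <- as.character(seqnames(toy_gr)) == "chr21"

# All chr21 ranges
on_chr21 <- toy_gr[is_chr21]

# Stringent: ranges fully contained in chr21:20000000-30000000
contained <- toy_gr[is_chr21 & start(toy_gr) >= 2e7 & end(toy_gr) <= 3e7]

on_chr21
contained
```

Here on_chr21 keeps the three chr21 ranges, and contained keeps only 21820886-22077901, because the other chr21 ranges start before 20,000,000 or end after 30,000,000.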

Answer by Kevin Blighe

Hi,

It should be a matter of creating a second GRanges object containing your target regions and then calling findOverlaps() or intersect() on the two objects.
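A minimal sketch of that approach, assuming GenomicRanges is installed; toy_gr is a hypothetical stand-in for data_GR built from coordinates in the question. One difference worth knowing: findOverlaps() gives you indices, so you can recover the original ranges, whereas intersect() returns only the overlapping portions, clipped to the target region.

```r
# Minimal sketch of the findOverlaps()/intersect() approach.
# Assumes GenomicRanges is installed; toy_gr is a hypothetical stand-in for data_GR.
library(GenomicRanges)

toy_gr <- GRanges(rep("chr21", 4),
  IRanges(start = c(19808058, 21820886, 25473663, 30501245),
          end   = c(21397649, 22077901, 26160273, 34710361)))

# Second GRanges object holding the target region(s)
target <- GRanges("chr21", IRanges(start = 20000000, end = 30000000))

# findOverlaps() returns a Hits object; queryHits() gives indices into toy_gr
hits <- findOverlaps(toy_gr, target)
overlapping <- toy_gr[unique(queryHits(hits))]

# intersect() returns the overlapping portions, clipped to target and
# merged, rather than the original ranges
clipped <- intersect(toy_gr, target)

overlapping
clipped
```

Note that overlapping still contains 19808058-21397649 in full, while in clipped that range is trimmed to start at 20000000.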

To account for an "approximate" overlap, make use of the maxgap and minoverlap parameters.
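To illustrate what those two parameters do, here is a small sketch using overlapsAny(), which accepts the same maxgap and minoverlap arguments as findOverlaps(). It assumes GenomicRanges is installed; toy_gr and its coordinates are hypothetical, chosen so that each setting gives a different answer.

```r
# Minimal sketch of maxgap / minoverlap; toy_gr coordinates are hypothetical.
library(GenomicRanges)

toy_gr <- GRanges(rep("chr21", 4),
  IRanges(start = c(19990001, 25000000, 29999501, 30000101),
          end   = c(19999990, 26000000, 30000500, 30001000)))
target <- GRanges("chr21", IRanges(start = 20000000, end = 30000000))

# Default: any overlap of at least one position
strict <- overlapsAny(toy_gr, target)
# maxgap = 10: also accept ranges separated from target by up to 10 positions
near <- overlapsAny(toy_gr, target, maxgap = 10L)
# minoverlap = 1000: require at least 1000 overlapping positions
big <- overlapsAny(toy_gr, target, minoverlap = 1000L)

strict  # FALSE  TRUE  TRUE FALSE
near    #  TRUE  TRUE  TRUE FALSE
big     # FALSE  TRUE FALSE FALSE

# Subset the original object with the chosen mask
toy_gr[near]
```

The first range ends 9 positions before the target starts, so only maxgap = 10 picks it up; the third range overlaps the target by just 500 positions, so minoverlap = 1000 drops it.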

Kevin

Answer by Michael Lawrence

Another option is subsetByOverlaps().
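subsetByOverlaps() returns the query ranges directly, without going through a Hits object. A minimal sketch, assuming GenomicRanges is installed; toy_gr is a hypothetical stand-in for data_GR using coordinates from the question. Setting type = "within" gives the stringent behaviour in the question's example, where 19808058-21397649 is excluded because it starts before 20,000,000.

```r
# Minimal sketch of subsetByOverlaps(); toy_gr is a hypothetical stand-in for data_GR.
library(GenomicRanges)

toy_gr <- GRanges(rep("chr21", 4),
  IRanges(start = c(19808058, 21820886, 26693456, 30501245),
          end   = c(21397649, 22077901, 28326067, 34710361)))
region <- as("chr21:20000000-30000000", "GRanges")

# type = "any" (default): keep ranges with any overlap
any_overlap <- subsetByOverlaps(toy_gr, region)
# type = "within": keep only ranges fully inside region
within_only <- subsetByOverlaps(toy_gr, region, type = "within")

any_overlap
within_only
```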
