How to subset GRangesList by seqnames
1
0
Entering edit mode
Junghoon ▴ 10
@649f7bcf
Last seen 3.1 years ago
South Korea

Is there a simple way to subset a GRangesList by its seqnames (for example, select only transcripts in chromosome 1)?

I tried this:

gr1 = GRanges(seqnames = "chr2",
               ranges = IRanges(103, 106),
               strand = "+",
               score = 5L, GC = 0.45)

gr2 = GRanges(seqnames = c("chr1", "chr1"),
               ranges = IRanges(c(107, 113), width = 3),
               strand = c("+", "-"),
               score = 3:4, GC = c(0.3, 0.5))

grl = GRangesList(txA = gr1, txB = gr2)

grl_chr1 = grl[seqnames(grl) == "chr1"]

But grl_chr1 is a GRangesList with the same length as grl with its first element as a zero-length GRanges object. Is there a simple way to remove all zero-length elements from a GRangesList so that I can get a new GRangesList with only transcripts from chr1?

GenomicRanges • 1.5k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 7 hours ago
United States
> gr1 = GRanges(seqnames = "chr2",
               ranges = IRanges(103, 106),
               strand = "+",
               score = 5L, GC = 0.45)

gr2 = GRanges(seqnames = c("chr1", "chr1"),
               ranges = IRanges(c(107, 113), width = 3),
               strand = c("+", "-"),
               score = 3:4, GC = c(0.3, 0.5))

grl = GRangesList(txA = gr1, txB = gr2)
gr1 = GRanges(seqnames = "chr2",
+                ranges = IRanges(103, 106),
+                strand = "+",
+                score = 5L, GC = 0.45)
> 
> gr2 = GRanges(seqnames = c("chr1", "chr1"),
+                ranges = IRanges(c(107, 113), width = 3),
+                strand = c("+", "-"),
+                score = 3:4, GC = c(0.3, 0.5))
> 
> grl = GRangesList(txA = gr1, txB = gr2)
> subsetByOverlaps(grl, GRanges("chr1:1-1000"))
GRangesList object of length 1:
$txB
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
  [1]     chr1   107-109      + |         3       0.3
  [2]     chr1   113-115      - |         4       0.5
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
ADD COMMENT
0
Entering edit mode

But do note that it won't work for any transcripts that are on multiple chromosomes.

> gr3 <- GRanges(c("chr1","chr2"), IRanges(c(3,8), c(45, 200)))
> grl <- GRangesList(txA = gr1, txB = gr2, txC = gr3)
> subsetByOverlaps(grl, GRanges("chr1:1-1000"))
GRangesList object of length 2:
$txB
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
  [1]     chr1   107-109      + |         3       0.3
  [2]     chr1   113-115      - |         4       0.5
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

$txC
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
  [1]     chr1      3-45      * |      <NA>        NA
  [2]     chr2     8-200      * |      <NA>        NA
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
ADD REPLY

Login before adding your answer.

Traffic: 607 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6