Question

replace, merge and delete entries in GRangeList

0

Entering edit mode

shanrong.zhao • 0

@shanrongzhao-10606

Last seen 7.1 years ago

Background: I have GRangeLit object, and want to know how to merge, replace and delete entries.

txdb <- makeTxDbFromGFF(file="hg19.gencode.v19.gtf", format="gtf")

seqlevels(txdb, force=TRUE) <- "chr22"

exons <- reduce(exonsBy(txdb, by = "gene"))

Then I identify a cluster of overlapping of genes, let’s say #3, #321. Now I want to replace exons[3] by merging exons[3] and exons[321]. My code snippet loos like:

ids <- c(3,321)

id <- ids[1]

exons[[id]] <- reduce(unlist(exons[ids]))

It works, but the statement “exons[[id]] <- reduce(unlist(exons[ids]))” is very slow, and I don’t understand why.

Other generic questions are:

For GRangeList exons object above, what are the most elegant way to delete/merge/and replace entries.

1. replace: asked above, but performance is bad

2. delete: I cannot do something like exons[3] <- NULL.

> exons[[100]] <- NULL
> exons[[100]]
GRanges object with 2 ranges and 0 metadata columns:
   seqnames               ranges strand
      <Rle>            <IRanges> <Rle>
      chr22 [50247490, 50247637]      +
      chr22 [50276982, 50283726]      +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> exons[[100]] <- NULL
> exons[[100]]
GRanges object with 2 ranges and 0 metadata columns:
   seqnames               ranges strand
      <Rle>            <IRanges> <Rle>
      chr22 [50919985, 50920386]      +
      chr22 [50920996, 50924869]      +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> exons[[100]] <- NULL
> exons[[100]]
GRanges object with 12 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges> <Rle>
         chr22 [20067755, 20067906]      +
         chr22 [20068092, 20068208]      +
         chr22 [20073210, 20074844]      +
         chr22 [20077192, 20077781]      +
         chr22 [20078958, 20079155]      +
...      ...                  ...    ...
         chr22 [20082235, 20082318]      +
         chr22 [20093700, 20093800]      +
         chr22 [20094090, 20094559]      +
         chr22 [20094794, 20096707]      +
         chr22 [20097548, 20099400]      +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

grangeslist • 1.0k views

ADD COMMENT • link updated 8.0 years ago by Michael Lawrence ★ 11k • written 8.0 years ago by shanrong.zhao • 0

score 0 · Answer 1 · 2016-04-30

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 2.4 years ago

United States

Let's assume you have a list of those clusters, call it idList. If it is a partitioning (a many-to-one grouping) of your exons, then (in devel) you can cast it as such and regroup the exons (currently grouped by transcript) to be grouped by cluster:

exonClusters <- regroup(exons, ManyToOneGrouping(idList))

Then just reduce them:

reduce(exonClusters)

In general, you want to avoid loops and avoid replacement. Instead, model your analysis as functional transformations, where the structure, such as the grouping, is inherent in the object, not your code. Your code should convey only the high-level semantics.

ADD COMMENT • link 8.0 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Thanks. I am using GenomicFeatures_1.20.1, and the two functions you mentioned are not available.

Do you know where I get them (regroup, ManyToOneGrouping)? Also how does ManyToOneGrouping merge the orignal names?

ADD REPLY • link 8.0 years ago shanrong.zhao • 0

0

Entering edit mode

They are from IRanges in devel. Actually, I never got around to exporting and documenting regroup(), so I just did that. IRanges 2.5.47.

ADD REPLY • link 8.0 years ago Michael Lawrence ★ 11k