Fast way to access metadata for GRangesList?
@robert-k-bradley-5997

I'm struggling to find a fast way to access the metadata associated with a GRangesList. I can explicitly invoke:

lapply (GRangesList, mcols)

but that is very slow. Instead, I'm hoping to do something like

GRangesList$exon_rank

in order to get an IntegerList containing that data. (I then want to perform downstream operations on the list.)

More specifically, I want to extract a list of exon ranks from a TxDb object. The following code works:

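library (GenomicFeatures)

# Build a TxDb from the UCSC refGene track (newer GenomicFeatures releases
# name this function makeTxDbFromUCSC)
refSeqDb = suppressWarnings (makeTranscriptDbFromUCSC (
"hg19",
tablename = "refGene"))

# Exons grouped by transcript, as a GRangesList
refseq2exons = exonsBy (refSeqDb, by = "tx")

# Keep only the exon_rank metadata column
refseq2exons = refseq2exons[, "exon_rank"]

# Slow: one mcols() call per transcript
exonRankList = lapply (lapply (refseq2exons, mcols), "[[", 1)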

However, the final step, which involves multiple calls to lapply, is extremely slow.

@martin-morgan-1513

Hi Robert -- I think you want

ranks = unlist(refseq2exons)$exon_rank

which gives you the ranks, and 

ranksList = relist(ranks, refseq2exons)

which gives you an IntegerList of ranks grouped by transcript (get a plain-old-list as a third step, with as(ranksList, "list")). All will be fast.

The idea is that the GRangesList is represented by a single GRanges and a 'partitioning' vector that tells the GRangesList class how the elements of the GRanges are to be grouped to form the list. unlist()ing the GRangesList just removes the partitioning information (fast) and exposes the underlying GRanges with associated metadata. relist()ing takes the 'flesh' of the exon ranks and wraps them around the geometry of the 'skeleton' implied by the original GRangesList.
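The flatten, compute, regroup pattern can be sketched in base R alone. This is only an illustration: it uses a plain list of integer vectors as a stand-in for a GRangesList, and the names ranks_by_tx, flat, sizes and doubled are made up for the example. Base R's relist() is not the GRangesList method, but the idea is the same.

```r
# Toy stand-in for a GRangesList: exon ranks grouped by transcript.
# (Hypothetical data; a real GRangesList stores ranges plus metadata columns.)
ranks_by_tx <- list(tx1 = 1:3, tx2 = integer(0), tx3 = 1:2)

# "Flesh": one flat vector holding every value, with the grouping dropped.
flat <- unlist(ranks_by_tx, use.names = FALSE)

# "Skeleton": the grouping is fully described by the size of each element.
sizes <- lengths(ranks_by_tx)

# Do one vectorized operation on the flat vector, then wrap the result
# back around the shape of the original list.
doubled <- relist(flat * 2L, ranks_by_tx)
```

The result has the same length, names and element sizes as ranks_by_tx, including the empty tx2 element.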

This

split (x = unlist (refseq2exons)$exon_rank, f = names (unlist (refseq2exons)))

is identical to as(ranksList[order(names(ranksList))], "list").

@robert-k-bradley-5997

I figured out a somewhat non-intuitive way to do this using a split/unlist approach:

exonRankList = split (x = unlist (refseq2exons)$exon_rank, f = names (unlist (refseq2exons)))

This is less intuitive than I was hoping for, but it works and is reasonably fast.


Hi Martin, Robert,

Note that using the unlist/relist approach will always work and do the right thing for this kind of situation. It is therefore the recommended idiom. FWIW the unlist/split approach has the following pitfalls:

  1. It assumes that the original list-like object (refseq2exons in your case) has unique names. This won't always be the case, e.g. if you use exonsBy() with use.names=TRUE to obtain refseq2exons.
  2. If the original list-like object has inner names, unlist() will do some strange name mangling in order to "blend" the inner names with the outer names. Then splitting based on the names will likely give an incorrect result.
  3. If the original list-like object is an ordinary list, unlist() will mangle the outer names in a silly way. Then again, splitting based on the names will likely give an incorrect result.
  4. Empty list elements in the original list-like object get lost. That is, the split step is unable to generate list elements that correspond to empty list elements in the original list-like object.
  5. The result of split (your exonRankList list) will generally not be parallel to the original list-like object, parallel meaning that the two objects have the same length and the i-th element of one corresponds to the i-th element of the other. For example, in your case the list elements in exonRankList are in a different order than in refseq2exons.

The unlist/relist approach has none of these problems, that is, it will always produce a result that is parallel to and has the same shape as the original object. It's also slightly more efficient than the unlist/split approach.
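Some of these pitfalls can be reproduced on a small scale in base R. The sketch below is an assumption-laden toy: x is a made-up ordinary list, and grp emulates the repeated outer names that unlist() attaches to a flattened GRangesList. It shows duplicate names being merged, the empty element disappearing, and the order changing under split(), while relist() keeps the original shape. The last two lines show the name mangling that unlist() applies to an ordinary list.

```r
# Hypothetical list with a duplicated name ("txA") and an empty element ("txC").
x <- list(txB = 1:2, txA = 1:3, txC = integer(0), txA = 1L)

flat <- unlist(x, use.names = FALSE)

# Emulates the outer names carried by a flattened GRangesList:
# each outer name repeated once per inner element.
grp <- rep(names(x), lengths(x))

by_split  <- split(flat, grp)   # groups by name, in sorted name order
by_relist <- relist(flat, x)    # groups by position, in original order

names(by_split)    # "txA" "txB"
names(by_relist)   # "txB" "txA" "txC" "txA"

# On an ordinary list, unlist() itself rewrites the outer names:
mangled <- names(unlist(list(a = 1:2, b = 3L)))
mangled            # "a1" "a2" "b"
```

Here by_split has two elements instead of four, so it is no longer parallel to x, whereas by_relist is.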

Cheers,

H.
