Hello,
I'm trying to speed up `lapply` applied to a `GRangesList` object by replacing it with `mclapply`.
library(EnsDb.Hsapiens.v86)
GRsList <- exonsBy(EnsDb.Hsapiens.v86, by = "tx")
# the goal
res <- lapply(GRsList, FUN = gaps)
# Here `gaps` is just for an example.
When I switch to `mclapply`, the code does not run faster at all:
library(parallel)  # provides mclapply
res <- mclapply(GRsList, FUN = gaps, mc.cores = 4)
Careful inspection shows that inside `mclapply`, much of the time is spent on converting the `GRangesList` to a `list`.
GRsList <- as(GRsList, "list") # takes forever
Therefore I'm curious about how to convert `GRsList` to a `list` efficiently, or simply how to make `mclapply` work with large `GRangesList` objects.
Thanks,
Thanks @Martin. My goal is to get introns from each transcript; that is what `FUN` in `lapply` does. Any suggestions?

Then you're looking for `GenomicFeatures::intronsByTranscript()`!

But `GenomicFeatures::intronsByTranscript()` does not work for EnsDb.Hsapiens.v86, which is from ensembldb...

`intronsByTranscript()` should probably be non-generic, as it should really just rely on functions in the "TxDb" API. But anyway, if you look at `selectMethod(intronsByTranscript, "TxDb")`, you will see the basic pattern for achieving what you want with ensembldb objects.

That's a good idea. For each transcript, `intronsByTranscript` uses `psetdiff` to find regions complementary to the exons, and `psetdiff` is very fast.
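
For anyone landing here later, the `psetdiff` pattern can be sketched on a toy example. This is only a minimal sketch, not the actual `intronsByTranscript()` source: the `exons` object and its coordinates are made up for illustration and stand in for the result of `exonsBy(EnsDb.Hsapiens.v86, by = "tx")`, and it assumes every transcript lies on a single chromosome and strand, so that `range()` yields exactly one span per transcript.

```r
library(GenomicRanges)

# Hypothetical toy exons grouped by transcript (stand-in for exonsBy(..., by = "tx"))
exons <- GRangesList(
  tx1 = GRanges("chr1", IRanges(c(1, 21, 41), c(10, 30, 50)), strand = "+"),
  tx2 = GRanges("chr2", IRanges(c(100, 200), c(150, 250)), strand = "-")
)

# One range per transcript, spanning its first to last exon.
# Assumes each transcript sits on a single chromosome and strand.
tx_span <- unlist(range(exons), use.names = FALSE)

# Parallel set difference: transcript span minus its own exons = introns.
# This is vectorized over the whole list, so no lapply/mclapply is needed.
introns <- psetdiff(tx_span, exons)
introns
```

Because the whole computation is a single vectorized call, the costly `GRangesList` to `list` conversion never happens, which is why this route avoids the original slowdown.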