**20**wrote:

The current paradigm to represent genomic variables (e.g. ChIP-seq coverage) in Bioconductor is as a named RleList or as a metadata column on a disjoint GRanges object. This can provide remarkable functionality and performance. However, how does one properly implement mathematical or statistical operations on these objects?

Consider the following (based on the genomicvars help page of GenomicRanges):

library(BSgenome.Scerevisiae.UCSC.sacCer3) BSgen <- BSgenome.Scerevisiae.UCSC.sacCer3 # Function to make fake artificial genomic variables makeVars <- function(BSgen,n) { RleList( lapply(seqlengths(Scerevisiae), function(seqlen) { tmp <- sample(n, seqlen, replace=TRUE) Rle(cumsum(tmp - rev(tmp))) }) ) } # Make genomic variables set.seed(17) x <- makeVars(BSgen,50) set.seed(300) y <- makeVars(BSgen,50) y[y<0] <- 1 # keep y positive

We can use certain supported functionality for performing mathematical operations, e.g. finding parallel maxima using pmax:

newVar.bioc <- BiocGenerics::pmax(x,y) # works right away and is fast and awesome! newVar.base <- base::pmax(x,y) # slow and unsatisfying but still works identical(newVar.base,newVar.bioc)

However, what if I wanted to use other operations that aren't supported, e.g. ppois:

newVar.ppois <- stats::ppois(x,y)

This can be circumvented by manual coercion, but it defeats the purpose of using RleList

x.n <- lapply(x,as.numeric) # defeats the puropse of using RleList y.n <- lapply(y,as.numeric) ppois.xy <- RleList(lapply(seq_along(x.n), function(i){ x.i <- x.n[[i]] y.i <- y.n[[i]] ppois.i <- ppois(x.i,lambda=y.i) }))

My question, then, is what is the best practice in bioconductor for performing mathematical/statistical operations like ppois on genomicvars? Does this require extending R with c/c++? I'm a little unsure where to begin and any advice would be much appreciated.

Devin

**13k**• written 12 months ago by Devin King •

**20**