Question: Subsetting a GenomicRanges::GRanges with "%in%" fails
gravatar for Aditya
13 days ago by
Aditya120 wrote:

Subsetting a GRanges with == works:

granges <- GenomicRanges::GRanges(
               c('chr1', 'chr2'), 
               seqinfo =  GenomeInfoDb::seqinfo(BSgenome.Mmusculus.UCSC.mm10::Mmusculus))
subset(granges, seqnames == 'chr1')

But subsetting with %in% fails:

subset(granges, seqnames %in% c('chr1', 'chr3'))
Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments
genomicranges • 116 views
ADD COMMENTlink modified 11 days ago • written 13 days ago by Aditya120
Answer: Subsetting a GenomicRanges::GRanges with "%in%" fails
gravatar for Michael Lawrence
12 days ago by
United States
Michael Lawrence11k wrote:

Running that code fails for me:

> granges <- GenomicRanges::GRanges(
+                               c('chr1', 'chr2'), 
+                               '100-200')
> granges %>% subset(seqnames == 'chr1')
Error in granges %>% subset(seqnames == "chr1") : 
  could not find function "%>%"

But the problem is likely that the %in%() generic is not visible, so the version in base is being called, which does not support the run-length encoding of the seqnames (an Rle object). The easiest solution is to just call library(GenomicRanges).

ADD COMMENTlink written 12 days ago by Michael Lawrence11k

Thank you Michael :-). Sorry for forgetting require(magrittr). Question now updated without magrittr to keep it simple. What would be a clean-namespace alternative to require(GenomicRanges) for GRanges subsetting?

ADD REPLYlink modified 12 days ago • written 12 days ago by Aditya120

You could always do something like IRanges::`%in%`(x, y). Keeping the workspace clean can be a pain though, because you'll constantly have to remember the origin of each function.

ADD REPLYlink written 11 days ago by Michael Lawrence11k

I see, IRanges is the place where it lives - thanks!

(Michael, merging my answer - based on your comments - into your's and then deleting mine could keep things clean for future reference)

ADD REPLYlink modified 11 days ago • written 11 days ago by Aditya120

Nope, %in% doesn't live in IRanges. IRanges used to define S4 methods for %in% but not anymore. The fact that IRanges still exports %in% is a leftover from a long time ago. That needs to be removed. The S4Vectors package defines the %in% method for Vector derivatives. Note that %in% is an implicit generic. This means that we don't have a setGeneric() statement somewhere that promotes it to an S4 generic function. It just gets automatically promoted on the first setMethod statement.

A cleaner situation would be to define the %in% generic with an explicit setGeneric statement in BiocGenerics. We will do this soon. Then all you need to do is import the generic from BiocGenerics in your NAMESPACE. Then just use %in% normally. This will call the generic, which takes care of dispatching to the appropriate method. Always call the generic. Trying to call a particular method is not robust.

So no need to do things like S4Vectors::`%in%`(x, y) or BiocGenerics::`%in%`(x, y). If you fully import BiocGenerics and S4Vectors (with import(BiocGenerics) and import(S4Vectors)), which I strongly recommend, then you can start using %in% now without having to worry where the generic is defined.


ADD REPLYlink modified 6 days ago • written 6 days ago by Hervé Pagès ♦♦ 14k

Oh, that's illuminating - thx!

Is the following understanding correct?

  • BioCgenerics contains all generics required by BioC classes. When a need for a new generic arises, it is explicitated in this package and exported. The idiom is the @import BiocGenerics, and then one is free to play.

  • S4Vectors::Vector is the fundamental BioC datatype. Other S4 classes, e.g. GenomicRanges, SummarizedExperiment, etc. inherit from Vector and build further on that. Therefore, the idiom is to also @import S4Vectors and then the BioC playfield lays wide open.

ADD REPLYlink written 6 days ago by Aditya120

Not exactly as there are generics specific to certain data types defined by the packages you mention and many others. Unless you are using a package for a very specific reason (like a single utility function), it's simplest to just import the package in bulk.

ADD REPLYlink written 6 days ago by Michael Lawrence11k

I see - it looks like I have to switch idioms then.

My default idiom has been to maintain a fully clean namespace, knowing exactly which function from which package is being used, being 100 % sure that no namespace clashes can occur.

But it looks like I need to drop that paradigm in the S4 world...

ADD REPLYlink modified 5 days ago • written 6 days ago by Aditya120

Would the following statement be right then?

On the one hand BiocGenerics exports generics common to multiple BioC packages.

On the other hand, GenomicRanges exports GRanges-specific generics, SummarizedExperiment exports SummarizedExperiment-specific generics, etc.

ADD REPLYlink modified 5 days ago • written 5 days ago by Aditya120

Yes, that's the idea. See ?BiocGenerics for the 2 kinds of generics we define in the BiocGenerics package. Also note that the proteomics folks define their own set of S4 generics in the ProtGenerics package.

ADD REPLYlink written 5 days ago by Hervé Pagès ♦♦ 14k

Thanks Herve! So the apparant namespace clashes upon require(BiocGenerics) are actually intentional promotions of base primitives and S3 generics to S4 generics?

ADD REPLYlink modified 5 days ago • written 5 days ago by Aditya120


ADD REPLYlink written 4 days ago by Hervé Pagès ♦♦ 14k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 247 users visited in the last hour