Question: Subsetting a GenomicRanges::GRanges with "%in%" fails
0
13 days ago by
Germany

Subsetting a GRanges with == works:

granges <- GenomicRanges::GRanges(
c('chr1', 'chr2'),
'100-200',
seqinfo =  GenomeInfoDb::seqinfo(BSgenome.Mmusculus.UCSC.mm10::Mmusculus))
subset(granges, seqnames == 'chr1')


But subsetting with %in% fails:

subset(granges, seqnames %in% c('chr1', 'chr3'))
Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

genomicranges • 116 views
modified 11 days ago • written 13 days ago by Aditya120
Answer: Subsetting a GenomicRanges::GRanges with "%in%" fails
0
12 days ago by
United States
Michael Lawrence11k wrote:

Running that code fails for me:

> granges <- GenomicRanges::GRanges(
+                               c('chr1', 'chr2'),
+                               '100-200')
> granges %>% subset(seqnames == 'chr1')
Error in granges %>% subset(seqnames == "chr1") :
could not find function "%>%"


But the problem is likely that the %in%() generic is not visible, so the version in base is being called, which does not support the run-length encoding of the seqnames (an Rle object). The easiest solution is to just call library(GenomicRanges).

Thank you Michael :-). Sorry for forgetting require(magrittr). Question now updated without magrittr to keep it simple. What would be a clean-namespace alternative to require(GenomicRanges) for GRanges subsetting?

1

You could always do something like IRanges::%in%(x, y). Keeping the workspace clean can be a pain though, because you'll constantly have to remember the origin of each function.

I see, IRanges is the place where it lives - thanks!

(Michael, merging my answer - based on your comments - into your's and then deleting mine could keep things clean for future reference)

1

Nope, %in% doesn't live in IRanges. IRanges used to define S4 methods for %in% but not anymore. The fact that IRanges still exports %in% is a leftover from a long time ago. That needs to be removed. The S4Vectors package defines the %in% method for Vector derivatives. Note that %in% is an implicit generic. This means that we don't have a setGeneric() statement somewhere that promotes it to an S4 generic function. It just gets automatically promoted on the first setMethod statement.

A cleaner situation would be to define the %in% generic with an explicit setGeneric statement in BiocGenerics. We will do this soon. Then all you need to do is import the generic from BiocGenerics in your NAMESPACE. Then just use %in% normally. This will call the generic, which takes care of dispatching to the appropriate method. Always call the generic. Trying to call a particular method is not robust.

So no need to do things like S4Vectors::%in%(x, y) or BiocGenerics::%in%(x, y). If you fully import BiocGenerics and S4Vectors (with import(BiocGenerics) and import(S4Vectors)), which I strongly recommend, then you can start using %in% now without having to worry where the generic is defined.

H.

Oh, that's illuminating - thx!

Is the following understanding correct?

• BioCgenerics contains all generics required by BioC classes. When a need for a new generic arises, it is explicitated in this package and exported. The idiom is the @import BiocGenerics, and then one is free to play.

• S4Vectors::Vector is the fundamental BioC datatype. Other S4 classes, e.g. GenomicRanges, SummarizedExperiment, etc. inherit from Vector and build further on that. Therefore, the idiom is to also @import S4Vectors and then the BioC playfield lays wide open.

1

Not exactly as there are generics specific to certain data types defined by the packages you mention and many others. Unless you are using a package for a very specific reason (like a single utility function), it's simplest to just import the package in bulk.

I see - it looks like I have to switch idioms then.

My default idiom has been to maintain a fully clean namespace, knowing exactly which function from which package is being used, being 100 % sure that no namespace clashes can occur.

But it looks like I need to drop that paradigm in the S4 world...

Would the following statement be right then?

On the one hand BiocGenerics exports generics common to multiple BioC packages.

On the other hand, GenomicRanges exports GRanges-specific generics, SummarizedExperiment exports SummarizedExperiment-specific generics, etc.

1

Yes, that's the idea. See ?BiocGenerics for the 2 kinds of generics we define in the BiocGenerics package. Also note that the proteomics folks define their own set of S4 generics in the ProtGenerics package.

Thanks Herve! So the apparant namespace clashes upon require(BiocGenerics) are actually intentional promotions of base primitives and S3 generics to S4 generics?