# Load a BSgenome object
bs <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
# Extracting a single chromosome works fine
bs[['chr1']]
bs[['chr2']]
# Question: how to take a BSgenome subset?
bs[c('chr1', 'chr2')]
Error in bsgenome[c("chr1", "chr2")] : object of type 'S4' is not subsettable
The [[ method will only work on a single chromosome. You can also use the getSeq function that has more utility for subsetting for sequences. The getSeq will take a list of chromosomes or a GRanges object for subsetting
> getSeq(bs, c("chr1","chr2"))
A DNAStringSet instance of length 2
width seq names
[1] 195471971 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr1
[2] 182113224 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr2
as proof of principle to create a GRanges object to subset and get sequences
> gr = as(seqinfo(bs)[c("chr1","chr2")], "GRanges")
> gr
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
chr1 chr1 1-195471971 *
chr2 chr2 1-182113224 *
-------
seqinfo: 2 sequences from mm10 genome
> getSeq(bs, gr)
A DNAStringSet instance of length 2
width seq names
[1] 195471971 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr1
[2] 182113224 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr2
I agree a [ method for BSgenome objects would be nice. Many years ago the bsapply() functionality was added to the BSgenome package to work around this. Note that bsapply() supports the exclude argument to exclude some chormosomes. See ?bsapply.
Also please note that the BSgenome package defines vmatchPattern()/vcountPattern()/vmatchPDict()/vcountPDict() methods for BSgenome objects. See ?`vmatchPattern,BSgenome-method`. These methods use bsapply() internally and also support the exclude argument. But yes, being able to subset the BSgenome object before passing it to vcountPDict() would be more elegant/intuitive. Unfortunately improving the BSgenome functionality is not high priority at the moment.
And by [ method for BSgenome objects, I mean an endomorphic subsetter i.e. a subsetting operation that returns another BSgenome object. The getSeq-based subsetting that Lori showed you is not endomorphic because it returns the selected sequences in a DNAStringSet object instead of a BSgenome object.
BTW this getSeq-based subsetting can be done more simply with:
Thank you Lori :-). Would it be useful/easy to create a subsetter method? Then
Biostrings::vcountPDict(targetseqs, bsgenome)
could be easily modified intoBiostrings::vcountPDict(targetseqs, bsgenome[c('chr1', 'chr2')])
Or would you suggest using
vcountPDict.Biostrings
rather thanvcountPDict.BSgenome
?How would this be different than what the
getSeq
already does as displayed above?I agree a
[
method for BSgenome objects would be nice. Many years ago thebsapply()
functionality was added to the BSgenome package to work around this. Note thatbsapply()
supports theexclude
argument to exclude some chormosomes. See?bsapply
.Also please note that the BSgenome package defines
vmatchPattern()
/vcountPattern()
/vmatchPDict()
/vcountPDict()
methods for BSgenome objects. See?`vmatchPattern,BSgenome-method`
. These methods usebsapply()
internally and also support theexclude
argument. But yes, being able to subset the BSgenome object before passing it tovcountPDict()
would be more elegant/intuitive. Unfortunately improving the BSgenome functionality is not high priority at the moment.H.
And by
[
method for BSgenome objects, I mean an endomorphic subsetter i.e. a subsetting operation that returns another BSgenome object. ThegetSeq
-based subsetting that Lori showed you is not endomorphic because it returns the selected sequences in a DNAStringSet object instead of a BSgenome object.BTW this
getSeq
-based subsetting can be done more simply with:No need to use a GRanges object. See
?`getSeq,BSgenome-method`
H.