Question: How to subset a BSgenome
0
gravatar for Aditya
10 weeks ago by
Aditya120
Germany
Aditya120 wrote:
# Load a BSgenome object
bs <-  BSgenome.Mmusculus.UCSC.mm10::Mmusculus

# Extracting a single chromosome works fine
bs[['chr1']]
bs[['chr2']]

# Question: how to take a BSgenome subset?
bs[c('chr1', 'chr2')]
Error in bsgenome[c("chr1", "chr2")] : object of type 'S4' is not subsettable
ADD COMMENTlink modified 10 weeks ago by shepherl ♦♦ 1.6k • written 10 weeks ago by Aditya120
Answer: How to subset a BSgenome
1
gravatar for shepherl
10 weeks ago by
shepherl ♦♦ 1.6k
United States
shepherl ♦♦ 1.6k wrote:

The [[ method will only work on a single chromosome. You can also use the getSeq function that has more utility for subsetting for sequences. The getSeq will take a list of chromosomes or a GRanges object for subsetting

> getSeq(bs, c("chr1","chr2"))
  A DNAStringSet instance of length 2
        width seq                                           names               
[1] 195471971 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr1
[2] 182113224 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr2

as proof of principle to create a GRanges object to subset and get sequences

> gr =  as(seqinfo(bs)[c("chr1","chr2")], "GRanges")
> gr
GRanges object with 2 ranges and 0 metadata columns:
       seqnames      ranges strand
          <Rle>   <IRanges>  <Rle>
  chr1     chr1 1-195471971      *
  chr2     chr2 1-182113224      *
  -------
  seqinfo: 2 sequences from mm10 genome
> getSeq(bs, gr)
  A DNAStringSet instance of length 2
        width seq                                           names               
[1] 195471971 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr1
[2] 182113224 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr2

ADD COMMENTlink written 10 weeks ago by shepherl ♦♦ 1.6k

Thank you Lori :-). Would it be useful/easy to create a subsetter method? Then

Biostrings::vcountPDict(targetseqs, bsgenome) could be easily modified into Biostrings::vcountPDict(targetseqs, bsgenome[c('chr1', 'chr2')])

Or would you suggest using vcountPDict.Biostrings rather than vcountPDict.BSgenome ?

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by Aditya120

How would this be different than what thegetSeq already does as displayed above?

ADD REPLYlink written 6 weeks ago by shepherl ♦♦ 1.6k

I agree a [ method for BSgenome objects would be nice. Many years ago the bsapply() functionality was added to the BSgenome package to work around this. Note that bsapply() supports the exclude argument to exclude some chormosomes. See ?bsapply.

Also please note that the BSgenome package defines vmatchPattern()/vcountPattern()/vmatchPDict()/vcountPDict() methods for BSgenome objects. See ?`vmatchPattern,BSgenome-method`. These methods use bsapply() internally and also support the exclude argument. But yes, being able to subset the BSgenome object before passing it to vcountPDict() would be more elegant/intuitive. Unfortunately improving the BSgenome functionality is not high priority at the moment.

H.

ADD REPLYlink written 5 weeks ago by Hervé Pagès ♦♦ 14k

And by [ method for BSgenome objects, I mean an endomorphic subsetter i.e. a subsetting operation that returns another BSgenome object. The getSeq-based subsetting that Lori showed you is not endomorphic because it returns the selected sequences in a DNAStringSet object instead of a BSgenome object.

BTW this getSeq-based subsetting can be done more simply with:

> getSeq(bs, c("chr1", "chr2"))
  A DNAStringSet instance of length 2
        width seq                                           names               
[1] 195471971 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr1
[2] 182113224 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr2

No need to use a GRanges object. See ?`getSeq,BSgenome-method`

H.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Hervé Pagès ♦♦ 14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 139 users visited in the last hour