BSgenome.Mmusculus.UCSC.mm10 contains mm10 (the 2012 version).
Is mm10 patch 6 (2017) also available as a BSgenome?
You could just make your own version.
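Making your own version means forging a BSgenome data package from a seed file with `forgeBSgenomeDataPkg()`. That function was in the BSgenome package when this thread was written; newer Bioconductor releases provide it in BSgenomeForge. Below is a minimal sketch that assumes you already have a single 2bit file with the GRCm38.p6 sequences. The package name, paths and field values are hypothetical placeholders. Check the field names against the forging vignette for your BSgenome version.

```r
## Hypothetical seed file for a GRCm38.p6 BSgenome package.
## Every value below is a placeholder to adapt, not an official package.
seed <- c(
  Package           = "BSgenome.Mmusculus.Ensembl.GRCm38p6",
  Title             = "Full genome sequences for Mus musculus (Ensembl GRCm38.p6)",
  Description       = "Full genome sequences for Mus musculus (GRCm38.p6).",
  Version           = "0.99.0",
  organism          = "Mus musculus",
  common_name       = "Mouse",
  provider          = "Ensembl",
  genome            = "GRCm38.p6",
  release_date      = "2017",
  source_url        = "<URL of the source sequence file>",
  organism_biocview = "Mus_musculus",
  BSgenomeObjname   = "Mmusculus",
  circ_seqs         = "\"MT\"",            # R expression naming circular sequences
  seqs_srcdir       = "/path/to/twobit_dir",
  seqfile_name      = "GRCm38p6.2bit"      # single 2bit file holding all sequences
)

## Write the fields as a one-record DCF file (the seed-file format)
seed_file <- file.path(tempdir(), "GRCm38p6-seed.dcf")
write.dcf(t(seed), file = seed_file)

## Forge the package source only if the 2bit file is actually present
twobit <- file.path(seed[["seqs_srcdir"]], seed[["seqfile_name"]])
if (file.exists(twobit)) {
  BSgenome::forgeBSgenomeDataPkg(seed_file, destdir = tempdir())
  ## then R CMD build / R CMD INSTALL the generated package directory
}
```

Keeping the seed as a named vector makes it easy to generate one seed file per patch level if more than one is needed.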
Thank you James :-).
I could make BSgenome.Mmusculus.UCSC.mm10.p6 and submit it to BioC, but BioC seems to host only the first release of each current major assembly version; am I right?
I wonder why BioC doesn't update the BSgenomes with each new BioC release. Freezing the release guarantees stability, but on the other hand, subsequent patches do not alter the genomic coordinates (only a new major version does), and 7 years is a long time. What do you think?
The main reason the BSgenomes don't get updated is lack of personnel to do so. There are maybe 3-4 people who do the bulk of the work for each release, and while some of that involves updating annotation data, probably more involves the logistics of ensuring that thousands of different packages (both analytical and experimental) are all ready to go upon release.
With limited personnel there has to be a hierarchy of necessity, and building BSgenome packages for each successive patch unfortunately comes far down that hierarchy. That is why the infrastructure exists to let people build their own if they so desire.
That said, there are 819 different TwoBit files on AnnotationHub for Mus musculus, most of which are Ensembl-based. As far as I know, anything from Ensembl releases 92-97 is p6, so you can always get a TwoBitFile from there. You probably want the toplevel rather than the primary assembly, though, which means you have to choose the strain:
> library(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, c("twobitfile", "musculus"))
AnnotationHub with 819 records
## urg. Do better
> query(hub, c("twobitfile", "musculus", "release-96"))
AnnotationHub with 65 records
# snapshotDate(): 2019-05-02
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: TwoBitFile
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH70174"]]'
AH70174 | Mus_musculus.GRCm38.cdna.all.2bit
AH70175 | Mus_musculus.GRCm38.dna.primary_assembly.2bit
AH70176 | Mus_musculus.GRCm38.dna_rm.primary_assembly.2bit
AH70177 | Mus_musculus.GRCm38.dna_sm.primary_assembly.2bit
AH70178 | Mus_musculus.GRCm38.ncrna.2bit
...
AH70234 | Mus_musculus_pwkphj.PWK_PhJ_v1.ncrna.2bit
AH70235 | Mus_musculus_wsbeij.WSB_EiJ_v1.cdna.all.2bit
AH70236 | Mus_musculus_wsbeij.WSB_EiJ_v1.dna_rm.toplevel.2bit
AH70237 | Mus_musculus_wsbeij.WSB_EiJ_v1.dna_sm.toplevel.2bit
AH70238 | Mus_musculus_wsbeij.WSB_EiJ_v1.ncrna.2bit
> tb <- hub[["AH70175"]]
downloading 1 resources
retrieving 1 resource
loading from cache
AH70175 : 76921
> library(rtracklayer)
> getSeq(tb, GRanges("1:34567-34599"))
A DNAStringSet instance of length 1
 33 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
## huh, seems masked, because primary?
> tb2 <- hub[["AH70201"]] ## black6
> getSeq(tb2, GRanges("1:34567-34599"))
A DNAStringSet instance of length 1
    width seq
[1]    33 TTTTTCTCCTTAAAATATTCGGGCAAGAAAGGA
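A run of N does not necessarily mean masking. Ensembl's `dna` files are unmasked, `dna_rm` files hard-mask repeats as N, and `dna_sm` files soft-mask repeats in lowercase. An all-N stretch in a `dna` file therefore more likely reflects an assembly gap; the GRCm38 chromosomes begin with a long stretch of N. The sketch below uses a hypothetical helper, not part of any package. It reports the fraction of N per sequence with Biostrings' `letterFrequency()`, so gap-heavy regions are easy to spot before comparing files:

```r
library(Biostrings)

## Hypothetical helper: fraction of N bases in each sequence of a DNAStringSet
n_fraction <- function(seqs) {
  freq <- letterFrequency(seqs, letters = "N", as.prob = TRUE)
  unname(freq[, "N"])
}

## With the TwoBitFile from the session above, e.g.:
## n_fraction(getSeq(tb, GRanges("1:34567-34599")))
n_fraction(DNAStringSet(c("NNNNNNNN", "ACGTNNNN", "TTTTTCTC")))
```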
I don't do much with BSgenome packages, so I don't know the fundamental differences, but to my eye, the TwoBitFile is pretty similar.
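There is one practical difference when switching between the two. mm10 and the GRCm38 primary assembly share coordinates, but the UCSC-based BSgenome package uses `chr`-prefixed chromosome names (`chr1`), whereas the Ensembl TwoBit files use bare names (`1`). The following sketch converts a query region between the two naming styles with GenomeInfoDb's `seqlevelsStyle()` replacement function. The `getSeq()` calls in the comments assume the objects from the session above and an installed BSgenome.Mmusculus.UCSC.mm10:

```r
library(GenomicRanges)  # attaches GenomeInfoDb, which provides seqlevelsStyle()

## Region with Ensembl-style sequence names, as used by the TwoBit files
ensembl_region <- GRanges("1:34567-34599")

## Same region renamed for the UCSC-based BSgenome package ("1" -> "chr1")
ucsc_region <- ensembl_region
seqlevelsStyle(ucsc_region) <- "UCSC"

## With both resources loaded, these should address the same bases:
## getSeq(tb, ensembl_region)                                   # Ensembl TwoBitFile
## getSeq(BSgenome.Mmusculus.UCSC.mm10::Mmusculus, ucsc_region) # UCSC BSgenome
```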
I'll second James's observations, including the suggestion of a workflow based on TwoBit files (via AnnotationHub), or even FASTA files (managed with BiocFileCache), rather than BSgenome, if these resources are sufficient for your research purposes.
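For the AnnotationHub route, you can filter the record titles for a specific Ensembl file type instead of scanning the query listing by eye. The sketch below uses a hypothetical helper that relies only on the naming pattern visible in the listing above (`<species>.<assembly>.<type>.<level>.2bit`). The AnnotationHub calls in the comments assume the query shown earlier in the thread:

```r
## Hypothetical helper: select Ensembl TwoBit titles by sequence type and level.
##   type:  "dna" (unmasked), "dna_rm" (hard-masked), "dna_sm" (soft-masked)
##   level: "toplevel" or "primary_assembly"
pick_twobit <- function(titles,
                        type = c("dna", "dna_rm", "dna_sm"),
                        level = c("toplevel", "primary_assembly")) {
  type <- match.arg(type)
  level <- match.arg(level)
  pattern <- paste0("\\.", type, "\\.", level, "\\.2bit$")
  grepl(pattern, titles)
}

## With AnnotationHub (needs network access, not run here):
## library(AnnotationHub)
## hits <- query(AnnotationHub(), c("TwoBitFile", "Mus musculus", "release-96"))
## hits[pick_twobit(hits$title, "dna_sm", "toplevel")]
```

Anchoring the pattern on the dots keeps `dna` from also matching `dna_rm` or `dna_sm` titles.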
Thank you Martin :-)
Thank you James for this extensive reply :-). I was not aware of the presence of these twobit files, so this is definitely good to know!
After looking into the Ensembl FASTA files, I realized that the patches are provided in a separate file of alternate sequences and leave the primary assembly untouched, which makes the patch-level information difficult to use for many applications. With a new major Mus musculus release planned for the not-so-distant future, I think I will work with the current primary assembly for now and update to the new major release when it becomes available.