I have been looking for the latest macaque genome (Mmul_10) sequence definition for the BSgenome package, but can't find it. Is it available? If not, is it in progress, and when would it be available?
It's not in progress; instead there is functionality that allows you to use a 2bit file from UCSC as if it were the package itself.
## after downloading from https://hgdownload.soe.ucsc.edu/goldenPath/rheMac10/bigZips/rheMac10.2bit
> library(rtracklayer)  ## TwoBitFile support lives in rtracklayer
> z <- TwoBitFile("C:/Users/jmacdon/Downloads/rheMac10.2bit")
> getSeq(z, which = GRanges("chr1:1-1000"))
A DNAStringSet instance of length 1
 1000 AGGTTAGAAAATCTCCTAGTATTTCTTCTGATAG...CCCAGGTGTATTTGTGTGGCGCTTGCTGAGTGG
## or if you want everything
> zz <- import(z)
> zz
A DNAStringSet instance of length 2939
    width seq                                        names
223616942 AGGTTAGAAAATCTCCTAGT...AGAGAAGGGAAGGGTAAAG chr1
 99517758 CACCCGTGGGCCTCCTCTTA...AGGGTTAGGGTTAGGTTAG chr10
    29490 CATTATATCGGCGCGGGCAG...ATCGAGACCATCCTGGCTA chr10_NW_02116024...
    29551 GCTCCAAAGCCCTCTGGGAC...ATCAAAGAAACACCAATTA chr10_NW_02116024...
    31521 GGAGTTCCAGACCAGCTTGG...CTGCAAGCTCCGCCTCCCG chr10_NW_02116024...
      ... ...                                        ...
    48696 GTCTAAGCCACAAGGACTAC...ACTAAAGAGCTTCTGCACA chrX_NW_021160381...
    50511 AAACCACAATGAGATACCAT...CCAATTTCAAAGGAAATGC chrX_NW_021160382...
    68997 CGCAACTTTCATGGGATGGA...TGTAAAAGCACTCAACCGC chrX_NW_021160383...
    79627 TGTTAAGTACTACAGTGTAG...GGGACTGCACTGAATCTAT chrX_NW_021160384...
 11753682 GAATTCTCCCATTTAAATTA...NNNNNNNNNNNNNNNNNNN chrY
Right, although it's not exactly the same. The BSgenome wrapper adds a few conveniences, like the ability to rename the seqlevels, "inject" SNPs, and properly handle circular sequences, plus a cleaner sequence order. Also, not all workflows support TwoBitFile objects (even though they probably should).
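For anyone working from the 2bit file in the meantime, part of the "cleaner sequence order" can be approximated by hand. What follows is only a rough sketch, not what BSgenome does internally. It builds a tiny toy 2bit file (the sequences and the name `chr10_NW_demo` are made up) so that it runs without downloading rheMac10.2bit, and it assumes that unplaced scaffolds are the names containing an underscore, as in the output above. With the real file, point `TwoBitFile()` at the downloaded path and skip the toy part.

```r
library(rtracklayer)    # TwoBitFile(), import(), export()
library(GenomicRanges)  # GRanges(), IRanges()
library(Biostrings)     # DNAStringSet()

# Toy stand-in for rheMac10.2bit (made-up sequences, same alphabetical
# ordering of names as in the real file).
toy <- DNAStringSet(c(chr1          = "ACGTACGTACGTACGTACGT",
                      chr10         = "GGGGCCCCAAAA",
                      chr10_NW_demo = "TTTTGGGGCC",
                      chr2          = "CATCATCATCATCAT",
                      chrX          = "GATTACAGATTACA"))
path <- tempfile(fileext = ".2bit")
export(toy, TwoBitFile(path))

z  <- TwoBitFile(path)
si <- seqinfo(z)        # sequence names and lengths, no sequence data loaded

# Assumption: anything with "_" in its name is an unplaced scaffold.
main <- seqnames(si)[!grepl("_", seqnames(si))]

# Order chr1, chr2, ..., chr10, ..., then non-numeric names (chrX, chrY).
num  <- suppressWarnings(as.integer(sub("^chr", "", main)))
main <- main[order(num, main)]

# Import only those sequences, full length, in the chosen order.
which <- GRanges(main, IRanges(1L, unname(seqlengths(si)[main])))
main_seqs <- import(z, which = which)
names(main_seqs) <- main
main_seqs
```

Restricting `which` this way also avoids reading the thousands of scaffolds into memory when only the assembled chromosomes are needed.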
Note that it's relatively easy to forge your own BSgenome package. The process is documented in the "How to forge a BSgenome data package" vignette (linked on the BSgenome landing page).
Macaque is one of those organisms for which we have traditionally tried to keep up by providing new BSgenome data packages as new assemblies become available. We'll add one for the latest assembly, Mmul_10 (will be BSgenome.Mmulatta.UCSC.rheMac10, following the usual naming scheme).
Ah, thanks, Herve. I forgot to ask about creating my own BSgenome package. Thanks for the link; I will look into that. Do you have a time estimate for when BSgenome.Mmulatta.UCSC.rheMac10 will be ready?
I forgot to mention that the functionality is in the rtracklayer package.
Thanks, James. I will see if I can work with that.
BSgenome.Mmulatta.UCSC.rheMac10 is now available. See https://support.bioconductor.org/p/126958/
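Once the package is installed, the TwoBitFile example earlier in the thread translates fairly directly. A minimal sketch, assuming a Bioconductor release that carries the package and assuming the genome object is exported under the package name, as is the convention for BSgenome data packages:

```r
# Install from Bioconductor (a large download, needed only once).
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("BSgenome.Mmulatta.UCSC.rheMac10")

library(BSgenome.Mmulatta.UCSC.rheMac10)

# Assumption: the BSgenome object carries the same name as the package.
genome <- BSgenome.Mmulatta.UCSC.rheMac10

# Same query as the TwoBitFile example: the first 1000 bases of chr1.
first_kb <- getSeq(genome, GRanges("chr1:1-1000"))
first_kb
```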