Can't find library BSgenome.Mmulatta.UCSC for latest genome (mmul10)
P Darakjian ▴ 40
@p-darakjian-3255
Last seen 5.0 years ago

I have been looking for a BSgenome data package for the latest macaque genome assembly (Mmul_10, which UCSC calls rheMac10), but I can't find one. Is it available? If not, is it in progress, and when would it be available?

annotation mmul10 BSgenome • 1.4k views
@james-w-macdonald-5106
Last seen 7 hours ago
United States

It's not in progress; instead there is functionality that allows you to use a 2bit file from UCSC as if it were the package itself.

## after downloading from https://hgdownload.soe.ucsc.edu/goldenPath/rheMac10/bigZips/rheMac10.2bit

> library(rtracklayer)  ## provides TwoBitFile() and import()
> z <- TwoBitFile("C:/Users/jmacdon/Downloads/rheMac10.2bit")
> getSeq(z, which = GRanges("chr1:1-1000"))
  A DNAStringSet instance of length 1
    width seq
[1]  1000 AGGTTAGAAAATCTCCTAGTATTTCTTCTGATAG...CCCAGGTGTATTTGTGTGGCGCTTGCTGAGTGG
## or if you want everything
> zz <- import(z)
> zz
  A DNAStringSet instance of length 2939
           width seq                                        names               
   [1] 223616942 AGGTTAGAAAATCTCCTAGT...AGAGAAGGGAAGGGTAAAG chr1
   [2]  99517758 CACCCGTGGGCCTCCTCTTA...AGGGTTAGGGTTAGGTTAG chr10
   [3]     29490 CATTATATCGGCGCGGGCAG...ATCGAGACCATCCTGGCTA chr10_NW_02116024...
   [4]     29551 GCTCCAAAGCCCTCTGGGAC...ATCAAAGAAACACCAATTA chr10_NW_02116024...
   [5]     31521 GGAGTTCCAGACCAGCTTGG...CTGCAAGCTCCGCCTCCCG chr10_NW_02116024...
   ...       ... ...
[2935]     48696 GTCTAAGCCACAAGGACTAC...ACTAAAGAGCTTCTGCACA chrX_NW_021160381...
[2936]     50511 AAACCACAATGAGATACCAT...CCAATTTCAAAGGAAATGC chrX_NW_021160382...
[2937]     68997 CGCAACTTTCATGGGATGGA...TGTAAAAGCACTCAACCGC chrX_NW_021160383...
[2938]     79627 TGTTAAGTACTACAGTGTAG...GGGACTGCACTGAATCTAT chrX_NW_021160384...
[2939]  11753682 GAATTCTCCCATTTAAATTA...NNNNNNNNNNNNNNNNNNN chrY
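
For anyone who wants to try the TwoBitFile interface before downloading the full rheMac10 file, here is a minimal sketch that writes a tiny 2bit file to a temporary location and queries it the same way as above. The sequence names (chrA, chrB) and their contents are made up for illustration, and the sketch assumes rtracklayer and Biostrings are installed.

```r
library(rtracklayer)   # TwoBitFile(), export.2bit(); depends on GenomicRanges
library(Biostrings)    # DNAStringSet(), getSeq() generic

# Toy sequences (hypothetical names and bases, for illustration only)
toy <- DNAStringSet(c(chrA = "ACGTACGTACGTACGT", chrB = "TTTTGGGGCCCCAAAA"))

# Write them to a temporary 2bit file
path <- tempfile(fileext = ".2bit")
export.2bit(toy, path)

tb <- TwoBitFile(path)

# Sequence names and lengths can be read from the file's Seqinfo
lens <- seqlengths(seqinfo(tb))

# Fetch several regions in one call
regions <- GRanges(c("chrA:1-4", "chrB:5-8"))
sub <- getSeq(tb, which = regions)
```

With the real file, the same calls apply after replacing `path` with the location of the downloaded rheMac10.2bit.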

Right, although it's not exactly the same. The BSgenome wrapper adds a few conveniences, such as the ability to rename the seqlevels, "inject" SNPs, and properly handle circular sequences, plus a cleaner sequence order. Also, not all workflows support TwoBitFile objects (even though they probably should).
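
As a rough illustration of the ordering and renaming points, the sketch below reorders and renames a DNAStringSet by hand, the way one might after `import()` on a 2bit file. This is not what BSgenome does internally; it is only an approximation, and the toy sequences and the scaffold name `chr10_scaffold1` are hypothetical.

```r
library(Biostrings)

# Toy stand-in for the DNAStringSet returned by import(); names and bases are made up
seqs <- DNAStringSet(c(
  chr1 = "ACGT", chr10 = "GGCC", chr10_scaffold1 = "TTAA",
  chr2 = "CCGG", chrX = "ATAT"
))

# Build sort keys: main chromosomes first (numeric ones in numeric order,
# then the rest alphabetically), scaffolds (names containing "_") last
key <- sub("^chr", "", names(seqs))
is_scaffold <- grepl("_", key, fixed = TRUE)
num <- suppressWarnings(as.integer(key))
ord <- order(is_scaffold, is.na(num), num, key)
sorted <- seqs[ord]

# Rename UCSC-style names ("chr1") to bare names ("1")
renamed <- sorted
names(renamed) <- sub("^chr", "", names(renamed))
```

Here `names(sorted)` comes out as chr1, chr2, chr10, chrX, chr10_scaffold1, instead of the alphabetical order stored in the file.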

Note that it's relatively easy to forge your own BSgenome package. The process is documented in the "How to forge a BSgenome data package" vignette (linked on the BSgenome landing page).
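
To give an idea of what forging involves, here is a hedged sketch that writes a seed file for a package built from the UCSC 2bit file. The field names are as I recall them from the vignette and have changed across BSgenome releases, so treat the vignette as authoritative. The version number, the description text, the `seqs_srcdir` path, the release date placeholder, and the assumption that `chrM` is the only circular sequence are all illustrative.

```r
# Seed file contents in DCF format (illustrative values; check every field
# against the forging vignette for your BSgenome version)
seed <- c(
  "Package: BSgenome.Mmulatta.UCSC.rheMac10",
  "Title: Full genome sequences for Macaca mulatta (UCSC version rheMac10)",
  "Description: Full genome sequences for Macaca mulatta as provided by UCSC (rheMac10).",
  "Version: 0.0.1",                      # hypothetical version
  "organism: Macaca mulatta",
  "common_name: Rhesus",
  "provider: UCSC",
  "provider_version: rheMac10",          # this field may be named differently in newer releases
  "release_date: FILL_IN",               # placeholder, take the date from the UCSC page
  "source_url: https://hgdownload.soe.ucsc.edu/goldenPath/rheMac10/bigZips/",
  "organism_biocview: Macaca_mulatta",
  "BSgenomeObjname: Mmulatta",
  "seqs_srcdir: /path/to/downloads",     # hypothetical directory holding the 2bit file
  "seqfile_name: rheMac10.2bit",
  "circ_seqs: \"chrM\""                  # assumption: chrM is the only circular sequence
)

seed_path <- file.path(tempdir(), "BSgenome.Mmulatta.UCSC.rheMac10-seed")
writeLines(seed, seed_path)

# Sanity check: the file parses as DCF and carries the expected fields
parsed <- read.dcf(seed_path)

# The package source tree is then generated with forgeBSgenomeDataPkg(), e.g.
#   forgeBSgenomeDataPkg(seed_path)
# (from BSgenome, or from BSgenomeForge in recent releases), followed by
# R CMD build and R CMD INSTALL on the resulting directory.
```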

Macaque is one of the organisms for which we have traditionally tried to keep up, providing new BSgenome data packages as new assemblies become available. We'll add one for the latest assembly, Mmul_10. It will be named BSgenome.Mmulatta.UCSC.rheMac10, following the usual naming scheme.

H.


Ah, thanks, Herve. I forgot to ask about creating my own BSgenome package. Thanks for the link; I will look into that. Do you have an estimate of when BSgenome.Mmulatta.UCSC.rheMac10 will be ready?


I forgot to mention that the functionality is in the rtracklayer package.


Thanks, James. I will see if I can work with that.

@herve-pages-1542
Last seen 4 hours ago
Seattle, WA, United States

BSgenome.Mmulatta.UCSC.rheMac10 is now available. See https://support.bioconductor.org/p/126958/

Cheers,

H.
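
A minimal sketch of installing and using the released package, assuming a Bioconductor release that includes it and a working internet connection. The region queried mirrors the chr1:1-1000 example earlier in the thread.

```r
# Install BiocManager if needed, then the genome package (a large download)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("BSgenome.Mmulatta.UCSC.rheMac10")

library(BSgenome.Mmulatta.UCSC.rheMac10)

# The package exports a BSgenome object named after the package
genome <- BSgenome.Mmulatta.UCSC.rheMac10

# First 1000 bases of chr1, returned as a DNAString
first_kb <- getSeq(genome, "chr1", start = 1, end = 1000)
```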
