Hi
Are there any plans to add the most recent Drosophila and Chimpanzee
genomes to the BSgenome list?
The most recent UCSC versions are the Apr. 2006 assembly of the D.
melanogaster genome (dm3) and the Chimpanzee Genome Mar. 2006
(panTro2). The Mac OS packages would be nice to have.
Thanks
Joseph
Hi Joseph,
Are you sure that the dm3 assembly provided by UCSC (based on BDGP Release 5)
is different from the FlyBase r5.1 assembly? If not, then you could just use
the BSgenome.Dmelanogaster.FlyBase.r51 package, which contains the FlyBase r5.1
assembly (I think that the differences between the various 5.y releases from
FlyBase are on the annotation side only, and that the chromosome sequences
should be the same).

Anyway, I've started building a BSgenome package for dm3. Once it's ready, it
will be easy to verify that the chromosome sequences are indeed the same as
in FlyBase r5.1 by doing something like:
library(BSgenome.Dmelanogaster.FlyBase.r51)
r51 <- BSgenome.Dmelanogaster.FlyBase.r51::Dmelanogaster
library(BSgenome.Dmelanogaster.UCSC.dm3)
dm3 <- BSgenome.Dmelanogaster.UCSC.dm3::Dmelanogaster
r51[["2L"]] == unmasked(dm3$chr2L)
I'll take this opportunity to add to this new package the same built-in masks
that I've already added to other BSgenome data packages (only Human, Mouse and
Dog so far). Those built-in masks are new in Bioconductor 2.2, and some examples
of how to use them are shown in the GenomeSearching vignette (this vignette has
been moved from the Biostrings pkg to the BSgenome pkg).

I will also make a BSgenome data pkg for Chimpanzee (with masks too) and post
here again when it is ready.
Cheers,
H.
Hi Joseph,
The source packages for dm3 (Fly) and panTro2 (Chimp) are now
available. I've also put dm2 back (it used to be part of the
BSgenome family in previous versions of Bioconductor, but was
temporarily broken).
I can now confirm that the chromosome sequences in dm3 are the
same as in FlyBase.r51. The exact set of sequences provided and
their exact names are slightly different though:
library(BSgenome.Dmelanogaster.FlyBase.r51)
r51 <- BSgenome.Dmelanogaster.FlyBase.r51::Dmelanogaster
library(BSgenome.Dmelanogaster.UCSC.dm3)
dm3 <- BSgenome.Dmelanogaster.UCSC.dm3::Dmelanogaster
Then:
> seqnames(r51)
[1] "2L" "2R"
[3] "3L" "3R"
[5] "4" "X"
[7] "U" "dmel_mitochondrion_genome"
[9] "2LHet" "2RHet"
[11] "3LHet" "3RHet"
[13] "XHet" "YHet"
> seqnames(dm3)
[1] "chr2L" "chr2R" "chr3L" "chr3R" "chr4" "chrX"
[7] "chrU" "chrM" "chr2LHet" "chr2RHet" "chr3LHet" "chr3RHet"
[13] "chrXHet" "chrYHet" "chrUextra"
To compare chr2L, or chrM:
> r51[["2L"]] == unmasked(dm3$chr2L)
[1] TRUE
> r51[["dmel_mitochondrion_genome"]] == unmasked(dm3$chrM)
[1] TRUE
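The same check could be run over every sequence rather than one at a time. The following is only a sketch: `r51_to_dm3()` is a hypothetical helper (not part of BSgenome) that encodes the naming difference visible in the seqnames() output above, and the two name vectors are copied from that output. The comparison loop runs only if both data packages are installed, and it assumes that `[[` and `unmasked()` behave as in the calls shown above.

```r
# Sequence names copied from the seqnames() output in this thread.
r51_names <- c("2L", "2R", "3L", "3R", "4", "X", "U",
               "dmel_mitochondrion_genome",
               "2LHet", "2RHet", "3LHet", "3RHet", "XHet", "YHet")
dm3_names <- c("chr2L", "chr2R", "chr3L", "chr3R", "chr4", "chrX",
               "chrU", "chrM", "chr2LHet", "chr2RHet", "chr3LHet",
               "chr3RHet", "chrXHet", "chrYHet", "chrUextra")

# Hypothetical helper: map a FlyBase r5.1 sequence name to its UCSC dm3 name.
# The mitochondrial genome is the only name that is not just "chr" + name.
r51_to_dm3 <- function(nm) {
  ifelse(nm == "dmel_mitochondrion_genome", "chrM",
         paste("chr", nm, sep = ""))
}

# dm3 sequences that have no counterpart in r51 under this mapping.
dm3_only <- setdiff(dm3_names, r51_to_dm3(r51_names))
print(dm3_only)

# Compare every shared sequence, but only when both packages are installed.
if (requireNamespace("BSgenome.Dmelanogaster.FlyBase.r51", quietly = TRUE) &&
    requireNamespace("BSgenome.Dmelanogaster.UCSC.dm3", quietly = TRUE)) {
  library(BSgenome.Dmelanogaster.FlyBase.r51)
  r51 <- BSgenome.Dmelanogaster.FlyBase.r51::Dmelanogaster
  library(BSgenome.Dmelanogaster.UCSC.dm3)
  dm3 <- BSgenome.Dmelanogaster.UCSC.dm3::Dmelanogaster
  for (nm in seqnames(r51)) {
    same <- r51[[nm]] == unmasked(dm3[[r51_to_dm3(nm)]])
    cat(nm, "vs", r51_to_dm3(nm), ":", same, "\n")
  }
}
```

Each iteration loads one sequence from each package, so the full loop can take a while and needs enough memory for the largest chromosome pair.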
The binary versions of the packages for Windows and Mac will follow soon.
Cheers,
H.
Herve Pages wrote:
[...]
>
> The binary versions of the packages for Windows and Mac will follow soon.
The binary versions of all the BSgenome data packages are now online (in the
release), ready for download and installation via biocLite() (for R-2.7 +
BioC-2.2 users).
H.