BSgenome package for A. thaliana
@olegstatberkeleyedu-4310
Hi all,

I want to use a genome package corresponding to the TAIR9 version of the A. thaliana genome. BSgenome seems to make two genome versions available: "BSgenome.Athaliana.TAIR.01222004" and "BSgenome.Athaliana.TAIR.04232008". After checking them out, they actually seem to be the same and to represent an earlier version of the genome (TAIR8?).

I could probably try to put together a TAIR9 genome package using the BSgenome manual, but I thought there might be a package out there already, since TAIR9 has been around for a while now (TAIR10 was released last month). If someone knows of one, please let me know!

Oleg.
@herve-pages-1542
Hi Oleg,

On 12/06/2010 07:25 PM, oleg at stat.berkeley.edu wrote:
> Hi, all
> I want to use genome package corresponding to TAIR9 version of a.thaliana
> genome. It seems that BSgenome makes 2 genome versions available:
> "BSgenome.Athaliana.TAIR.01222004" and "BSgenome.Athaliana.TAIR.04232008".
> After checking them out, they actually seem to be the same and represent
> an earlier version of the genome (TAIR8?).

They are not the same:

    > alphabetFrequency(BSgenome.Athaliana.TAIR.01222004::Athaliana$chr1)
          A       C       G       T       M       R       W       S       Y       K
    9711178 5436538 5422303 9698578      76      37     124      31      85      53
          V       H       D       B       N       -       +
          0       0       0       0  163560       0       0

    > alphabetFrequency(BSgenome.Athaliana.TAIR.04232008::Athaliana$chr1)
          A       C       G       T       M       R       W       S       Y       K
    9709677 5435365 5421130 9697107      76      36     124      30      82      53
          V       H       D       B       N       -       +
          0       0       0       0  168883       0       0

> I could probably try to put
> together TAIR9 genome using BSgenome manual, but I thought there might be
> a package out there already, since TAIR9 has been around for a while now
> (TAIR10 has been released last month). If someone knows of one, please let
> me know!

We'll take care of making those 2. As you pointed out, the ones we have are pretty old now and we really need to provide something more recent. I'll post back here when BSgenomes for TAIR9 and TAIR10 are available.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
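For readers who want to see exactly where the two chr1 sequences differ, the two sets of counts quoted above can be subtracted letter by letter. This is only a minimal base-R sketch: the numbers are typed in by hand from the alphabetFrequency() output in this thread, and the variable names (iupac_letters, freq_2004, freq_2008, delta) are made up for illustration. It does not load either BSgenome package.

```r
# Counts copied by hand from the alphabetFrequency() output quoted in this
# thread (chr1 of the 01222004 and 04232008 packages). Names are illustrative.
iupac_letters <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                   "V", "H", "D", "B", "N", "-", "+")

freq_2004 <- setNames(c(9711178, 5436538, 5422303, 9698578, 76, 37, 124, 31,
                        85, 53, 0, 0, 0, 0, 163560, 0, 0), iupac_letters)
freq_2008 <- setNames(c(9709677, 5435365, 5421130, 9697107, 76, 36, 124, 30,
                        82, 53, 0, 0, 0, 0, 168883, 0, 0), iupac_letters)

# Per-letter change going from the 2004 build to the 2008 build;
# only letters whose count changed are shown.
delta <- freq_2008 - freq_2004
print(delta[delta != 0])

# Total chr1 length implied by each set of counts.
print(c(len_2004 = sum(freq_2004), len_2008 = sum(freq_2008)))
```

With both packages installed, the same comparison could presumably be run on the live alphabetFrequency() results instead of hand-copied numbers.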
Hi Oleg,

I finally managed to make a BSgenome package for TAIR9.

Release notes for BSgenome.Athaliana.TAIR.TAIR9:

- TAIR9 and TAIR10 correspond to the same genome assembly, so there is no need for a BSgenome pkg for TAIR10 :-)

- Sequences in TAIR9 are named Chr1, Chr2, ..., ChrM, ChrC instead of chr1, chr2, ..., chrM, chrC as in the previous BSgenome pkg (i.e. in BSgenome.Athaliana.TAIR.04232008). The problem is that there are at least 3 different naming conventions used concurrently at TAIR. Even within the same genome release, chromosomes are not named the same way in FASTA files, GFF files, filenames, etc. However, Chr1, Chr2, ..., ChrM, ChrC seems to be the most widely used sequence naming convention (at least that's what is used in the GFF files I looked at).

- Sequences have been slightly reordered in the BSgenome pkg for TAIR9: ChrM now precedes ChrC (this is to be more consistent with BSgenome pkgs for other organisms).

- Seqlengths:

    > library(BSgenome.Athaliana.TAIR.TAIR9)
    > seqlengths(Athaliana)
        Chr1     Chr2     Chr3     Chr4     Chr5     ChrM     ChrC
    30427671 19698289 23459830 18585056 26975502   366924   154478

- Sequences Chr1 to Chr5 have changed in TAIR9 with respect to the previous BSgenome pkg, but the ChrM and ChrC sequences have not.

- The new package still doesn't contain any built-in masks (the locations/sizes of the assembly gaps provided by TAIR seem to be wrong, as they don't correspond to the N-blocks found in the sequences).

BSgenome.Athaliana.TAIR.TAIR9 will become available in release and devel in about 1 hour (source packages only for now). Also, this is the end of life for BSgenome.Athaliana.TAIR.01222004 and I will drop it from devel.

Please let me know if you have any questions.

Cheers,
H.
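Since the sequence names and their order both changed, code written against the older package may need its names translated, and lookups done by name rather than by position. The following is a rough base-R sketch of that idea, not part of the package: the seqlengths are copied by hand from the release notes above, to_tair9_names() is a hypothetical helper, and the old ordering (chrC before chrM) is an assumption inferred from the note that ChrM now precedes ChrC. Note that renaming alone does not make old coordinates valid, because the Chr1 to Chr5 sequences themselves changed.

```r
# Seqlengths copied by hand from the release notes in this thread.
tair9_seqlengths <- c(Chr1 = 30427671, Chr2 = 19698289, Chr3 = 23459830,
                      Chr4 = 18585056, Chr5 = 26975502,
                      ChrM = 366924, ChrC = 154478)

# Hypothetical helper (not part of BSgenome): map old-style names
# (chr1, ..., chrC, chrM) to the TAIR9 convention (Chr1, ..., ChrC, ChrM).
to_tair9_names <- function(x) sub("^chr", "Chr", x)

# Assumed order in the older package: chrC before chrM.
old_names <- c("chr1", "chr2", "chr3", "chr4", "chr5", "chrC", "chrM")
new_names <- to_tair9_names(old_names)

# Every translated name should exist among the TAIR9 sequence names.
stopifnot(all(new_names %in% names(tair9_seqlengths)))

# Index by name so the ChrM/ChrC reordering cannot cause a mix-up.
print(tair9_seqlengths[new_names])

# Total assembly length implied by the seqlengths above.
print(sum(tair9_seqlengths))
```

Indexing by name rather than by position is the point of the design here: `tair9_seqlengths[6]` is ChrM in the new package, which would be the wrong sequence for code that assumed the chloroplast came first.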