making use of the Apis mellifera BeeBase assembly 4 data in goseq



Matthew Young

@matthew-young-4865


And a belated answer to question 3.

To match gene IDs to GO categories, goseq calls the getgo function with the genome name provided. getgo then looks up an internal (unexported) variable called .ORG_PACKAGES and uses the organism package that corresponds to the relevant entry in the list. To access this variable, type:

    > goseq:::.ORG_PACKAGES
              anoGam      Arabidopsis           bosTau               ce
         "org.Ag.eg"    "org.At.tair"      "org.Bt.eg"      "org.Ce.eg"
              canFam               dm           danRer      E. coli K12
         "org.Cf.eg"      "org.Dm.eg"      "org.Dr.eg"   "org.EcK12.eg"
       E. coli Sakai           galGal               hg               mm
    "org.EcSakai.eg"      "org.Gg.eg"      "org.Hs.eg"      "org.Mm.eg"
              rheMac          Malaria           panTro               rn
        "org.Mmu.eg"  "org.Pf.plasmo"      "org.Pt.eg"      "org.Rn.eg"
              sacCer              Pig           xenTro
        "org.Sc.sgd"      "org.Ss.eg"      "org.Xl.eg"

If you add the name of your custom package to this variable, everything *should* work fine. However, my advice would be to construct a table of gene ID to GO category mappings (and another of gene ID to gene length) and just use them directly. This is described in detail in the goseq vignette.

Cheers,
Matt

On Sat, Feb 25, 2012 at 9:38 AM, Alicia Oshlack <alicia.oshlack@mcri.edu.au> wrote:

> Hi Vanessa,
>
> In answer to your question number 2: you are right that, to use a genome
> which is not supported (if it's not in UCSC then it's not supported in
> goseq), you will need the annotation (gene length) and the mapping from
> gene IDs to GO terms. It's not enough just to have the genome in order to
> use goseq.
>
> Cheers,
> Alicia
>
> On 25/02/12 6:32 PM, "bioconductor-request@r-project.org"
> <bioconductor-request@r-project.org> wrote:
>
> > Message: 9
> > Date: Fri, 24 Feb 2012 12:51:37 -0800
> > From: Hervé Pagès <hpages@fhcrc.org>
> > To: "Corby, Vanessa" <vanessa.corby@ars.usda.gov>
> > Cc: "myoung@wehi.edu.au" <myoung@wehi.edu.au>,
> >     "bioconductor@r-project.org" <bioconductor@r-project.org>
> > Subject: Re: [BioC] making use of the Apis mellifera BeeBase assembly
> >     4 data in goseq
> >
> > Hello Vanessa,
> >
> > On 02/24/2012 10:45 AM, Corby, Vanessa wrote:
> >> Hello Herve and Matt,
> >>
> >> After looking through the Bioconductor documentation for the BeeBase
> >> assembly 4 package Hervé posted (information on the Apis 4 annotation
> >> stored in Biostrings objects), the documentation for the org.Hs.eg.db
> >> annotation database, the Bioconductor mailing list, the BSgenome
> >> documentation, and the goseq documentation, I am still very confused
> >> about whether I can use the assembly 4 package that Hervé posted in
> >> goseq.
> >
> > Just to clarify, goseq is not my package so I can't "post" anything
> > in it, whatever that means. I assume you are talking about the
> > BSgenome.Amellifera.BeeBase.assembly4 package that I made and that
> > is part of Bioconductor.
> >
> >> The reason that I want to use the assembly 4 data is that I would
> >> presume that it will have more current information than the natively
> >> supported (by goseq) Apis release 2.
> >
> > It's a more recent assembly so I would expect it to be more accurate
> > (i.e. closer to reality).
> >
> >> So, here are my questions:
> >>
> >> 1. Will release 4 offer much improvement over release 2? If this is
> >> not the case, then the next two questions are moot.
> >
> > It's just a more recent assembly, with all that implies.
> >
> >> 2. Do I need to get information on the transcript lengths and the
> >> associations between the gene IDs and GO terms for the Apis 4 release
> >> and build 2 new files of this information for goseq to use?
> >
> > I'm not familiar with the goseq package so I'll let Matt answer this.
> >
> >> Is that information available (perhaps through UCSC or Baylor's site
> >> for the honeybee projects)? Can I use Bioconductor for this if I have
> >> the annotation database file Hervé posted?
> >
> > The BSgenome.Amellifera.BeeBase.assembly4 package only contains the
> > DNA sequences of the Apis 4 release. It does *not* contain annotations
> > for this assembly.
> >
> > One advantage of using the BSgenome.Amellifera.UCSC.apiMel2 package
> > instead is that you have easy access to a world of annotations for
> > this genome through the UCSC genome browser. Too bad that the UCSC
> > folks have no plans to support apiMel4:
> >
> > https://lists.soe.ucsc.edu/pipermail/genome/2007-October/014763.html
> >
> > apiMel2 is 7 years old now!
> >
> > Note that the GenomicFeatures and rtracklayer packages make it really
> > simple to import and work with those annotations in R/Bioconductor.
> >
> >> 3. Do I just have to rename the Apis 4 genome package that Hervé
> >> posted in order to use it in goseq (I see that there are several
> >> naming conventions on the Annotation Data packages)?
> >
> > I'll let Matt answer this.
> >
> >> You can see that some of these questions are more appropriate for
> >> Hervé and some for Matt, so I decided to email both of you. Some of
> >> these issues arise simply because I've only been successful with the
> >> example in the goseq documentation (using the org.Hs.eg.db annotation
> >> database). Others arise because I am just very new to R and the
> >> Bioconductor packages.
> >
> > For what it's worth, I don't think there is any org.* package for Bee
> > (it would probably be named something like org.Am.eg.db if there was
> > one). And if there was one, you would need to double-check that the
> > annotations in it are actually compatible with whatever genome
> > assembly you finally decided to use.
> >
> >> Thanks for any help you can offer. And apologies if this is the 100th
> >> time you've received an email about this from newbies such as myself.
> >
> > No problem. Wish I could help more. I'm cc'ing the Bioconductor
> > mailing list (hope you don't mind). It's a better place to ask
> > questions like this as other people might be able to help and also
> > the whole discussion will be archived and searchable for further
> > reference.
> >
> > Cheers,
> > H.
> >
> >> Vanessa Corby-Harris
> >> Research Molecular Biologist
> >> USDA-ARS
> >> Carl Hayden Bee Research Center
> >
> > --
> > Hervé Pagès
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > E-mail: hpages@fhcrc.org
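For anyone landing here later, the advice at the top of this post (build a gene ID to GO table and a gene ID to length table, then use them directly) might look roughly like the sketch below. All gene IDs, lengths and GO assignments here are invented placeholders, not real BeeBase data; for a real analysis they would come from your own annotation files. The `bias.data` and `gene2cat` arguments are used as I understand them from the goseq documentation, so please check the vignette for your installed version. The goseq calls are skipped if the package is not installed.

```r
# Toy inputs: every ID, length and GO assignment below is a made-up
# placeholder standing in for real Apis mellifera annotation.
set.seed(1)
gene_ids <- sprintf("GB%05d", 1:200)

# Named 0/1 vector: 1 = differentially expressed, 0 = not
de_genes <- setNames(rbinom(length(gene_ids), 1, 0.2), gene_ids)

# Gene lengths in bp, in the same order (and with the same names) as de_genes
gene_lengths <- setNames(
  sample(500:5000, length(gene_ids), replace = TRUE),
  gene_ids
)

# Two-column mapping, one row per (gene, GO category) pair
go_terms <- c("GO:0008150", "GO:0003674", "GO:0005575")
gene2cat <- unique(data.frame(
  gene = sample(gene_ids, 300, replace = TRUE),
  category = sample(go_terms, 300, replace = TRUE),
  stringsAsFactors = FALSE
))

if (requireNamespace("goseq", quietly = TRUE)) {
  # Fit the length-bias weighting from the supplied lengths instead of
  # looking them up by genome name
  pwf <- goseq::nullp(de_genes, bias.data = gene_lengths, plot.fit = FALSE)

  # Test for enriched categories using the custom mapping instead of an
  # org.* package lookup
  res <- goseq::goseq(pwf, gene2cat = gene2cat)
  print(head(res))
}
```

Going this route means no genome name is passed at all, so nothing depends on the genome being in UCSC or on an entry in .ORG_PACKAGES.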

Annotation GO Cancer Apis mellifera BSgenome Biostrings Category BSgenome rtracklayer GO • 1.5k views

13.8 years ago, Matthew Young
