Matthew Young
And a belated answer to question 3. To match gene IDs to GO categories,
goseq calls the getgo function with the genome name provided. It then
looks up an internal (unexported) variable called .ORG_PACKAGES and uses
the organism package that corresponds to the relevant entry in the list.
To access this variable, type:
> goseq:::.ORG_PACKAGES
          anoGam      Arabidopsis           bosTau               ce
     "org.Ag.eg"    "org.At.tair"      "org.Bt.eg"      "org.Ce.eg"
          canFam               dm           danRer      E. coli K12
     "org.Cf.eg"      "org.Dm.eg"      "org.Dr.eg"   "org.EcK12.eg"
   E. coli Sakai           galGal               hg               mm
"org.EcSakai.eg"      "org.Gg.eg"      "org.Hs.eg"      "org.Mm.eg"
          rheMac          Malaria           panTro               rn
    "org.Mmu.eg"  "org.Pf.plasmo"      "org.Pt.eg"      "org.Rn.eg"
          sacCer              Pig           xenTro
    "org.Sc.sgd"      "org.Ss.eg"      "org.Xl.eg"
If you add the name of your custom package to this variable, everything
*should* work fine. However, my advice would be to construct a table of
geneID-GO category mappings (and another of geneID-length) and just use
them directly. This is described in detail in the goseq vignette.
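
A minimal sketch of the "use the tables directly" route, under stated
assumptions: the gene IDs, GO IDs and lengths below are made-up placeholders,
and run_goseq_custom is a hypothetical helper name, not part of goseq. The
sketch relies on the bias.data argument of nullp and the gene2cat argument
of goseq; check the vignette for the exact usage in your goseq version.

```r
# Hypothetical input tables. In practice these would be read from files
# (e.g. with read.delim()); every ID and length here is made up.
go_table <- data.frame(
  gene_id = c("GB10001", "GB10001", "GB10002", "GB10003"),
  go_id   = c("GO:0005575", "GO:0008150", "GO:0008150", "GO:0003674"),
  stringsAsFactors = FALSE
)
length_table <- data.frame(
  gene_id = c("GB10001", "GB10002", "GB10003", "GB10004"),
  length  = c(1200, 850, 2300, 640),
  stringsAsFactors = FALSE
)
de_ids <- c("GB10001", "GB10003")  # genes called differentially expressed

# Named 0/1 vector over all assayed genes: 1 = differentially expressed.
gene_vector <- as.integer(length_table$gene_id %in% de_ids)
names(gene_vector) <- length_table$gene_id

# Gene lengths in the same order as gene_vector.
gene_lengths <- length_table$length[match(names(gene_vector),
                                          length_table$gene_id)]

# gene2cat as a named list: one character vector of GO IDs per gene.
gene2cat <- split(go_table$go_id, go_table$gene_id)

# Hypothetical wrapper: supplying bias.data and gene2cat means no
# genome name or organism package lookup is needed.
run_goseq_custom <- function(gene_vector, gene_lengths, gene2cat) {
  pwf <- goseq::nullp(gene_vector, bias.data = gene_lengths, plot.fit = FALSE)
  goseq::goseq(pwf, gene2cat = gene2cat)
}

# With a real data set (thousands of genes, not this toy example):
# results <- run_goseq_custom(gene_vector, gene_lengths, gene2cat)
```

Genes without any GO annotation (GB10004 here) simply have no entry in
gene2cat. The toy tables are far too small for the length-bias fit, which is
why the final call is left commented out.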
Cheers,
Matt
On Sat, Feb 25, 2012 at 9:38 AM, Alicia Oshlack
<alicia.oshlack@mcri.edu.au> wrote:
> Hi Vanessa,
>
> In answer to your question number 2: to use a genome that is not
> supported (if it's not in UCSC then it's not supported in goseq), you
> are right that you will need the annotation (gene length) and the
> mapping from gene IDs to GO terms. It's not enough just to have the
> genome in order to use goseq.
>
> Cheers,
> Alicia
>
>
> On 25/02/12 6:32 PM, "bioconductor-request@r-project.org"
> <bioconductor-request@r-project.org> wrote:
>
> >
> > Message: 9
> > Date: Fri, 24 Feb 2012 12:51:37 -0800
> > From: Hervé Pagès <hpages@fhcrc.org>
> > To: "Corby, Vanessa" <vanessa.corby@ars.usda.gov>
> > Cc: "myoung@wehi.edu.au" <myoung@wehi.edu.au>,
> > "bioconductor@r-project.org" <bioconductor@r-project.org>
> > Subject: Re: [BioC] making use of the Apis mellifera BeeBase assembly
> > 4 data in goseq
> > Message-ID: <4F47F859.5080200@fhcrc.org>
> > Content-Type: text/plain; charset=windows-1252; format=flowed
> >
> > Hello Vanessa,
> >
> > On 02/24/2012 10:45 AM, Corby, Vanessa wrote:
> >> Hello Herve and Matt,
> >>
> >> After looking through the Bioconductor documentation for the BeeBase
> >> assembly 4 package Hervé posted (information on the Apis 4 annotation
> >> stored in Biostrings objects), the org.Hs.eg.db annotation database
> >> documentation, the Bioconductor mailing list, the BSgenome
> >> documentation, and the goseq documentation, I am still very confused
> >> about whether I can use the assembly 4 package that Hervé posted in
> >> goseq.
> >
> > Just to clarify, goseq is not my package so I can't "post" anything
> > in it, whatever that means. I assume you are talking about the
> > BSgenome.Amellifera.BeeBase.assembly4 package that I made and that
> > is part of Bioconductor.
> >
> >> The reason that I want to use the assembly 4 data is that I would
> >> presume that it will have more current information than the natively
> >> supported (by goseq) Apis release 2.
> >
> > It's a more recent assembly so I would expect it to be more accurate
> > (i.e. closer to reality).
> >
> >>
> >> So, here are my questions:
> >>
> >> 1. Will release 4 offer much improvement over release 2? If this is
> >> not the case, then the next two questions are moot.
> >
> > It's just a more recent assembly, with all that that implies.
> >
> >>
> >> 2. Do I need to get information on the transcript lengths and the
> >> associations between the gene IDs and GO terms for the Apis 4 release
> >> and build 2 new files of this information for goseq to use?
> >
> > I'm not familiar with the goseq package so I'll let Matt answer this.
> >
> >> Is that information available (perhaps through UCSC or Baylor's site
> >> for the honeybee projects)? Can I use Bioconductor for this if I have
> >> the annotation database file Hervé posted?
> >
> > The BSgenome.Amellifera.BeeBase.assembly4 package only contains the
> > DNA sequences of the Apis 4 release. It does *not* contain annotations
> > for this assembly.
> >
> > One advantage of using the BSgenome.Amellifera.UCSC.apiMel2 package
> > instead is that you have easy access to a world of annotations for
> > this genome through the UCSC genome browser. Too bad that the UCSC
> > folks have no plans to support apiMel4:
> >
> >   https://lists.soe.ucsc.edu/pipermail/genome/2007-October/014763.html
> >
> > apiMel2 is 7 years old now!
> >
> > Note that the GenomicFeatures and rtracklayer packages make it really
> > simple to import and work with those annotations in R/Bioconductor.
> >
> >>
> >> 3. Do I just have to rename the Apis 4 genome package that Hervé
> >> posted in order to use it in goseq (I see that there are several
> >> naming conventions on the Annotation Data packages)?
> >
> > I'll let Matt answer this.
> >
> >>
> >> You can see that some of these questions are more appropriate for
> >> Hervé and some for Matt, so I decided to email both of you. Some of
> >> these issues arise simply because I've only been successful with the
> >> example in the goseq documentation (using the org.Hs.eg.db Annotation
> >> database). Others arise because I am just very new to R and the
> >> Bioconductor packages.
> >
> > For what it's worth, I don't think there is any org.* package for Bee
> > (it would probably be named something like org.Am.eg.db if there was
> > one). And if there was one, you would need to double-check that the
> > annotations in it are actually compatible with whatever genome
> > assembly you finally decided to use.
> >
> >>
> >> Thanks for any help you can offer. And apologies if this is the
> >> 100th time you've received an email about this from newbies such as
> >> myself.
> >
> > No problem. Wish I could help more. I'm cc'ing the Bioconductor
> > mailing list (hope you don't mind). It's a better place to ask
> > questions like this as other people might be able to help and also
> > the whole discussion will be archived and searchable for further
> > reference.
> >
> > Cheers,
> > H.
> >
> >>
> >> Vanessa Corby-Harris
> >>
> >> Research Molecular Biologist
> >>
> >> USDA-ARS
> >>
> >> Carl Hayden Bee Research Center
> >>
> >> 2000 E. Allen Rd., Tucson, AZ 85719
> >>
> >> (520) 647-9269
> >>
> >
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages@fhcrc.org
> > Phone: (206) 667-5791
> > Fax: (206) 667-1319
> >
> >
> >