Dear BioC,
How are the mappings of Affymetrix probe IDs to Gene Ontology terms in
the metadata packages provided by Bioconductor built?

I am trying to use some gene set analysis packages and find that some
packages use the *GO2PROBE (e.g. hgu133aGO2PROBE) information, while
another package uses an external gene set definition, such as MSigDB.

So I want to know the criteria for selecting specific GO terms among
the possible terms for each probe ID in Bioconductor.
I have already read the documentation for the AnnBuilder package, though.
--
mailto:tomonori.oura at gmail.com
Kyoto University School of Public Health
Department of Biostatistics
On Thu, Oct 2, 2008 at 3:11 AM, Oura Tomonori <tomonori.oura at gmail.com> wrote:
To make a long story short, the annotations available from Affymetrix
are mapped to Entrez Gene IDs. Then the information from Entrez Gene
(in this case, Gene Ontology) is mapped to the Affy ID. The dates
associated with the data, the source of the data, and how the data are
mapped will all affect the final mapping of Affy IDs to Gene Ontology
terms. The nice thing about Gene Ontology analyses is that they are
typically based on "sets" of genes, which makes it much less important
to start with EXACTLY the same Gene Ontology mappings. In fact, in
practice, it would be pretty difficult to do so.
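To make the two-step mapping concrete, here is a minimal Python sketch (not Bioconductor's actual build code) that chains a probe-to-gene table with a gene-to-GO table and then inverts the result into the GO-to-probe direction that a *GO2PROBE map exposes. All identifiers and the helper names `compose` and `invert` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy tables with made-up identifiers. In a real package the
# probe -> gene table comes from the array vendor's annotation and the
# gene -> GO table comes from NCBI Entrez Gene, each as of some build date.
probe_to_gene = {
    "probe_1": "gene_A",
    "probe_2": "gene_B",
    "probe_3": "gene_A",  # two probes can target the same gene
    "probe_4": None,      # a probe with no gene assignment
}
gene_to_go = {
    "gene_A": {"GO:X", "GO:Y"},
    "gene_B": {"GO:Y"},
}


def compose(probe_to_gene, gene_to_go):
    """Chain probe -> gene and gene -> GO into probe -> set of GO terms."""
    probe_to_go = {}
    for probe, gene in probe_to_gene.items():
        if gene is None:
            continue  # unmapped probes inherit no GO terms
        probe_to_go[probe] = set(gene_to_go.get(gene, set()))
    return probe_to_go


def invert(mapping):
    """Flip probe -> {GO} into GO -> {probe}, the direction of a *GO2PROBE map."""
    inverted = defaultdict(set)
    for probe, terms in mapping.items():
        for term in terms:
            inverted[term].add(probe)
    return dict(inverted)


go2probe = invert(compose(probe_to_gene, gene_to_go))
for term in sorted(go2probe):
    print(term, sorted(go2probe[term]))
# GO:X ['probe_1', 'probe_3']
# GO:Y ['probe_1', 'probe_2', 'probe_3']
```

Swapping in a newer gene-to-GO table changes the output without touching the probe table, which is one way different build dates produce different probe-to-GO mappings.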
If you want to see the details of the current Bioconductor annotation
package build process, you want to read the AnnotationDbi SQLForge
vignette, as AnnBuilder is outdated.
Finally, if I have misunderstood your question, perhaps you could
clarify.
Sean
Hi Sean

Turning this into a more general question: whenever I have to deal
with a new type of Affymetrix array, I seem to have to root around
Bioconductor packages to find out how it is annotated, etc. By the
time I come around to do it again, it has all changed and is done
differently from before. My difficulty is that it all feels a bit
ad hoc and comes at me in bits and pieces. I also always feel there is
probably a better way to do it that I am missing.

Is there any information anywhere that gives a bigger picture and
pulls it together a bit? What are the foundational designs/philosophy
that all the packages follow? Is there a roadmap-type document that
describes Bioconductor's approach to all this?

Any pointers to useful information gratefully received.

Thanks.

John Seers
---
-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Sean
Davis
Sent: 02 October 2008 11:55
To: Oura Tomonori
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] How are GO2PROBE built
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Hi John,
The annotation system is not meant to be ad hoc, although it has been
changing a lot recently as we have migrated to a much more powerful
database-centric system. Please have a look at the vignettes on this
page for more current information:
http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
Please let me know if you have any further questions.
Marc
Hi Marc and Sean
Thank you both for your answers.

When I said ad hoc I did not mean to sound critical; it was just
coming at me in an ad hoc way because I could not find an overview of
it. I think the change to AnnotationDbi happened at some point, so
there seemed to be a lot of choices, and without that knowledge I could
not make sense of it. Dealing with some unusual arrays did not help
either.

However, Sean's short description is great, and I will have a look at
AnnotationDbi.html with interest.

Thank you.

John Seers
---
-----Original Message-----
From: Marc Carlson [mailto:mcarlson@fhcrc.org]
Sent: 02 October 2008 17:09
To: john seers (IFR)
Cc: Sean Davis; Oura Tomonori; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] How are GO2PROBE built
On Thu, Oct 2, 2008 at 8:18 AM, john seers (IFR) <john.seers at bbsrc.ac.uk> wrote:
All of the annotation packages from Bioconductor follow the same
scheme and contain (generally, depending on organism) the same
information. The AnnotationDbi package describes these packages in
detail in two vignettes and the help pages. In short, though, all the
packages have a "key/value" concept where the key is typically some
gene/probe identifier and the values are the annotations associated
with that gene/probe. Currently (in BioC 2.2 and forward), these
packages are implemented as SQLite databases accessed via RSQLite,
with a significant API built on top of that. Again, see the
AnnotationDbi documentation and code for details.
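The key/value idea can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch assuming a hypothetical two-table schema; it is not the actual AnnotationDbi schema, and the real packages are queried from R through RSQLite. Keys are probe IDs, and values are the GO IDs reached through a join on the gene ID. The `lookup` helper is hypothetical.

```python
import sqlite3

# Hypothetical two-table schema (NOT the real AnnotationDbi schema):
# probe IDs are the keys, GO IDs reached via the gene ID are the values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE probes (probe_id TEXT PRIMARY KEY, gene_id TEXT);
    CREATE TABLE go_terms (gene_id TEXT, go_id TEXT);
""")
conn.executemany(
    "INSERT INTO probes VALUES (?, ?)",
    [("probe_1", "gene_A"), ("probe_2", "gene_B"), ("probe_3", "gene_A")],
)
conn.executemany(
    "INSERT INTO go_terms VALUES (?, ?)",
    [("gene_A", "GO:X"), ("gene_A", "GO:Y"), ("gene_B", "GO:Y")],
)


def lookup(conn, probe_ids):
    """Return {probe_id: sorted list of GO IDs} for the requested keys."""
    placeholders = ",".join("?" for _ in probe_ids)
    rows = conn.execute(
        f"""SELECT p.probe_id, g.go_id
            FROM probes AS p JOIN go_terms AS g ON p.gene_id = g.gene_id
            WHERE p.probe_id IN ({placeholders})
            ORDER BY p.probe_id, g.go_id""",
        list(probe_ids),
    )
    result = {}
    for probe_id, go_id in rows:
        result.setdefault(probe_id, []).append(go_id)
    return result


print(lookup(conn, ["probe_1", "probe_2"]))
# {'probe_1': ['GO:X', 'GO:Y'], 'probe_2': ['GO:Y']}
```

Keys with no matching rows simply do not appear in the result, so a missing probe yields an empty dictionary rather than an error.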
Hope that helps.
Sean
Hi Sean
Thank you for the general information about the current annotation
mapping systems.
But I would like specific information about the metadata build
process, for example how gene ontology terms with redundant or poor
information are omitted, as in the description of the MSigDB C5
collection below:
From http://www.broad.mit.edu/gsea/msigdb/collection_details.jsp#C5
GO gene sets for very broad categories, such as Biological Process,
have been omitted from MSigDB. GO gene sets with fewer than 10 genes
have also been omitted. Gene sets with the same members have been
resolved based on the GO tree structure: if a parent term has only one
child term and their gene sets have the same members, the child gene
set is omitted; if the gene sets of sibling terms have the same
members, the sibling gene sets are omitted.
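As the later replies in this thread explain, Bioconductor's packages do not apply these rules, but they can be reproduced on top of the unpruned data. Here is a minimal Python sketch of the four rules quoted above. It assumes gene sets are given as a dictionary from GO ID to gene IDs and the tree as a dictionary of direct children. The function name, the toy IDs, the order in which the rules are applied, and the reading of the sibling rule (every sibling sharing identical members is dropped) are all assumptions, not MSigDB's code.

```python
from collections import defaultdict

MIN_SIZE = 10  # sets with fewer than 10 genes are omitted, per the MSigDB text


def filter_go_sets(gene_sets, children, broad_terms, min_size=MIN_SIZE):
    """Prune GO gene sets with MSigDB C5-style rules (illustrative reading).

    gene_sets:   {go_id: set of gene IDs}
    children:    {go_id: list of direct child go_ids}
    broad_terms: go_ids treated as "very broad" categories (e.g. ontology roots)
    """
    drop = set(broad_terms)
    # Rule: omit sets that are too small.
    drop |= {t for t, members in gene_sets.items() if len(members) < min_size}
    for parent, kids in children.items():
        kids = [k for k in kids if k in gene_sets]
        # Rule: a parent with exactly one child whose set is identical -> drop the child.
        if (len(kids) == 1 and parent in gene_sets
                and gene_sets[kids[0]] == gene_sets[parent]):
            drop.add(kids[0])
        # Rule: siblings sharing identical members -> drop every such sibling.
        by_members = defaultdict(list)
        for k in kids:
            by_members[frozenset(gene_sets[k])].append(k)
        for group in by_members.values():
            if len(group) > 1:
                drop.update(group)
    return {t: g for t, g in gene_sets.items() if t not in drop}


def genes(start, stop):
    """Toy helper: made-up gene IDs g<start> .. g<stop-1>."""
    return {f"g{i}" for i in range(start, stop)}


gene_sets = {
    "BP": genes(0, 100),    # stands in for a root such as Biological Process
    "T1": genes(0, 40),
    "T1a": genes(0, 40),    # only child of T1, same members -> dropped
    "T2": genes(40, 70),
    "T2a": genes(40, 52),   # identical siblings -> both dropped
    "T2b": genes(40, 52),
    "T2c": genes(55, 70),
    "T3": genes(70, 75),    # only 5 genes -> dropped
}
children = {"BP": ["T1", "T2", "T3"], "T1": ["T1a"], "T2": ["T2a", "T2b", "T2c"]}

kept = filter_go_sets(gene_sets, children, broad_terms={"BP"})
print(sorted(kept))  # ['T1', 'T2', 'T2c']
```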
Tomonori
On Thu, Oct 2, 2008 at 8:13 PM, Oura Tomonori <tomonori.oura at gmail.com> wrote:
Marc Carlson can be authoritative on this, but there is no cleanup or
omission of the data. The data are taken directly from NCBI Entrez
Gene and should agree with that source as of the date that the
packages were built.
Sean
Sean is right.

We don't try to guess what parts of GO you may or may not want in your
analysis. As much as possible, we try to simply present all of the
data. It is sometimes helpful to let people know that our GO annotation
data are split across two kinds of packages:

The GO.db package just provides a snapshot of the GO ontology, with no
information linking GO IDs to any genes. We don't prune anything off
this ontology.

The other kinds of annotation packages (chip-based and Entrez
Gene/organism-based) contain the mappings between Entrez Gene IDs and
GO terms, which we obtain from NCBI. These too are intended to be as
complete as possible, and the only things left out are things that
specifically don't belong in the scope of that package. So, for
example, you should not find mappings for mouse genes in the
human-centric package org.Hs.eg.db.

But I am not sure how comparable this really is to MSigDB...
Marc
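Because the ontology and the gene annotations live in separate packages, building gene sets that also count genes annotated to more specific (descendant) terms means combining the two. Here is a minimal Python sketch, assuming toy in-memory tables in place of GO.db and a chip/organism package. It walks each directly annotated term up to all of its ancestors and records the gene under every term it reaches. The IDs and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy ontology (term -> direct parents), standing in for the
# tree that GO.db provides; GO is a DAG, so a term may have several parents.
parents = {
    "GO:leaf1": ["GO:mid"],
    "GO:leaf2": ["GO:mid", "GO:other"],
    "GO:mid": ["GO:root"],
    "GO:other": ["GO:root"],
    "GO:root": [],
}
# Hypothetical direct gene -> GO annotations, standing in for a chip or
# organism annotation package.
direct = {
    "gene_A": {"GO:leaf1"},
    "gene_B": {"GO:leaf2"},
    "gene_C": {"GO:other"},
}


def ancestors(term, parents):
    """Return the term plus all of its ancestors (iterative DFS over the DAG)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(direct, parents):
    """Build GO -> genes, counting a gene under every ancestor of its terms."""
    term_to_genes = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for t in ancestors(term, parents):
                term_to_genes[t].add(gene)
    return dict(term_to_genes)


all_genes = propagate(direct, parents)
for term in sorted(all_genes):
    print(term, sorted(all_genes[term]))
# GO:leaf1 ['gene_A']
# GO:leaf2 ['gene_B']
# GO:mid ['gene_A', 'gene_B']
# GO:other ['gene_B', 'gene_C']
# GO:root ['gene_A', 'gene_B', 'gene_C']
```

Keeping the ontology separate from the annotations means the same tree can be reused with any organism's or chip's mappings.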
And you can do any or all of the filtering that is described on the
Broad page if you want to (all the evidence codes and the entire tree
structure are available to you).
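As one example of using the evidence codes, here is a minimal Python sketch that groups genes by GO term while optionally dropping annotations with selected evidence codes. The records are made up. The codes themselves are standard GO evidence codes (IDA: inferred from direct assay; TAS: traceable author statement; IEA: inferred from electronic annotation). Excluding IEA is only an illustrative choice, not a recommendation from this thread.

```python
# Hypothetical annotation records: (gene, GO ID, evidence code).
annotations = [
    ("gene_A", "GO:X", "IDA"),
    ("gene_A", "GO:Y", "IEA"),
    ("gene_B", "GO:Y", "TAS"),
    ("gene_C", "GO:X", "IEA"),
]


def gene_sets(annotations, exclude_codes=frozenset()):
    """Group genes by GO term, skipping annotations with excluded evidence codes."""
    sets = {}
    for gene, go_id, code in annotations:
        if code in exclude_codes:
            continue
        sets.setdefault(go_id, set()).add(gene)
    return sets


for codes in (set(), {"IEA"}):
    sets = gene_sets(annotations, codes)
    print(sorted(codes), {t: sorted(g) for t, g in sorted(sets.items())})
# [] {'GO:X': ['gene_A', 'gene_C'], 'GO:Y': ['gene_A', 'gene_B']}
# ['IEA'] {'GO:X': ['gene_A'], 'GO:Y': ['gene_B']}
```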
You could also use GSEABase and the tools in it to simply retrieve the
Broad sets, if you think you want to use them without having to
recreate them.
You may see differences, as they downloaded their GO data in January of
last year; we downloaded ours more recently (each package gives you
explicit dates and sources).
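To gauge how much two builds differ, one option is to compare each GO term's gene set across the two snapshots. Here is a minimal Python sketch using the Jaccard index as the similarity measure. That metric is an illustrative choice, since nothing in this thread prescribes one. The snapshots and helper names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_snapshots(old, new):
    """Per GO ID present in either snapshot, the Jaccard overlap of its genes."""
    return {t: jaccard(old.get(t, set()), new.get(t, set()))
            for t in sorted(old.keys() | new.keys())}


# Hypothetical snapshots of the same GO gene sets built on two dates.
older = {"GO:X": {"gene_A", "gene_B"}, "GO:Y": {"gene_C"}}
newer = {"GO:X": {"gene_A", "gene_B", "gene_D"}, "GO:Z": {"gene_E"}}

for term, score in compare_snapshots(older, newer).items():
    print(term, round(score, 2))
# GO:X 0.67
# GO:Y 0.0
# GO:Z 0.0
```

A term present in only one snapshot scores 0.0, which flags terms that were added or retired between builds as well as terms whose membership changed.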
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org