GOstats and GenePix arrays
@jacob-michaelson-1079
Hi all,

I'm trying to use the "guts" of the GOHyperG function in GOstats as a basis for a similar function for GenePix data. I've found a basic description of the phyper function in the context of GO:

    # How to implement phyper function for GO analysis
    # phyper(x-1, m, n-m, k, lower.tail = FALSE)
    # x: number of sample genes at GO node (can be vector with many entries)
    # m: number of genes at GO node (works with vector of same length as x)
    # n: number of unique genes at all GO nodes
    # k: number of unique genes in test sample that have GO mappings

Values for x and k seem straightforward, but I'm wondering about m and n. The arrays we're working with seem to have fewer genes on them than the total number cataloged in the organism's online databases. So should m and n be based on the absolute total number of genes annotated, or the number of genes annotated *on the chip*?

Thanks in advance,

Jake
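The phyper call described in the question can be tried on a single GO node with made-up counts. This is only a minimal sketch: the numbers are invented for illustration and come from no real array. It checks the upper-tail call against an explicit sum of point probabilities from dhyper.

```r
# Toy counts, invented purely for illustration
n <- 1000  # unique genes with GO mappings in the reference set
m <- 40    # of those, genes annotated at the GO node of interest
k <- 50    # unique genes in the test sample that have GO mappings
x <- 6     # sample genes annotated at the GO node

# P(X >= x): chance of seeing x or more node genes in a sample of size k.
# phyper(q, ...) with lower.tail = FALSE gives P(X > q), hence the x - 1.
p_upper <- phyper(x - 1, m, n - m, k, lower.tail = FALSE)

# The same tail, computed by summing the point probabilities explicitly
p_check <- sum(dhyper(x:min(m, k), m, n - m, k))

print(c(p_upper = p_upper, p_check = p_check))
```

The `x - 1` matters: passing `x` itself would silently drop the probability of observing exactly `x` genes from the tail.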
Seth Falcon ★ 7.4k
@seth-falcon-992
Jake <jjmichael at comcast.net> writes:
> Hi all,
>
> I'm trying to use the "guts" of the GOHyperG function in GOstats as a
> basis for a similar function for GenePix data. I've found a basic
> description of the phyper function in the context of GO:

I recently refactored the guts of GOHyperG and the guts are in the Category package. So it may help to review what is there.

I'm not sure I understand what you mean w.r.t. a similar func for GenePix data...

+ seth
Sean answered my question. By "similar function" I meant a function that was platform agnostic, as opposed to GOHyperG which assumes Affymetrix data. Really I'm not even writing a function, but just using the phyper function as part of a workflow in the place of GOHyperG.

--Jake

On Thu, 2006-05-11 at 10:41 -0700, Seth Falcon wrote:
> I recently refactored the guts of GOHyperG and the guts are in the
> Category package. So it may help to review what is there.
>
> I'm not sure I understand what you mean w.r.t a similar func for
> GenePix data...
>
> + seth

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Hi,

Jake wrote:
> Sean answered my question. By "similar function" I meant a function
> that was platform agnostic, as opposed to GOHyperG which assumes
> Affymetrix data.

It absolutely does not!

Robert

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
rgentleman ★ 5.5k
@rgentleman-7725
Hi,

I am not sure why you think that you should do anything different for GenePix. The array used is completely irrelevant to this sort of hypergeometric testing and there should be no need to modify GOstats in any way. You simply make an annotation package for your array (using AnnBuilder or any other tool of your choice) and then use it.

best wishes
Robert
I hadn't thought of going through the trouble of making a custom annotation package. Last time I tried making one was quite a while back and it was quite a pain. I'm sure things work more smoothly now, but by looking at GOHyperG I realized all I really need is phyper and the appropriate GO mappings, which I've gotten through TAIR and the use of GOANCESTOR.

I guess in the light of making a custom annotation package, GOHyperG isn't *technically* Affy-only, though with components like "go2Affy", it's obvious what type of data was in mind.

Thanks for the comments and insight.

--Jake

On Thu, 2006-05-11 at 10:57 -0700, rgentlem wrote:
> I am not sure why you think that you should do anything different for
> GenePix? The array used is completely irrelevant to this sort of
> hypergeometric testing and there should be no need to modify GOstats in
> any way.
> You simply make an annotation package for your array (using AnnBuilder
> or any other tool of your choice) and then use it.
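Combining direct GO mappings with an ancestor lookup, as mentioned above, amounts to counting each gene at every ancestor of the terms it is annotated to. The sketch below shows that propagation step on toy data. The term names, gene names, and the `direct` and `ancestors` lists are all hypothetical stand-ins, not real GO identifiers or the contents of any annotation package.

```r
# Hypothetical direct annotations: gene -> most specific terms (made-up names)
direct <- list(
  g1 = c("termC"),
  g2 = c("termB"),
  g3 = c("termD")
)

# Hypothetical ancestor lookup: term -> all of its ancestors
ancestors <- list(
  termA = character(0),        # root, no ancestors
  termB = c("termA"),
  termC = c("termB", "termA"),
  termD = c("termA")
)

# Propagate: a gene annotated at a term also counts at every ancestor of it
propagated <- lapply(direct, function(terms) {
  sort(unique(c(terms, unlist(ancestors[terms], use.names = FALSE))))
})

# Invert to term -> genes, the form needed to count m and x per node
pairs <- stack(propagated)   # columns: values (term), ind (gene)
term2genes <- split(as.character(pairs$ind), pairs$values)

print(term2genes)
```

Skipping this step would undercount m and x at higher-level nodes, which is one way a hand-rolled analysis can drift from what a full annotation package would give.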
Jake wrote:
> I hadn't thought of going through the trouble of making a custom
> annotation package. Last time I tried making one was quite a while back
> and it was quite a pain. I'm sure things work more smoothly now, but by
> looking at GOHyperG I realized all I really need is phyper and the
> appropriate GO mappings, which I've gotten through TAIR and the use of
> GOANCESTOR.

Well, if you don't go to the trouble, then you will almost surely be getting the wrong answer, and to paraphrase one of the really clever folks, there are easier and faster ways to do that :-)

-- 
Robert Gentleman, PhD
rgentlem at fhcrc.org
Hi,

there must be a growing number of people working with their own custom chips and custom annotations for "exotic" half-finished genomes, like me. At least last time I checked, I didn't really find anything addressing this situation in the AnnBuilder documentation or list archives.

I would be very happy for any advice on how to build an annotation package from self-made chipID-GOid (or any other annotation) lists. Maybe it's my lack of experience, but I really don't know where to start.

Cheers,
Mikko

At 11:17 11.5.2006 -0700, rgentlem wrote:
> Well, if you don't go to the trouble, then you will almost surely be
> getting the wrong answer, and to paraphrase one of the really clever
> folks, there are easier and faster ways to do that :-)

Mikko Arvas

VTT
Industrial Biotechnology

e-mail: mikko.arvas at vtt.fi
tel: +358-(0)20-722 5827
mobile: +358-(0)44-381 0502
fax: +358-(0)20-722 7071
mail: Tietotie 2, Espoo
P.O. Box 1000
FI-02044 VTT, Finland
VTT's website: http://www.vtt.fi/
Protein production's website: http://www.vtt.fi/palvelut/cluster4/topic4_3/Proteiinin_tuotto.jsp?lang=en

Welcome to Yeast Systems Biology meeting ISSY25, http://issy25.vtt.fi/ organised by VTT.
Hi,

Mikko Arvas wrote:
> there must be a growing number of people working with own custom chips and
> custom annotations for "exotic" half finished genomes, like me.
> At least last time I checked I didn't really find anything addressing
> this situation in AnnBuilder documentation or list archives.

I am not sure what you might expect to find. AnnBuilder works on any input data sources and the mechanism for combining different sources is entirely under your control. We do not have a special page for organisms with one data source and a different one for those with two etc., since it is not needed. Which of the vignettes for AnnBuilder have you read?

If your organism is not well documented at one of the places that we know about and use for AnnBuilder, then you would need to use the tools AnnBuilder provides to do that (and there has been quite some discussion of this). AFAIK anyone who has taken building such a package seriously has achieved their goal, and we have provided assistance to some as needed and will continue to do so.

> I would be very happy for any advice on how to build an annotation package
> from self made chipID-GOid (or any other annotation) lists. Maybe it's
> my lack of experience, but I really don't know where to start

Start with the vignettes. A Google search on AnnBuilder then led me to a lot of documentation, both by us and by others who have used AnnBuilder. Ask questions when you get stuck (please read the posting guide, as it will help you to get answers).

best wishes
Robert
On Thu, 2006-05-11 at 10:57 -0700, rgentlem wrote:
> I am not sure why you think that you should do anything different for
> GenePix? The array used is completely irrelevant to this sort of
> hypergeometric testing and there should be no need to modify GOstats in
> any way.

I think it's the returned value of "go2Affy", containing (according to the documentation) the "Affymetrix identifiers associated with that node", which leads to confusion. I know it took me a little while to make sure there were no hidden dependencies.

Francois
@jacob-michaelson-1079
Thanks Thomas. This is the kind of thing I was looking for. Thanks to all for their suggestions and encouragement. I know building a custom annotation package is the ideal scenario, but for "niche" organisms (i.e. not human, rat, mouse) this isn't always realistic. Annotation is often gathered by hand and merged into tables, etc., and it's sometimes difficult to conform to the BioC annotation package standards. It's nice to see functions that will work with both standard BioC annotations and more generic tabular annotation.

Thanks,

Jake

On Thu, 2006-05-11 at 11:34 -0700, Thomas Girke wrote:
> Jake,
>
> I believe I have posted this description on the web for my own version
> of GOHyperG, which I called GOHyperGAll. I tried to implement this
> function for my work with organisms that don't have locusID/chipID-to-GO
> mappings. GOHyperGAll allows you to work with your own gene-to-GO or
> chip_feature-to-GO mappings by providing your custom mapping file.
> Feel free to try this function. It is available at:
> http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html#GOHyperGAll
>
> Thomas
>
> --
> Thomas Girke, Ph.D.
> 1008 Noel T. Keen Hall
> Center for Plant Cell Biology (CEPCEB)
> University of California
> Riverside, CA 92521
> E-mail: thomas.girke at ucr.edu
> Website: http://faculty.ucr.edu/~tgirke
> Ph: 951-827-2469
> Fax: 951-827-4437
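To make the idea of working from a generic tabular gene-to-GO mapping concrete, here is a minimal sketch that runs the hypergeometric test for every term in a small hand-built table. It is not GOHyperGAll and does not reproduce its interface. The `gene2go` table, the term labels, and the `sample_genes` vector are hypothetical, and ancestor propagation is assumed to have been done already.

```r
# Hypothetical gene-to-term table, as might be merged by hand (made-up labels)
gene2go <- rbind(
  data.frame(gene = letters[1:6],   go = "termA", stringsAsFactors = FALSE),
  data.frame(gene = letters[4:15],  go = "termB", stringsAsFactors = FALSE),
  data.frame(gene = letters[10:26], go = "termC", stringsAsFactors = FALSE)
)
gene2go <- unique(gene2go)               # guard against duplicated rows

sample_genes <- c("a", "b", "c", "d", "x")   # hypothetical genes of interest

universe <- unique(gene2go$gene)         # genes with at least one mapping
in_sample <- intersect(sample_genes, universe)

n <- length(universe)                    # unique genes at all nodes
k <- length(in_sample)                   # sample genes that have mappings
m <- tapply(gene2go$gene, gene2go$go, length)               # genes per node
x <- tapply(gene2go$gene %in% in_sample, gene2go$go, sum)   # sample genes per node

result <- data.frame(
  go = names(m),
  x  = as.vector(x),
  m  = as.vector(m),
  p  = as.vector(phyper(x - 1, m, n - m, k, lower.tail = FALSE)),
  stringsAsFactors = FALSE
)
print(result[order(result$p), ])
```

Because phyper is vectorized over `x` and `m`, one call covers all terms. The resulting p-values are unadjusted, so some multiple-testing correction (for example `p.adjust`) would normally follow.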
@sean-davis-490
On 5/11/06 1:21 PM, "Jake" <jjmichael at comcast.net> wrote:
> Values for x and k seem straightforward, but I'm wondering about m and
> n. The arrays we're working with seem to have fewer genes on them than
> the total number cataloged in the organism's online databases. So
> should m and n be based on the absolute total number of genes annotated,
> or the number of genes annotated *on the chip*?

Jake,

I think the typical definition is that these should be the respective numbers "on the chip", which guards against biases caused by array content.

Sean
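The effect of the choice of universe can be seen on toy data. The sketch below computes the same test twice, once with m and n counted on the chip and once counted over the whole genome. All identifiers and set sizes are invented for illustration, and `go_p` is a hypothetical helper, not a GOstats function.

```r
# Hypothetical identifiers, invented for illustration only
genome_genes <- paste0("g", 1:2000)              # every annotated gene
chip_genes   <- genome_genes[1:500]              # genes actually on the array
node_genes   <- genome_genes[c(1:30, 501:560)]   # genes annotated at one GO node
selected     <- chip_genes[c(1:8, 100:141)]      # "interesting" genes found

# Hypothetical helper: upper-tail hypergeometric p-value for one node,
# with every count restricted to the chosen universe
go_p <- function(selected, node_genes, universe) {
  selected   <- intersect(selected, universe)
  node_genes <- intersect(node_genes, universe)
  x <- length(intersect(selected, node_genes))   # sample genes at the node
  m <- length(node_genes)                        # universe genes at the node
  n <- length(universe)                          # size of the universe
  k <- length(selected)                          # size of the sample
  phyper(x - 1, m, n - m, k, lower.tail = FALSE)
}

p_chip   <- go_p(selected, node_genes, chip_genes)    # counts "on the chip"
p_genome <- go_p(selected, node_genes, genome_genes)  # counts over the genome

print(c(p_chip = p_chip, p_genome = p_genome))
```

Genes that are not on the array could never have been selected, so counting them in m and n changes the p-value for reasons unrelated to the experiment. That is the array-content bias the chip-based universe is meant to avoid.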