GOstat with replicates

Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.1 years ago

United States

There are times when it makes sense to have genes duplicated in both the universe and the set of interest - e.g. if the geneIds come from BLAST hits of unigenes of an unsequenced species against the genes of a sequenced species.

I fiddled a bit with GOstat, but was not able to see how to change the code to allow this. (I can see where duplication was removed in the gene set but not in the universe.)

If someone could tell me where to look in the code, I would be happy to contribute back the modified code allowing duplication.

Naomi S. Altman, Associate Professor
Dept. of Statistics, Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)

ADD COMMENT • link updated 16.7 years ago by Seth Falcon ★ 7.4k • written 16.7 years ago by Naomi Altman ★ 6.0k

Seth Falcon ★ 7.4k

@seth-falcon-992

Last seen 9.7 years ago

Hi Naomi,

Naomi Altman writes:

> There are times when it makes sense to have genes duplicated in both
> the universe and the set of interest - e.g. if the geneIds come from
> BLAST hits of unigenes of an unsequenced species against the genes of
> a sequenced species.
>
> I fiddled a bit with GOstat, but was not able to see how to change
> the code to allow this. (I can see where duplication was removed in
> the gene set but not in the universe.)
>
> If someone could tell me where to look in the code, I would be happy
> to contribute back the modified code allowing duplication.

I think you will want to look in the Category package, where a fair amount of the infrastructure is located for the GO-based hyperGTest.

In particular, you may want to look at .makeValidParams in HyperGParams-accessors.R

That said, I find the duplicated gene scenario hard to understand and would worry that the method as implemented won't give useful results.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
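Seth's worry about whether duplicated IDs give useful results can be made concrete with a small calculation. The following is a minimal Python sketch, not the code used by GOstats or Category: it computes a plain hypergeometric upper-tail p-value for a single category, once with duplicate IDs kept and once with them collapsed. The function names and the toy gene IDs are hypothetical and exist only for this illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which belong to the category of interest."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def category_counts(selected, universe, category, keep_duplicates):
    """Return (k, K, n, N) for one category.

    With keep_duplicates=False, repeated IDs are collapsed first, which is
    the behaviour the thread describes for the gene set.
    """
    if not keep_duplicates:
        selected, universe = set(selected), set(universe)
    k = sum(1 for g in selected if g in category)
    K = sum(1 for g in universe if g in category)
    return k, K, len(selected), len(universe)


# Toy data (hypothetical IDs): "g1" was hit by three unigenes in the
# universe, and two of those unigenes are in the set of interest.
universe = ["g1", "g1", "g1", "g2", "g3", "g4", "g5", "g6"]
selected = ["g1", "g1", "g2"]
category = {"g1"}

with_dups = category_counts(selected, universe, category, keep_duplicates=True)
without_dups = category_counts(selected, universe, category, keep_duplicates=False)

p_with = hypergeom_upper_tail(*with_dups)
p_without = hypergeom_upper_tail(*without_dups)
print("counts with duplicates:", with_dups, "p =", p_with)
print("counts without duplicates:", without_dups, "p =", p_without)
```

The two calls use different values of k, K, n and N, so the p-values differ. Note that the hypergeometric model treats every draw as an independent unit, which is exactly the assumption that becomes questionable when several copies of the same gene ID are counted separately.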

ADD COMMENT • link 16.7 years ago Seth Falcon ★ 7.4k

Hi,

Amplifying a bit on this (and I am not so sure I yet understand Naomi's use case), it seems likely that the issue here is not that one needs duplicates in either the Universe or the gene set, but rather that in this case the naming scheme is not sufficient and one would like to change it (so that different transcripts had some opportunity to be identified).

This is possible, but it does reveal one of the weaknesses of our current approach. We will need to move our GO annotation to a more general mapping scheme (one based on the protein, not the gene), as it is likely that different splice variants have different functions (and hence different GO categorizations). It is still important to consider whether those different splice variants (or other differences) can be detected by the array (in the case of microarray analysis), and if not then it will be important to map to the right level of resolution.

My guess is that we will be moving slowly in that direction over the next year or so, and folks that have specific needs should let us know what their use cases are.

best wishes
Robert

Seth Falcon wrote:
> Hi Naomi,
>
> I think you will want to look in the Category package where a fair
> amount of the infrastructure is located for the GO-based hyperGTest.
>
> In particular, you may want to look at .makeValidParams in
> HyperGParams-accessors.R
>
> That said, I find the duplicated gene scenario hard to understand and
> would worry that the method as implemented won't give useful results.
>
> + seth

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

ADD REPLY • link 16.7 years ago rgentleman ★ 5.5k
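Robert's suggestion of changing the naming scheme, rather than allowing duplicates, could look roughly like the sketch below. It is written in Python purely for illustration and is not part of GOstats or Category. It assumes the BLAST results are available as (unigene, hit gene) pairs and builds a composite ID for each pair, so every unigene remains a distinct unit while still inheriting the annotation of the gene it hit. All names, IDs and the placeholder annotation terms are hypothetical.

```python
def make_composite_ids(blast_hits):
    """Map each (unigene_id, hit_gene_id) pair to a unique composite ID.

    Returns a dict: composite ID -> hit gene ID. Two unigenes that hit the
    same gene get two different composite IDs, so no duplicates remain.
    """
    return {f"{hit}|{unigene}": hit for unigene, hit in blast_hits}


def annotate(composite_to_gene, gene_to_terms):
    """Give every composite ID the annotation terms of the gene it hit."""
    return {
        composite: frozenset(gene_to_terms.get(gene, ()))
        for composite, gene in composite_to_gene.items()
    }


# Hypothetical BLAST results: unigenes U1 and U2 both hit geneA.
blast_hits = [("U1", "geneA"), ("U2", "geneA"), ("U3", "geneB")]

# Hypothetical annotation of the sequenced species (placeholder terms).
gene_to_terms = {"geneA": {"term_1", "term_2"}, "geneB": {"term_2"}}

composite_to_gene = make_composite_ids(blast_hits)
annotation = annotate(composite_to_gene, gene_to_terms)

for composite_id in sorted(annotation):
    print(composite_id, sorted(annotation[composite_id]))
```

With IDs of this kind, the universe and the set of interest contain unique entries only, so the existing duplicate removal would leave them unchanged. Whether counting each unigene as a separate unit is statistically appropriate is a separate question that the thread leaves open.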

I really do need to be able to have duplicate names in the gene set and the gene universe, with the restriction that for any gene, the number of copies in the gene universe must be greater than or equal to the number in the gene set.

I will look at Category and see what I can do.

Thanks Robert and Seth

--Naomi

At 07:25 PM 9/12/2007, Robert Gentleman wrote:
> Hi,
>
> Amplifying a bit on this (and I am not so sure I yet understand
> Naomi's use case), it seems likely that the issue here is not that one
> needs duplicates in either the Universe or the gene set, but rather,
> that in this case the naming scheme is not sufficient and one would like
> to change it (so that different transcripts had some opportunity to be
> identified).
>
> This is possible, but it does reveal one of the weaknesses of our
> current approach. We will need to move our GO annotation to a more
> general mapping scheme (one based on the protein, not the gene), as it
> is likely that different splice variants have different functions (and
> hence different GO categorizations). It is still important to consider
> whether those different splice variants (or other differences) can be
> detected by the array (in the case of microarray analysis), and if not
> then it will be important to map to the right level of resolution.
>
> My guess is that we will be moving slowly in that direction over the
> next year or so, and folks that have specific needs should let us know
> what their use cases are.
>
> best wishes
> Robert
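The restriction Naomi states (for every gene, at least as many copies in the universe as in the gene set) is a multiset inclusion check. A minimal Python sketch of such a check follows; it is an illustration only, not code from GOstats or Category, and the function name and gene IDs are hypothetical.

```python
from collections import Counter


def excess_copies(gene_set, universe):
    """Return {gene: number of copies missing from the universe}.

    An empty result means the restriction holds: every gene occurs in the
    universe at least as often as it occurs in the gene set.
    """
    # Counter subtraction keeps only strictly positive differences.
    return dict(Counter(gene_set) - Counter(universe))


# Valid: g1 appears twice in the set and three times in the universe.
ok = excess_copies(["g1", "g1", "g2"], ["g1", "g1", "g1", "g2"])

# Invalid: g1 appears twice in the set but once in the universe,
# and g3 is absent from the universe altogether.
bad = excess_copies(["g1", "g1", "g3"], ["g1", "g2"])

print("violations (valid input):", ok)
print("violations (invalid input):", bad)
```

A check of this kind would play the role that duplicate removal currently plays during parameter validation: instead of silently collapsing repeated IDs, it reports which genes break the restriction.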

ADD REPLY • link 16.7 years ago Naomi Altman ★ 6.0k
