gene classification problem
@kimpel-mark-w-727
Last seen 10.2 years ago
My apologies to those with far more statistical expertise than I, but I have what may (or may not) be a straightforward question.

After performing SAM analysis of an experiment comparing two strains of rats, I have a list of about 200 significant affy rat probesets (genes) that I have mapped to their chromosomal locations. Some of the genes appear to cluster into discrete physical chromosomal regions, which I suspect is related to underlying genetic differences between the two inbred strains. Based on their chromosomal location, I have clustered these significant genes into discrete bins. Something to remember when solving this problem is that the distribution along chromosomes of all affy rat probesets is not uniform. Thus my fear is that some of the granularity of the chromosomal locations of significant genes could be due not only to chance, but also to the granularity of the underlying distribution.

At this point I would like to test:

1. if the distribution of sig. genes amongst the bins is statistically different from that of the population of all affy genes from which they were drawn.
2. if the above distribution of sig. genes is, as I suspect, different, which of the bins are responsible for this significant difference. It would be great to assign a p-value to each bin.

I believe this is similar to the problem faced in analyzing the distribution of genes in GO categories, but I am not familiar with the proper solution.

Any sample code would be greatly appreciated. For an example, assume that I have two matrices, each of two columns, with genes represented by rows. The first column is the probeset ID, the second column the "bin" that it falls into. One matrix is of all rat affy genes; the second one is of only the significant genes.

Thanks,

Mark W. Kimpel MD
Department of Psychiatry
Indiana University School of Medicine
Biotechnology, Research, & Training Center
1345 W. 16th Street
Indianapolis, IN 46202
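The two tests requested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a method proposed anywhere in this thread: the "bin" column of each matrix is assumed to be available as a plain list of labels, every significant gene is assumed to also appear in the all-genes list, and the function names (`hypergeom_upper_tail`, `per_bin_enrichment`, `global_test`) and the toy data are made up for the example. Question 1 is approached with a Monte Carlo version of a chi-square goodness-of-fit test, and question 2 with a one-sided hypergeometric test per bin, the same test commonly used for GO category over-representation.

```python
import random
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which fall in the bin of interest."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


def per_bin_enrichment(all_bins, sig_bins):
    """For each bin: (observed, expected, one-sided p, Bonferroni-adjusted p)."""
    N, n = len(all_bins), len(sig_bins)
    background = Counter(all_bins)   # bin sizes among all probesets
    observed = Counter(sig_bins)     # bin sizes among significant probesets
    n_tests = len(background)
    results = {}
    for b, K in background.items():
        k = observed.get(b, 0)
        p = hypergeom_upper_tail(k, N, K, n)
        results[b] = (k, n * K / N, p, min(1.0, p * n_tests))
    return results


def global_test(all_bins, sig_bins, n_sim=2000, seed=0):
    """Monte Carlo p-value for the chi-square statistic comparing the
    significant genes' bin counts with counts expected from the background."""
    rng = random.Random(seed)
    N, n = len(all_bins), len(sig_bins)
    expected = {b: n * K / N for b, K in Counter(all_bins).items()}

    def chi2(sample):
        obs = Counter(sample)
        return sum((obs.get(b, 0) - e) ** 2 / e for b, e in expected.items())

    observed_stat = chi2(sig_bins)
    # Redraw n genes at random from the background and count how often the
    # simulated statistic is at least as large as the observed one.
    hits = sum(chi2(rng.sample(all_bins, n)) >= observed_stat
               for _ in range(n_sim))
    return (hits + 1) / (n_sim + 1)


# Hypothetical toy data: bin labels only (second column of each matrix).
all_bins = ["chr1:a"] * 100 + ["chr1:b"] * 300 + ["chr2:a"] * 600
sig_bins = ["chr1:a"] * 10 + ["chr1:b"] * 4 + ["chr2:a"] * 6

print("global p:", global_test(all_bins, sig_bins))
for b, (k, e, p, p_adj) in per_bin_enrichment(all_bins, sig_bins).items():
    print(b, k, round(e, 2), p, p_adj)
```

Because the expected counts are derived from the bin sizes among all probesets, the non-uniform distribution of probesets along the chromosomes is built into both tests. The per-bin test here only looks for over-representation, and the Bonferroni adjustment is a conservative choice; a false discovery rate adjustment would be a reasonable alternative when there are many bins.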
GO affy ASSIGN • 1.1k views
Charles Berry ▴ 290
@charles-berry-5754
Last seen 5.7 years ago
United States
Mark,

In

Borevitz, J.O., Liang, D., Plouffe, D., Chang, H., Zhu, T., Weigel, D., Berry, C.C., Winzeler, E., and Chory, J. (2003) Large Scale Identification of Single Feature Polymorphisms in Complex Genomes. Genome Research 13, 513-523.

we used individual probesets on Affy arrays to search for polymorphisms among inbred strains (hyb'ing genomic DNA rather than RNA).

A collection of the tools we used to identify probesets and/or regions that differentially bind according to strain may be found at:

http://naturalvariation.org/sfp

and the 'Methods' link will connect you to some newer work and scripts.

----------

Although you seem to have somewhat different objectives, it looks like similar statistical tools would apply to your situation.

Chuck

On Thu, 9 Dec 2004, Kimpel, Mark W wrote:

> My apologies to those with far more statistical expertise than I, but I have what may (or may not) be a straightforward question.
> [...]

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry@tajo.ucsd.edu
UC San Diego
http://hacuna.ucsd.edu/members/ccb.html
La Jolla, San Diego 92093-0717
Oops! Minor correction below.

On Thu, 9 Dec 2004, Charles C. Berry wrote:

> A collection of the tools we used to identify probesets and/or regions that
> differentially bind according to strain may be found at:
> [...]

I meant individual probes, not probesets.

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry@tajo.ucsd.edu
UC San Diego
http://hacuna.ucsd.edu/members/ccb.html
La Jolla, San Diego 92093-0717