Annotation for Nonspecificity of Affymetrix Probes?


Jeff Sorenson ▴ 70

@jeff-sorenson-60

Last seen 9.6 years ago

I would like to thank all of the contributors to the Bioconductor project for putting their tools into the public domain. I'm embarking on a project using Affymetrix U133A/B chips and have been setting up a database (MySQL) of probe/sequence information and other annotation, and learning to use the various R packages.

Looking over the probe sequences and putative gene sequences that Affymetrix provides on their website, it is clear that many of the probes are nonspecific, i.e., they perfectly match portions of gene sequences that are different from the one they were derived from. In some cases, it appears that Affymetrix has simply generated multiple probe sets for transcriptional variants of the same gene. In other cases, it appears that some probes are simply nonspecific. Affymetrix does warn us that some probe sets are less specific than others, and this is indeed incorporated into their probe set nomenclature, but I have found no downloadable file that lists the specifics. My computer should be done testing the half million probes for perfect matches against the ~45000 sequences some time later this week. After that, I will probably test the mismatch probes.

My question to this community is this: is there already an annotation file or package that takes this consideration into account? If so, can this information be readily adapted into the R packages for probe level analysis and gene expression estimation?

In a related question, can anyone point me to an algorithm for accurately estimating the hybridization probability of an arbitrary probe against an arbitrary mRNA? Would it correlate closely to the BLAST score? Has anyone done theoretical studies on the nature of the mismatch probes and their usefulness in measuring "nonspecific" binding? It would be nice to be able to predict how strongly a particular mRNA should bind to each of the probes on a chip (both PM and MM). If this is feasible, has anyone done in computo chip hybridization experiments to see how close the estimated expression levels are to the actual input?

Thanks,

Jeff Sorenson
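For readers who want to try the perfect-match test described above, here is a minimal Python sketch. It is not the poster's actual procedure: it assumes probe and transcript sequences are given in the same orientation, and the function names and toy sequences are made up for illustration. It indexes every 25-mer of every transcript once, so each probe lookup is a dictionary access instead of a scan over all ~45000 sequences. It also derives a mismatch (MM) sequence from a perfect-match (PM) one by complementing the middle (13th) base.

```python
from collections import defaultdict

PROBE_LEN = 25  # Affymetrix probes are 25-mers


def build_kmer_index(transcripts, k=PROBE_LEN):
    """Map every k-mer to the set of transcript ids that contain it."""
    index = defaultdict(set)
    for tid, seq in transcripts.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(tid)
    return index


def perfect_matches(probes, index):
    """Return {probe_id: sorted transcript ids the probe matches exactly}."""
    return {pid: sorted(index.get(seq.upper(), ()))
            for pid, seq in probes.items()}


def mismatch_probe(pm):
    """Derive the MM sequence by complementing the middle (13th) base."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    mid = len(pm) // 2
    return pm[:mid] + comp[pm[mid]] + pm[mid + 1:]


if __name__ == "__main__":
    # Toy data (hypothetical sequences, not real U133 probes).
    shared = "ACGTACGTTGCAAGCTTAGGCTAAC"
    unique = "TTGACCGATAGCTAGGATCCGATAA"
    transcripts = {
        "geneA": "GGG" + shared + "CCC" + unique,
        "geneB": "AAAA" + shared + "TTTT",
    }
    probes = {"p1": shared, "p2": unique}
    hits = perfect_matches(probes, build_kmer_index(transcripts))
    for pid, genes in hits.items():
        flag = "nonspecific" if len(genes) > 1 else "specific"
        print(pid, genes, flag)
```

Holding every 25-mer in memory is the simple choice for a sketch; for a full transcript set, a suffix array or an alignment tool would be the more usual route.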

probe PROcess • 1.0k views

updated 21.7 years ago by rgentleman ★ 5.5k • written 21.7 years ago by Jeff Sorenson ▴ 70


rgentleman ★ 5.5k

@rgentleman-7725

Last seen 8.9 years ago

United States

Hi,

We are doing some research that is probably not completely dissimilar. I don't know of research regarding the affinity of particular probes. However, G-C pairs bind more tightly than A-T pairs (the former form 3 hydrogen bonds, the latter only 2), so the GC content is undoubtedly important.

Probably more important are the cross-hybridization issues. I do have some ideas on how to deal with them. Stated quite simply, one completely ignores the mappings that Affymetrix has provided. The only ones that are appropriate are the ones that you (and lots of others) have developed by mapping the 25mers to the transcriptome (I'm not at all sure that there is a reliable estimate of the transcriptome, but that's another issue). Note also that in this mapping there is no such thing as PM or MM; there are just 25mers, and they map into particular genes (or not).

This allows you to do some interesting things. Say you have a favorite gene but Affy has not indicated that it is on the chip. All you need is its sequence, and if you can find a handful of 25mers that match, then you can estimate its abundance. You might want to look at "Gene Expression Analysis with Universal n-mer Arrays", by van Dam and Quake in Genome Research.

Going back one paragraph, these mappings are many to one in both directions: some genes contain multiple probe sets, and some probe sets are found in multiple genes. (If this sounds a lot like SAGE data, it should. You can think of SAGE as digital and Affy as analog, and it sort of works.) Things are a bit simpler with SAGE. I don't want to say much about Affy because we are in the midst of figuring it out.

Regards,

Robert

P.S. I encourage you to look at all the resources used to construct our data packages. There are a lot of people doing similar things.

On Mon, Aug 05, 2002 at 11:49:45AM -0500, Jeff Sorenson wrote:
> [original message, identical to the question above]

Robert Gentleman
Associate Professor, Department of Biostatistics
Harvard School of Public Health
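The regrouping idea in this reply can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method the Bioconductor team was developing: the input `probe_hits` is a hypothetical {probe id: matched gene ids} table from your own 25mer mapping, the function names are invented, and the per-gene summary is a deliberately naive median of log2 intensities standing in for a real expression estimator. The GC fraction is included only because the reply notes that GC content affects binding strength.

```python
import math
import statistics
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G and C bases in a probe sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def regroup_by_gene(probe_hits):
    """Invert {probe_id: [gene ids]} into {gene_id: [probe ids]}."""
    by_gene = defaultdict(list)
    for pid, genes in probe_hits.items():
        for gene in genes:
            by_gene[gene].append(pid)
    return dict(by_gene)


def specific_probes(probe_hits):
    """Probe ids whose 25mer maps to exactly one gene."""
    return {pid for pid, genes in probe_hits.items()
            if len(set(genes)) == 1}


def gene_summary(gene, by_gene, specific, intensities):
    """Naive placeholder estimate: median log2 intensity over the
    probes that map only to this gene. Returns None if there are none."""
    values = [math.log2(intensities[pid])
              for pid in by_gene.get(gene, [])
              if pid in specific and pid in intensities]
    return statistics.median(values) if values else None
```

Given a mapping such as `{"p1": ["geneA", "geneB"], "p2": ["geneA"]}`, `regroup_by_gene` yields the custom "probe sets" per gene, and `specific_probes` drops `p1` because it cross-matches two genes. Whether to discard such probes or model them jointly is a design choice the thread leaves open.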

21.7 years ago • rgentleman ★ 5.5k
