How to deal with multiplicate Affy probes?

Johnnidis, Jonathan ▴ 50

Dear List,

I remain unsure of how to deal with multiplicate Affymetrix ProbeSets (in my analysis I need to assign a single fold-change status to each _gene_, not merely each ProbeSet).

Some have suggested that, given two or more ProbeSets for a given gene (e.g. 97569_r_at and 97658_r_at on the MgU74Av2 chip for Insulin), if at least one ProbeSet shows a significant fold change in an experiment (and, say, the other(s) show no fold change), it is fair to regard that locus as differentially regulated and ignore the other ProbeSet(s). While this is an attractive solution (one maximizes the number of potentially differentially regulated loci), I remain unconvinced that it is scientifically acceptable: doesn't that bias the results of the experiment based on the experimental variable you are trying to test? Wouldn't that be quite dangerous?

If there are any thoughts to the contrary, I'd be very interested to hear them.

Perhaps there are other solutions to dealing with multiplicate ProbeSets? For example, one might use a less biased criterion to select the best ProbeSet within a multiplicate group of ProbeSets. Those criteria could include the ProbeSet that has the highest signal (i.e. expression) value, or the best 3'-5' ratio, or the highest genomic alignment fidelity?

So basically the question is how best to 'summarize' a group of ProbeSets (as opposed to previous and ongoing debates on how to summarize individual probes within a ProbeSet).

With thanks for any further discussion,
Jonathan

-----Original Message-----
From: Michael Seewald [mailto:mseewald@gmx.de]
Sent: Thursday, March 25, 2004 3:48 PM
To: Johnnidis, Jonathan
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] how deal with multiplicate affy probes?

As a rule of thumb: if statistics based on the data for a given probe set tell you that a transcript is significantly deregulated, you can usually trust it and discard every other probe set for that transcript!

The thing to look at is the probe design itself: download the probe set from NetAffx and blast the single probes against the genome (e.g. in Ensembl). You will be surprised how many probes match up with introns or genomic regions that do not correspond to any cDNA!

Two examples: there are 4 probe sets for human Wnt6 (HG-U133AB); 2 match the sense (!) strand and have to be discarded. Out of >12 probe sets for human CD44, only 4 have probes that completely match the transcripts. >8 can be discarded.

Best,
Michael

PS: www.ensembl.org is always a good place to check probe sets. Their mapping of probe sets does not show the location of single probes, though...

PPS: On affymetrix.com you can check out the "Details" view for a probe set. There you can discover that 2 probe sets of Wnt6 map to the (-) strand, which is bad. It doesn't tell you, however, that many probe sets match intron regions.

On Sat, 20 Mar 2004, Johnnidis, Jonathan wrote:
> I'm a new list member and am not quite sure if this question is appropriate
> for the list, but will shoot anyway. I'm analyzing a bunch of data from Affy
> MgU74Av2 chips and am a bit perplexed as to how to treat conflicting
> expression data from multiplicate probe sets (that is, a gene that has >1
> probe set designed against it; for example, 97569_r_at and 97658_r_at are
> both probes for the Insulin gene).
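One of the less biased criteria raised in the question, keeping the probe set with the highest average signal for each gene, could be sketched roughly as below. This is only an illustration in plain Python, not a Bioconductor function: the function name, the `GeneX` entry, the `hypothetical_at` ID, and all expression values are invented, and the probe-set-to-gene mapping is assumed to come from whatever annotation source you trust. The average is taken over all arrays without looking at group labels, so the choice is not driven by the contrast being tested.

```python
from statistics import mean


def pick_probeset_per_gene(expr, probe_to_gene):
    """Keep, for each gene, the probe set with the highest mean signal.

    expr          -- dict: probe set ID -> expression values over ALL arrays
    probe_to_gene -- dict: probe set ID -> gene label (assumed annotation)
    """
    best = {}  # gene -> (probe set ID, mean signal)
    for probe_id, values in expr.items():
        gene = probe_to_gene.get(probe_id)
        if gene is None:
            continue  # skip probe sets without an annotation
        avg = mean(values)
        if gene not in best or avg > best[gene][1]:
            best[gene] = (probe_id, avg)
    return {gene: probe_id for gene, (probe_id, _) in best.items()}


# Toy values, invented purely for illustration.
expr = {
    "97569_r_at": [7.9, 8.1, 8.0, 8.4],
    "97658_r_at": [5.1, 5.0, 5.3, 5.2],
    "hypothetical_at": [6.0, 6.2, 6.1, 5.9],
}
probe_to_gene = {
    "97569_r_at": "Insulin",
    "97658_r_at": "Insulin",
    "hypothetical_at": "GeneX",
}

selected = pick_probeset_per_gene(expr, probe_to_gene)
print(selected)
```

The differential-expression test would then be run only on the selected probe sets. Whether highest signal is the right criterion is still open; the same skeleton works if `mean(values)` is swapped for another group-blind score such as a 3'-5' ratio or an alignment quality measure.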

Alignment mgu74av2 probe affy ASSIGN • 3.5k views

updated 20.0 years ago by rgentleman ★ 5.5k • written 20.0 years ago by Johnnidis, Jonathan ▴ 50

rgentleman ★ 5.5k

On Wed, Apr 28, 2004 at 12:22:14PM -0400, Johnnidis, Jonathan wrote:
> Dear List,
> I remain unsure of how to deal with multiplicate Affymetrix ProbeSets (in my analysis I need to assign a single fold-change status to each _gene_, not merely each ProbeSet).
> Some have suggested that, given two or more ProbeSets for a given gene (e.g. 97569_r_at and 97658_r_at on the MgU74Av2 chip for Insulin), if at least one ProbeSet shows a significant fold change in an experiment (and, say, the other(s) show no fold change), it is fair to regard that locus as differentially regulated and ignore the other ProbeSet(s). While this is an attractive solution (one maximizes the number of potentially differentially regulated loci), I remain unconvinced that it is scientifically acceptable: doesn't that bias the results of the experiment based on the experimental variable you are trying to test? Wouldn't that be quite dangerous?
> If there are any thoughts to the contrary, I'd be very interested to hear them.
> Perhaps there are other solutions to dealing with multiplicate ProbeSets? For example, one might use a less biased criterion to select the best ProbeSet within a multiplicate group of ProbeSets. Those criteria could include the ProbeSet that has the highest signal (i.e. expression) value, or the best 3'-5' ratio, or the highest genomic alignment fidelity?
>
> So basically the question is how best to 'summarize' a group of ProbeSets (as opposed to previous and ongoing debates on how to summarize individual probes within a ProbeSet).

Well, one thing that you might want to do (when things are very different between probe sets) is to have a look at sensitivity and specificity. One place to start would be to go back to the Unigene numbers (so that you can see what sequences they are based on). [BTW, whose mappings from Affy probe set to LocusLink (or other identifier) are you using, and how recently have you updated them?]

Then you can, for each 25-mer, compare its sequence to the best known version of the gene sequence (this might be slightly problematic, but in principle it is possible) and see if the 25-mers are actually there. Next, you can see if the 25-mer is found in other genes (in which case you may be seeing some problems due to that other gene binding).

We are working on automating this, but it isn't that simple to have something that works for all organisms and chips....

Robert

--
Robert Gentleman
Associate Professor, Department of Biostatistics, Harvard School of Public Health
phone: (617) 632-5250 • fax: (617) 632-2444 • office: M1B20
email: rgentlem@jimmy.harvard.edu
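The two checks described in the reply above (is each 25-mer really present in the intended gene sequence, and does it also occur in other genes?) might be prototyped along these lines. This is a minimal sketch under strong assumptions: matching is exact substring matching rather than a BLAST-style alignment, probes are assumed to be written in the same orientation as the transcript sequences, and the function name, gene labels, and sequences are all made up for the example.

```python
def check_probes(probes, target_id, transcripts):
    """Report, for each probe, whether it occurs in the target transcript
    and which other transcripts also contain it (exact matches only)."""
    report = []
    for probe in probes:
        in_target = probe in transcripts[target_id]
        off_targets = sorted(
            tid
            for tid, seq in transcripts.items()
            if tid != target_id and probe in seq
        )
        report.append(
            {"probe": probe, "in_target": in_target, "off_targets": off_targets}
        )
    return report


# Toy sequences, invented for illustration; real ones would come from
# a curated transcript database.
gene_a = "ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGG"
transcripts = {
    "geneA": gene_a,
    # geneB deliberately shares a 25-base stretch with geneA
    "geneB": "TTTT" + gene_a[10:35] + "AAAACCCC",
}

probes = [
    gene_a[0:25],    # present only in geneA
    gene_a[10:35],   # present in geneA and in geneB (possible cross-hybridization)
    "ACGT" * 6 + "A",  # 25-mer found in neither sequence
]

report = check_probes(probes, "geneA", transcripts)
for row in report:
    print(row["probe"], row["in_target"], row["off_targets"])
```

A probe set whose probes are mostly missing from the target, or mostly shared with other genes, would be a candidate for exclusion before any per-gene summary is made. Exact matching will miss near-matches that can still hybridize, so a real check would need an alignment tool.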

written 20.0 years ago by rgentleman ★ 5.5k
