redundant probe sets in Affymetrix HG-U219
@andreas-heider-4538
Dear Bioconductor mailing list,

is there a sensible way to deal with redundant probesets on Affymetrix chips like the HG-U219? For example:

Probe Set ID     RefSeq Transcript ID
11715100_at      NM_003534
11715101_s_at    NM_003534
11715102_x_at    NM_003534

Should I take the median/mean of the expression intensities, or select the highest? And what would be the procedure in R to do it? I mean, how do I tell R to return the median of the expression values if there is more than one probeset for a single RefSeq ID?

I hope you can help me,
Andreas
@james-w-macdonald-5106
Hi Andreas,

On 4/14/2011 5:27 AM, Andreas Heider wrote:
> Dear Bioconductor mailing list,
> is there a sensible way to deal with redundant probesets on Affymetrix chips
> like the HG-U219?

Define sensible. There are some things you can do, but each comes with its own assumptions.

There is the findLargest() function in genefilter that will select the probeset with the largest value of a test statistic. This assumes (among other things) that all of the redundant probesets measure the same thing. But note that the _x_ and _s_ in the probeset IDs in your example indicate that when Affy designed that chip the probesets cross-hybridized with unrelated or related transcripts, respectively.

You can use the MBNI re-mapped CDFs, which take current versions of the genome, filter out probes that don't uniquely hybridize to the genome, and then map probes to probesets based on, e.g., Entrez Gene IDs. This eliminates the problem of multiple probesets, but you then have to contend with probesets that vary from ~3 probes up to 100 or more. As you can imagine, the probesets with 3 probes will have much larger standard errors than those with, say, 100 probes. This makes downstream analyses more difficult unless you choose to simply ignore that fact.

You could ignore the fact that you have multiple probesets that may or may not be measuring the same thing, and assume independence (which, of course, isn't even true when you have no redundant probesets).

No really satisfying alternatives, IMO, so you have to pick your poison.

Best,
Jim

James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan Department of Human Genetics
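For the "how do I do it in R" part of the question, here is a minimal base-R sketch of two of the options discussed: taking the per-sample median across probesets that share a RefSeq ID, and keeping only the probeset with the largest statistic per RefSeq ID. Everything in it is an assumption made for illustration: the expression values are made up, `hypothetical_at` and `NM_EXAMPLE` are placeholder IDs, and the row mean is only a stand-in for whatever test statistic you would actually use. The second part mimics the idea behind findLargest() in plain R; it is not the genefilter implementation.

```r
# Toy expression matrix: rows are probesets, columns are samples.
# All values are invented for illustration.
expr <- matrix(
  c(7.1, 7.3, 6.9,
    8.0, 8.2, 7.8,
    6.5, 6.4, 6.6,
    5.0, 5.2, 5.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("11715100_at", "11715101_s_at", "11715102_x_at", "hypothetical_at"),
    c("s1", "s2", "s3")
  )
)

# Probeset-to-RefSeq mapping. In practice this would come from the chip
# annotation; the last entry is a placeholder for a non-redundant probeset.
refseq <- c("NM_003534", "NM_003534", "NM_003534", "NM_EXAMPLE")
names(refseq) <- rownames(expr)

# Row indices of the probesets belonging to each RefSeq ID.
groups <- split(seq_len(nrow(expr)), refseq[rownames(expr)])

# Option 1: per-sample median across the probesets of each RefSeq ID.
expr_median <- do.call(rbind, lapply(groups, function(i) {
  apply(expr[i, , drop = FALSE], 2, median)
}))

# Option 2: keep the single probeset with the largest absolute statistic.
# The row mean is a stand-in here; substitute your own test statistic.
stat <- rowMeans(expr)
keep <- vapply(groups, function(i) {
  rownames(expr)[i][which.max(abs(stat[i]))]
}, character(1))
expr_largest <- expr[keep, , drop = FALSE]

print(expr_median)
print(keep)
```

Both results have one row per RefSeq ID. Keep in mind the caveat above: either way of collapsing assumes the redundant probesets measure the same thing, which the _s_ and _x_ suffixes call into question.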