jsv@stat.ohio-state.edu
Is there any general procedure for handling duplicate genes in Affy
arrays?
For example, for the hu6800 array which has 7129 probe sets,
there are 869 genes that are represented by more than one probe set,
with one gene (ACTB) being represented by 9 probe sets.
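library(annaffy)  # provides aafSymbol()
# X.gnames holds the probe set IDs of the array
g.symbols <- aafSymbol(X.gnames, "hu6800")
ug.symbols <- unlist(g.symbols)    # drops probe sets with no symbol
length(ug.symbols)                 # 6980 (7129 - 6980 = 149 with no symbol)
symbol.usage <- table(ug.symbols)  # number of probe sets per symbol
sum(symbol.usage > 1)              # 869 symbols used by more than one probe set
max(symbol.usage)                  # 9 (ACTB)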
Ignoring this duplication would seem to invalidate a number of multiple
comparison procedures. Is it reasonable to average probe set expression
levels for the same gene? Are there any "pre-processing" routines that
address this issue?
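To make the averaging idea concrete, here is a minimal base R sketch of what I have in mind. The matrix expr, the vector symbols, and the result gene.means are made-up names and values for illustration only, not hu6800 data, and whether a plain mean is the right summary is exactly what I am unsure about.

```r
# Toy expression matrix: rows are probe sets, columns are samples (made-up values).
expr <- matrix(c( 1,  2,  3,  4,
                  3,  4,  5,  6,
                 10, 20, 30, 40),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ps1", "ps2", "ps3"), paste0("s", 1:4)))

# Hypothetical gene symbol for each probe set; NA would mark a missing symbol.
symbols <- c("ACTB", "ACTB", "GAPDH")

# Drop probe sets without a symbol before grouping.
keep <- !is.na(symbols)

# rowsum() adds up the rows that share a symbol; dividing each row by the
# number of probe sets for that symbol turns the sums into means.
sums <- rowsum(expr[keep, , drop = FALSE], group = symbols[keep])
counts <- as.vector(table(symbols[keep])[rownames(sums)])
gene.means <- sums / counts
```

Here gene.means has one row per symbol, so the two ACTB probe sets collapse into a single row holding their per-sample mean.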
The flip side of this question is: do probe sets with the same gene symbol
really specify the same gene? Does it matter which annotation method is
used to name genes?