Jim has already pointed out that you have some incorrect perceptions about what limma does by default.
If you need to keep one probe for each gene symbol after a limma lmFit, and you want to choose the probe with the highest average expression, it is easy to do as follows. I will assume that your linear model fit object is called 'fit', and that its annotation (fit$genes) includes a column called 'Symbol' containing the gene symbol.
  o <- order(fit$Amean, decreasing=TRUE)
  dup <- duplicated(fit$genes$Symbol[o])
  fit.unique <- fit[o,][!dup,]
Now your fit object fit.unique has only one row for each symbol.
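To see why sorting first and then dropping duplicates keeps the most highly expressed probe, here is a small base R sketch. It does not use limma: the data frame 'genes' and the vector 'Amean' are made-up stand-ins for fit$genes and fit$Amean, and the probe IDs, symbols and values are purely illustrative.

```r
# Toy stand-ins for fit$genes and fit$Amean; all values are made up.
genes <- data.frame(
  ProbeID = c("p1", "p2", "p3", "p4", "p5"),
  Symbol  = c("TP53", "BRCA1", "TP53", "MYC", "BRCA1"),
  stringsAsFactors = FALSE
)
Amean <- c(7.2, 5.1, 9.4, 6.0, 8.3)

# Sort rows from highest to lowest average expression.
o <- order(Amean, decreasing = TRUE)

# duplicated() flags the second and later occurrences of each symbol,
# so after sorting, the unflagged occurrence is the top probe for that symbol.
dup <- duplicated(genes$Symbol[o])

# Keep one row per symbol, in the same way as fit[o,][!dup,].
genes.unique <- genes[o, ][!dup, ]
Amean.unique <- Amean[o][!dup]

print(cbind(genes.unique, Amean = Amean.unique))
```

In this toy example the rows kept are p3 (TP53), p5 (BRCA1) and p4 (MYC). One thing to watch for: duplicated() treats missing values as duplicates of each other, so if some probes have NA as their symbol, only one of them will survive unless you handle those rows separately.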
This sort of filtering has been used in many papers, typically to match symbols across platforms or to do gene set testing.
------------------ original message ----------------
[BioC] Use probesets with highest baseline expression for differntial
gene expression in LIMMA
Ekta Jain Ekta_Jain at jubilantbiosys.com
Thu Feb 23 04:06:09 CET 2012
I am using an affymetrix chip data. I need to analyse my dataset for
differential gene expression (LIMMA). Each gene can be referenced by
multiple probesets and while performing LIMMA the expression values of
these multiple probesets gets averaged and this averaged value is assigned
to that gene. I need to be able to simply select the probeset with the
highest expression value to represent a gene.
LIMMA by default averages the probeset values.
I am not sure if i need to modify any default settings in LIMMA or use