Question

CMA package: variable selection with limma

0

Entering edit mode

Renaud Gaujoux ▴ 170

@renaud-gaujoux-3125

Last seen 9.6 years ago

Hi, I used the CMA package (v1.0.0) to evaluate the performance of a kNN classification after a feature selection step using limma, including the whole procedure in the cross-validation. Before finding the CMA package I had implemented myself the same procedure (cross-validation of feature selection) and I couldn't get the same result as with CMA. Looking to the CMA code I found that the feature were selected by their rank based on the statistic computed by the function classifyTestsF from limma (see the CMA code below). My understanding is that this function computes a F-statistic to test the joint significance of all the coefficients in the regression model (in this case 2: the intercept and the contrast of interest, i.e. the difference between the two groups of samples). In my procedure I had ranked the genes based on the p-values of the single coefficient of interest (not the intercept), as suggested by the limma tutorials. Indeed the selected genes in the first step are different. Is there a reason for also looking at the joint significance (intercept + coeff)? I guess the fact intercept is significantly different from zero gives us the information that the gene is actually expressed at a certain level, making us more confident that what is observed is not noise? In standard regression problem (more samples than variables) I would indeed test first the significance of the model as a whole (F-test), but in the context of microarray I had never seen this practice before. Am I missing something? Which practice is recommended in this context? Thanks ------------------ limmatest <- function(X, y, learnind, ...){ require(limma, quietly=TRUE) yprime <- y[learnind] Xprime <- X[learnind,,drop=FALSE] yprime <- as.factor(yprime) des <- model.matrix(~yprime) limo <- lmFit(t(Xprime), des) outp <- eBayes(limo, ...) stat <- classifyTestsF(object = outp, fstat.only = TRUE) attr(stat, "df1") <- NULL attr(stat, "df2") <- NULL new("varseloutput", varsel=stat) }

Microarray Classification Regression limma CMA Microarray Classification Regression limma • 1.5k views

ADD COMMENT • link 15.2 years ago Renaud Gaujoux ▴ 170