'Empty model' error during Monte Carlo cross-validation of SVM classification with the CMA package
@santosh-patnaik-4477
Hi,

I am using the CMA package to perform internal Monte Carlo cross-validation (MCCV) on SVM classifiers in a microarray dataset. The dataset has numerical, non-NA values for ~400 genes in 45 samples, each belonging to one of two classes (labels 0 or 1, at roughly 1:1 proportion). I am running a 1000-iteration MCCV with 15 variables selected for the SVM using the limma statistics for differential gene expression analysis. In each MCCV iteration, a fraction of the 45-sample set is used to train an SVM classifier, which is then tested on the remaining fraction, and I am trying different values for the training-set fraction. CMA also performs inner-loop validation (3-fold cross-validation within the training sets, by default) to fine-tune the classifiers before they are cross-validated against the test sets.

Sometimes, for small training-set sizes, CMA raises an error in the console and halts the rest of the code:

-------------------------
[snip]
tuning iteration 575
tuning iteration 576
tuning iteration 577
Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty!
-------------------------

This happens even when I use a test other than limma's for variable selection, or use, say, two instead of 15 variables for classifier generation. The R code I use should ensure that the training sets always have members of both classes. The code and the traceback() output are below. I would appreciate any insight on this. Thank you.

Santosh Patnaik

----
R code; everything goes OK until g=8 in the 'for(g in 0:10)' loop:

# exp is the expression table, a matrix; 'classes' is the list of known classes
exp <- as.matrix(read.table(file='hy3ExpHsaMir.txt', sep='\t', row.names=1,
                            header=T, check.names=F))

# best to use 0 and 1 as class labels (instead of 'p', 'g', etc.), with 1 for
# 'positive' items (positive for recurrence, for disease, etc.)
classes <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
yesPredVal = 1  # class label for 'positive' items in 'classes'

library(CMA)
library(WilcoxCV)
myNumFun <- function(x, y){round(y(as.numeric(x), na.rm=T), 4)}
set.seed(631)

out = ''
out2 = '\nEffect of varying the training-set size:\nTraining-set size\tSuccessful iterations\tMean acc.\tSD acc.\tMean sens.\tSD sens.\tMean spec.\tSD spec.\tMean PPV\tSD PPV\tMean NPV\tSD NPV\tTotal genes in the classifiers\n'
niter = 1000
diffTest = 'limma'
diffGeneNum = 15
svmCost <- c(0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50)

for(g in 0:10){
  ntest = 3+g*3  # test-set size
  result <- matrix(nrow=0, ncol=7)
  colnames(result) <- c('trainSetSize', 'iteration', 'acc', 'sens', 'spec', 'ppv', 'npv')
  diffGenes <- numeric()

  # generate training and test sets
  lsets <- GenerateLearningsets(n=ncol(exp), y=classes, method=c('MCCV'),
                                niter=niter, ntrain=ncol(exp)-ntest)

  # actual prediction work
  svm <- classification(t(exp), factor(classes), learningsets=lsets,
                        genesellist=list(method=diffTest), classifier=svmCMA,
                        nbgene=diffGeneNum,
                        tuninglist=list(grids=list(cost=svmCost)), probability=TRUE)
  svm <- join(svm)

  # genes in classifiers
  svmGenes <- GeneSelection(t(exp), classes, learningsets=lsets, method=diffTest)

  actualIters=0
  for(h in 1:niter){
    m <- ntest*(h-1)
    # valid SVM classification requires min. 2 classes
    if(1 < length(unique(classes[-lsets@learnmatrix[h,]]))){
      actualIters = actualIters+1
      tp <- tn <- fp <- fn <- 0
      for(i in 1:ntest){
        pred <- svm@yhat[m+i]
        known <- svm@y[m+i]
        if(pred == known){
          if(pred == yesPredVal){tp <- tp+1} else{tn <- tn+1}
        }else{
          if(pred == yesPredVal){fp <- fp+1} else{fn <- fn+1}
        }
      }
      result <- rbind(result, c(ncol(exp)-ntest, h, (tp+tn)/(tp+tn+fp+fn),
                                tp/(tp+fn), tn/(tn+fp), tp/(tp+fp), tn/(tn+fn)))
      diffGenes <- c(diffGenes, toplist(svmGenes, k=diffGeneNum, iter=h, show=F)$index)
    } # end if valid SVM
  } # end for h

  # output accuracy, etc.
  out = paste(out, 'SVM MCCV using ', niter, ' attempted iterations and ', actualIters,
    ' successful iterations, with ', ncol(exp)-ntest, ' of ', ncol(exp),
    ' total samples used for training:\nThe means (ranges; SDs) of prediction accuracy, sensitivity, specificity, PPV and NPV in fractions are ',
    myNumFun(result[, 'acc'], mean), ' (', myNumFun(result[, 'acc'], min), '-',
    myNumFun(result[, 'acc'], max), '; ', myNumFun(result[, 'acc'], sd), '), ',
    myNumFun(result[, 'sens'], mean), ' (', myNumFun(result[, 'sens'], min), '-',
    myNumFun(result[, 'sens'], max), '; ', myNumFun(result[, 'sens'], sd), '), ',
    myNumFun(result[, 'spec'], mean), ' (', myNumFun(result[, 'spec'], min), '-',
    myNumFun(result[, 'spec'], max), '; ', myNumFun(result[, 'spec'], sd), '), ',
    myNumFun(result[, 'ppv'], mean), ' (', myNumFun(result[, 'ppv'], min), '-',
    myNumFun(result[, 'ppv'], max), '; ', myNumFun(result[, 'ppv'], sd), '), and ',
    myNumFun(result[, 'npv'], mean), ' (', myNumFun(result[, 'npv'], min), '-',
    myNumFun(result[, 'npv'], max), '; ', myNumFun(result[, 'npv'], sd),
    '), respectively.\n', sep='')

  # output classifier genes
  diffGenesUnq <- unique(diffGenes)
  out = paste(out, 'A total of ', length(diffGenesUnq), ' genes occur in the ',
    actualIters, ' classifiers, with occurrence frequencies in fractions:\n', sep='')
  for(i in 1:length(diffGenesUnq)){
    out = paste(out, rownames(exp)[diffGenesUnq[i]], '\t',
      round(sum(diffGenes == diffGenesUnq[i])/actualIters, 3), '\n', sep='')
  }

  # output split-size effect
  out2 = paste(out2, ncol(exp)-ntest, '\t', actualIters, '\t',
    myNumFun(result[, 'acc'], mean), '\t', myNumFun(result[, 'acc'], sd), '\t',
    myNumFun(result[, 'sens'], mean), '\t', myNumFun(result[, 'sens'], sd), '\t',
    myNumFun(result[, 'spec'], mean), '\t', myNumFun(result[, 'spec'], sd), '\t',
    myNumFun(result[, 'ppv'], mean), '\t', myNumFun(result[, 'ppv'], sd), '\t',
    myNumFun(result[, 'npv'], mean), '\t', myNumFun(result[, 'npv'], sd), '\t',
    length(diffGenesUnq), '\n', sep='')
} # end for g

cat(out, out2, sep='')

----
traceback():

20: stop("Model is empty!")
19: predict.svm(ret, xhold, decision.values = TRUE)
18: predict(ret, xhold, decision.values = TRUE)
17: na.action(predict(ret, xhold, decision.values = TRUE))
16: svm.default(cost = 0.1, kernel = "linear", type = "C-classification", ...
15: svm(cost = 0.1, kernel = "linear", type = "C-classification", ...
14: do.call("svm", args = ll)
13: function (X, y, f, learnind, probability, models = FALSE, ...) ...
12: function (X, y, f, learnind, probability, models = FALSE, ...) ...
11: do.call(classifier, args = c(list(X = X, y = y, learnind = learnmatrix[i, ...
10: classification(X = c(83.5832768669369, 83.146333099001, 94.253534443549, ...
9: classification(X = c(83.5832768669369, 83.146333099001, 94.253534443549, ...
8: do.call("classification", args = c(list(X = Xi, y = yi, learningsets = lsi, ...
7: tune(grids = list(cost = c(0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50...
6: tune(grids = list(cost = c(0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50...
5: do.call("tune", args = c(tuninglist, ll))
4: classification(X, y = as.numeric(y) - 1, learningsets = learningsets, ...
3: classification(X, y = as.numeric(y) - 1, learningsets = learningsets, ...
2: classification(t(exp), factor(classes), learningsets = lsets, ...
1: classification(t(exp), factor(classes), learningsets = lsets, ...
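To help frame the question, here is a minimal sketch, outside CMA, of what I suspect may be happening. The traceback shows the error coming from the predict() call that svm.default() makes internally to compute fitted values; "Model is empty!" is what predict.svm() raises when the trained model has no support vectors, which is what I would expect if the (inner-loop) training data contain only one class. The toy data below are made up purely for illustration:

----
library(e1071)
set.seed(1)
x <- matrix(rnorm(20), nrow=10)          # 10 samples, 2 features
y <- factor(rep(0, 10), levels=c(0, 1))  # both levels declared, only one present
# expected: Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty!
fit <- svm(x, y, type='C-classification', kernel='linear', cost=0.1)
----

And, even if every outer training set contains both classes, the inner 3-fold tuning split could still, at small training-set sizes, put every minority-class sample into the held-out fold, leaving the two folds used for tuning with a single class. Below is a rough diagnostic sketch along those lines, reusing the learnmatrix slot from the code above; the ceiling(ntrain/3) fold-size bound and the assumption that the inner split is unstratified are mine:

----
library(CMA)
set.seed(631)
classes <- c(rep(1, 22), rep(0, 23))      # the same 45 labels as above
ntest   <- 27                             # g = 8, where the error first appears
lsets   <- GenerateLearningsets(n=length(classes), y=classes, method='MCCV',
                                niter=1000, ntrain=length(classes)-ntest)

# minority-class count in each MCCV training set
minCount <- apply(lsets@learnmatrix, 1,
                  function(idx) min(table(factor(classes[idx], levels=c(0, 1)))))
ntrain <- ncol(lsets@learnmatrix)

table(minCount)                           # distribution of minority-class counts
sum(minCount == 0)                        # training sets with a single class
sum(minCount <= ceiling(ntrain/3))        # at risk of a single-class inner fold
----

If most iterations at g=8 are flagged by this check, that would explain why the error appears only at small training-set sizes.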
Microarray Classification limma CMA