'Empty model' error during Monte Carlo cross-validation of SVM classification with the CMA package
@santosh-patnaik-4477
Hi,

I am using the CMA package to perform internal Monte Carlo cross-validation (MCCV) on SVM classifiers in a microarray dataset. The dataset has numerical, non-NA values for ~400 genes in 45 samples, each belonging to one of two classes (labels 0 or 1, at roughly 1:1 proportion). I am running a 1000-iteration MCCV with 15 variables selected for the SVM using the limma statistics for differential gene expression analysis. In each MCCV iteration, a fraction of the 45-sample set is used to train an SVM classifier, which is then tested on the remaining fraction, and I am trying different values for the training-set fraction. CMA also performs inner-loop validation (3-fold cross-validation within the training sets, by default) to fine-tune the classifiers before they are cross-validated against the test sets.

Sometimes, for small training-set sizes, CMA raises an error in the console and halts the rest of the code:

-------------------------
[snip]
tuning iteration 575
tuning iteration 576
tuning iteration 577
Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty!
-------------------------

This happens even when I use a test other than limma's for variable selection, or use, say, two instead of 15 variables for classifier generation. The R code I use should ensure that the training sets always have members of both classes. The code and the traceback() output are below. I would appreciate any insight on this. Thank you.

Santosh Patnaik

----
R code; everything goes OK until g=8 in the 'for(g in 0:10)' loop:

# exp is the expression table, a matrix; 'classes' is the list of known classes
exp <- as.matrix(read.table(file='hy3ExpHsaMir.txt', sep='\t', row.names=1,
                            header=T, check.names=F))

# best to use 0 and 1 as class labels (instead of 'p', 'g', etc.), with 1 for
# 'positive' items (positive for recurrence, for disease, etc.)
classes <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
yesPredVal = 1  # class label for 'positive' items in 'classes'

library(CMA)
library(WilcoxCV)
myNumFun <- function(x, y){round(y(as.numeric(x), na.rm=T), 4)}
set.seed(631)

out = ''
out2 = '\nEffect of varying the training-set size:\nTraining-set size\tSuccessful iterations\tMean acc.\tSD acc.\tMean sens.\tSD sens.\tMean spec.\tSD spec.\tMean PPV\tSD PPV\tMean NPV\tSD NPV\tTotal genes in the classifiers\n'
niter = 1000
diffTest = 'limma'
diffGeneNum = 15
svmCost <- c(0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50)

for(g in 0:10){
  ntest = 3+g*3  # test-set size
  result <- matrix(nrow=0, ncol=7)
  colnames(result) <- c('trainSetSize', 'iteration', 'acc', 'sens', 'spec', 'ppv', 'npv')
  diffGenes <- numeric()

  # generate training and test sets
  lsets <- GenerateLearningsets(n=ncol(exp), y=classes, method=c('MCCV'),
                                niter=niter, ntrain=ncol(exp)-ntest)

  # actual prediction work
  svm <- classification(t(exp), factor(classes), learningsets=lsets,
                        genesellist=list(method=diffTest), classifier=svmCMA,
                        nbgene=diffGeneNum,
                        tuninglist=list(grids=list(cost=svmCost)), probability=TRUE)
  svm <- join(svm)

  # genes in classifiers
  svmGenes <- GeneSelection(t(exp), classes, learningsets=lsets, method=diffTest)

  actualIters=0
  for(h in 1:niter){
    m <- ntest*(h-1)
    # valid SVM classification requires min. 2 classes
    if(1 < length(unique(classes[-lsets@learnmatrix[h,]]))){
      actualIters = actualIters+1
      tp <- tn <- fp <- fn <- 0
      for(i in 1:ntest){
        pred <- svm@yhat[m+i]
        known <- svm@y[m+i]
        if(pred == known){
          if(pred == yesPredVal){tp <- tp+1} else{tn <- tn+1}
        }else{
          if(pred == yesPredVal){fp <- fp+1} else{fn <- fn+1}
        }
      }
      result <- rbind(result, c(ncol(exp)-ntest, h, (tp+tn)/(tp+tn+fp+fn),
                                tp/(tp+fn), tn/(tn+fp), tp/(tp+fp), tn/(tn+fn)))
      diffGenes <- c(diffGenes, toplist(svmGenes, k=diffGeneNum, iter=h, show=F)$index)
    } # end if valid SVM
  } # end for h

  # output accuracy, etc.
  out = paste(out, 'SVM MCCV using ', niter, ' attempted iterations and ', actualIters,
    ' successful iterations, with ', ncol(exp)-ntest, ' of ', ncol(exp),
    ' total samples used for training:\nThe means (ranges; SDs) of prediction accuracy, sensitivity, specificity, PPV and NPV in fractions are ',
    myNumFun(result[, 'acc'], mean), ' (', myNumFun(result[, 'acc'], min), '-',
    myNumFun(result[, 'acc'], max), '; ', myNumFun(result[, 'acc'], sd), '), ',
    myNumFun(result[, 'sens'], mean), ' (', myNumFun(result[, 'sens'], min), '-',
    myNumFun(result[, 'sens'], max), '; ', myNumFun(result[, 'sens'], sd), '), ',
    myNumFun(result[, 'spec'], mean), ' (', myNumFun(result[, 'spec'], min), '-',
    myNumFun(result[, 'spec'], max), '; ', myNumFun(result[, 'spec'], sd), '), ',
    myNumFun(result[, 'ppv'], mean), ' (', myNumFun(result[, 'ppv'], min), '-',
    myNumFun(result[, 'ppv'], max), '; ', myNumFun(result[, 'ppv'], sd), '), and ',
    myNumFun(result[, 'npv'], mean), ' (', myNumFun(result[, 'npv'], min), '-',
    myNumFun(result[, 'npv'], max), '; ', myNumFun(result[, 'npv'], sd),
    '), respectively.\n', sep='')

  # output classifier genes
  diffGenesUnq <- unique(diffGenes)
  out = paste(out, 'A total of ', length(diffGenesUnq), ' genes occur in the ',
    actualIters, ' classifiers, with occurrence frequencies in fractions:\n', sep='')
  for(i in 1:length(diffGenesUnq)){
    out = paste(out, rownames(exp)[diffGenesUnq[i]], '\t',
      round(sum(diffGenes == diffGenesUnq[i])/actualIters, 3), '\n', sep='')
  }

  # output split-size effect
  out2 = paste(out2, ncol(exp)-ntest, '\t', actualIters, '\t',
    myNumFun(result[, 'acc'], mean), '\t', myNumFun(result[, 'acc'], sd), '\t',
    myNumFun(result[, 'sens'], mean), '\t', myNumFun(result[, 'sens'], sd), '\t',
    myNumFun(result[, 'spec'], mean), '\t', myNumFun(result[, 'spec'], sd), '\t',
    myNumFun(result[, 'ppv'], mean), '\t', myNumFun(result[, 'ppv'], sd), '\t',
    myNumFun(result[, 'npv'], mean), '\t', myNumFun(result[, 'npv'], sd), '\t',
    length(diffGenesUnq), '\n', sep='')
} # end for g

cat(out, out2, sep='')

----
traceback():

20: stop("Model is empty!")
19: predict.svm(ret, xhold, decision.values = TRUE)
18: predict(ret, xhold, decision.values = TRUE)
17: na.action(predict(ret, xhold, decision.values = TRUE))
16: svm.default(cost = 0.1, kernel = "linear", type = "C-classification", ...
15: svm(cost = 0.1, kernel = "linear", type = "C-classification", ...
14: do.call("svm", args = ll)
13: function (X, y, f, learnind, probability, models = FALSE, ...) ...
12: function (X, y, f, learnind, probability, models = FALSE, ...) ...
11: do.call(classifier, args = c(list(X = X, y = y, learnind = learnmatrix[i, ...
10: classification(X = c(83.5832768669369, 83.146333099001, 94.253534443549, ...
9: classification(X = c(83.5832768669369, 83.146333099001, 94.253534443549, ...
8: do.call("classification", args = c(list(X = Xi, y = yi, learningsets = lsi, ...
7: tune(grids = list(cost = c(0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50...
6: tune(grids = list(cost = c(0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50...
5: do.call("tune", args = c(tuninglist, ll))
4: classification(X, y = as.numeric(y) - 1, learningsets = learningsets, ...
3: classification(X, y = as.numeric(y) - 1, learningsets = learningsets, ...
2: classification(t(exp), factor(classes), learningsets = lsets, ...
1: classification(t(exp), factor(classes), learningsets = lsets, ...
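To help frame the question, here is a minimal sketch, outside CMA, of what I suspect may be happening. The traceback shows the error coming from the predict() call that svm.default() makes internally to compute fitted values; "Model is empty!" is what predict.svm() raises when the trained model has no support vectors, which is what I would expect if the (inner-loop) training data contain only one class. The toy data below are made up purely for illustration:

----
library(e1071)
set.seed(1)
x <- matrix(rnorm(20), nrow=10)          # 10 samples, 2 features
y <- factor(rep(0, 10), levels=c(0, 1))  # both levels declared, only one present
# expected: Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty!
fit <- svm(x, y, type='C-classification', kernel='linear', cost=0.1)
----

And, even if every outer training set contains both classes, the inner 3-fold tuning split could still, at small training-set sizes, put every minority-class sample into the held-out fold, leaving the two folds used for tuning with a single class. Below is a rough diagnostic sketch along those lines, reusing the learnmatrix slot from the code above; the ceiling(ntrain/3) fold-size bound and the assumption that the inner split is unstratified are mine:

----
library(CMA)
set.seed(631)
classes <- c(rep(1, 22), rep(0, 23))      # the same 45 labels as above
ntest   <- 27                             # g = 8, where the error first appears
lsets   <- GenerateLearningsets(n=length(classes), y=classes, method='MCCV',
                                niter=1000, ntrain=length(classes)-ntest)

# minority-class count in each MCCV training set
minCount <- apply(lsets@learnmatrix, 1,
                  function(idx) min(table(factor(classes[idx], levels=c(0, 1)))))
ntrain <- ncol(lsets@learnmatrix)

table(minCount)                           # distribution of minority-class counts
sum(minCount == 0)                        # training sets with a single class
sum(minCount <= ceiling(ntrain/3))        # at risk of a single-class inner fold
----

If most iterations at g=8 are flagged by this check, that would explain why the error appears only at small training-set sizes.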
Microarray Classification limma CMA