xvalSpec('NOTEST') problem
@gustavo-fernandez-bayon-5300
Last seen 8.3 years ago
Spain
Hi. I am trying to learn MLInterfaces so that I can apply some machine learning techniques to expression data. So far it has been a fairly calm trip, but I have just arrived at a point where I am totally clueless.

I have a small ExpressionSet, which I generated as a subset of a bigger one:

> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 10 features, 55 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM671268 GSM671269 ... GSM671347 (55 total)
  varLabels: title geo_accession ... data_row_count (36 total)
  varMetadata: labelDescription
featureData
  featureNames: 1007_s_at 1053_at ... 160020_at (10 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL96

As you can see, it is indeed small. It means nothing biologically; I just wanted something small enough to play with the interfaces in a reasonable amount of time. I want to fit a linear SVM on it, but when I do the following...

> X <- MLearn(source_name_ch1 ~ ., eset, svmI, xvalSpec('NOTEST'))

...I get this error message (my R session is in Spanish; in English it reads "Error in matrix(...) : negative extents to matrix", followed by "Error during wrapup"):

Error en matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames = list(rowns, :
  extensión negativa para la matriz
Error durante el wrapup:

If I execute the following, without using xvalSpec()...

> X <- MLearn(source_name_ch1 ~ ., eset, svmI, 1:55)

...I get the same failure. However, if I restrict the training indices and run

> X <- MLearn(source_name_ch1 ~ ., eset, svmI, 1:54)

...then it works as smoothly as expected.

In order to implement a version of SVM-RFE, I need to train the SVM on all of the available training data, which is something I have done before (with SVMLight on a UNIX machine, or with LibSVM from C++ code, long before I discovered the wonderful world of R and BioC).

The output of traceback() after the error looks like this:

> traceback()
12: matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames = list(rowns,
        colns))
11: napredict.default(act, matrix(ret$dec, nrow = nrow(newdata),
        byrow = TRUE, dimnames = list(rowns, colns)))
10: napredict(act, matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE,
        dimnames = list(rowns, colns)))
9: predict.svm(obj, teData, decision.values = TRUE, probability = TRUE)
8: predict(obj, teData, decision.values = TRUE, probability = TRUE)
7: .method@converter(ans, data, trainInd)
6: MLearn(formula, data, .method, 1:N, ...)
5: MLearn(formula, data, .method, 1:N, ...)
4: MLearn(formula, data, .method, trainInd, ...)
3: MLearn(formula, data, .method, trainInd, ...)
2: MLearn(source_name_ch1 ~ ., eset, svmI, xvalSpec("NOTEST"), kernel = "linear")
1: MLearn(source_name_ch1 ~ ., eset, svmI, xvalSpec("NOTEST"), kernel = "linear")

...and this is my sessionInfo:

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C/en_US.UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] MLInterfaces_1.36.0  sfsmisc_1.0-20       cluster_1.14.2       annotate_1.34.0
 [5] AnnotationDbi_1.18.0 rda_1.0.2            rpart_3.1-52         MASS_7.3-18
 [9] multtest_2.12.0      genefilter_1.38.0    GEOquery_2.23.2      Biobase_2.16.0
[13] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
 [1] DBI_0.2-5        IRanges_1.14.3   Matrix_1.0-6     RCurl_1.91-1     RSQLite_0.11.1
 [6] XML_3.9-4        class_7.3-3      e1071_1.6        gdata_2.8.2      grid_2.15.0
[11] gtools_2.6.2     lattice_0.20-6   mboost_2.1-2     nnet_7.3-1       splines_2.15.0
[16] stats4_2.15.0    survival_2.36-14 tools_2.15.0     xtable_1.7-0

Does anybody know why the call to MLearn fails? By the way, if I call the svm() function in the e1071 package directly, it works:

> svm(t(exprs(eset)), eset$source_name_ch1, kernel='linear', subset=1:55)

Call:
svm.default(x = t(exprs(eset)), y = eset$source_name_ch1, kernel = "linear", subset = 1:55)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  linear
       cost:  1
      gamma:  0.1

Number of Support Vectors:  10

...so I guess it might be an MLInterfaces issue. What do you think?

Regards,
Gustavo
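P.S. For anyone who wants to try this without my GEO data, a minimal sketch along the following lines (simulated expression values standing in for the subsetted eset above; object names are just placeholders) should show the same contrast between a partial and a full trainInd:

library(Biobase)
library(MLInterfaces)

set.seed(1)
m <- matrix(rnorm(10 * 55), nrow = 10,
            dimnames = list(paste0("f", 1:10), paste0("s", 1:55)))
pd <- data.frame(source_name_ch1 = factor(rep(c("A", "B"), length.out = 55)),
                 row.names = colnames(m))
toy <- ExpressionSet(assayData = m, phenoData = AnnotatedDataFrame(pd))

## leaving one sample out as a test set works
ok <- MLearn(source_name_ch1 ~ ., toy, svmI, 1:54, kernel = "linear")

## using all 55 samples as training indices should hit the same
## "negative extents to matrix" error, because no test samples remain
## bad <- MLearn(source_name_ch1 ~ ., toy, svmI, 1:55, kernel = "linear")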
MLInterfaces • 1.2k views
@vincent-j-carey-jr-4
Last seen 6 weeks ago
United States
Thank you for this note. I have recently encountered problems with MLInterfaces that may be of a similar nature. I will get back to you ASAP.
@vincent-j-carey-jr-4
Last seen 6 weeks ago
United States
On Thu, May 24, 2012 at 4:32 AM, Gustavo Fernández Bayón <gbayon@gmail.com> wrote:

> I just wanted to learn a linear SVM on it, but when I do the following...
>
> > X <- MLearn(source_name_ch1 ~ ., eset, svmI, xvalSpec('NOTEST'))
>
> ...I get this error message [negative extents to matrix / error during wrapup]

xvalSpec("NOTEST") is intended for use with randomForestI and rdacvI -- both of the associated learners have embedded resampling, so the MLearn user can proceed without specifying a train/test decomposition approach (knn.cvI is a little different but can in principle be used with xvalSpec("NOTEST")). The release documentation now clarifies the situation, and additional warning infrastructure has been added to the release branch. This will be propagated to the devel branch as soon as possible.

> If I execute the following, without using xvalSpec()...
>
> > X <- MLearn(source_name_ch1 ~ ., eset, svmI, 1:55)
>
> ...I get the same result (fail).

This is to be expected. The learner schemas, as programmed, expect that trainInd does not index the entire dataset; documentation and a warning will be added to indicate this. The error is not universal -- some learners have predict methods that return gracefully when handed a newdata with zero rows -- so I have not yet decided how to deal with this in the infrastructure.

The main intention of MLInterfaces was to simplify the use of machine learning with genome-scale data, reducing the data reformatting needed as one moves among different learning packages. Support for convenient cross-validation is also an important feature, but it adds complexity. For the specific use case underlying this email, a modification of the learnerSchema instance can be used to allow xvalSpec("NOTEST") more broadly; this is illustrated in the revised MLint_devel vignette, in release.
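To illustrate the supported pattern, here is a minimal sketch, assuming the same eset and response variable as in your question (ntree is simply passed through to the underlying randomForest call):

library(MLInterfaces)

## randomForest carries its own out-of-bag resampling, so no held-out
## test indices are needed when xvalSpec("NOTEST") is supplied
rf_fit <- MLearn(source_name_ch1 ~ ., eset, randomForestI, xvalSpec("NOTEST"), ntree = 500)
RObject(rf_fit)   # the underlying randomForest fit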
> In order to implement a version of SVM-RFE, I need to train the SVM on all
> the training data available [...]

Note that the CMA package in Bioconductor and the caret package on CRAN address RFE; other packages may do so as well.
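Since your direct e1071::svm call on all 55 samples works, the full-data fit that SVM-RFE needs can also be obtained outside MLearn. A rough sketch of a single elimination step, for the two-class, linear-kernel case only (object names are placeholders, and the weight recovery from coefs and SV applies only to linear kernels):

library(e1071)
library(Biobase)

x <- t(exprs(eset))                # samples x features
y <- factor(eset$source_name_ch1)  # ensure a classification fit

fit <- svm(x, y, kernel = "linear")

## for a linear SVM the primal weight vector can be recovered from the
## support vectors; w^2 is the usual SVM-RFE ranking criterion
w <- t(fit$coefs) %*% fit$SV                     # 1 x n_features in the two-class case
ranking <- colnames(x)[order(as.vector(w)^2)]    # lowest-scoring feature first

## drop the lowest-ranked feature before the next elimination round
x <- x[, setdiff(colnames(x), ranking[1]), drop = FALSE]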