Support vector regression
Question from Guest User (@guest-user-4897):
For convenience's sake, I use example data to ask the question: the QSAR dataset QSAR.XLS [http://eric.univ-lyon2.fr/~ricco/tanagra/fichiers/qsar.zip]. Considering the donors from the dataset as predictor variables and Activity as the response variable, I would like to do a support vector regression using both linear and non-linear kernels. In my case, I would like to find which of the predictors (out of the 20 donors) best explain the activity (response), and did the following:

    fit <- svm(activity ~ ., data = qsar, kernel = "linear", type = "eps-regression")

    Call:
    svm(formula = activity ~ ., data = qsar, kernel = "linear", type = "eps-regression")

    Parameters:
       SVM-Type:  eps-regression
     SVM-Kernel:  linear
           cost:  1
          gamma:  0.04347826
        epsilon:  0.1

    Number of Support Vectors:  66

How do I now determine which are the best predictors (out of the 20) that explain the activity, and how do I get the R-squared values? And if I try several kernels, is it possible to represent the results in the same way as this SVR example figure produced with Python (http://i.stack.imgur.com/lCwm7.png)? I thought that comparing the models this way would be good. I found it at http://scikit-learn.org/0.11/auto_examples/svm/plot_svm_regression.html

I found several good tutorials for classification, but not for regression, so I tried to follow the tutorial at http://eric.univ-lyon2.fr/~ricco/tanagra/fichiers/en_Tanagra_Support_Vector_Regression.pdf but did not understand it very well. Could anyone please explain to me how this is done?

-- output of sessionInfo():

    R version 3.0.3 (2014-03-06)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
    [4] LC_NUMERIC=C                   LC_TIME=French_France.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] kernlab_0.9-19 e1071_1.6-3

    loaded via a namespace (and not attached):
    [1] class_7.3-9 tools_3.0.3

-- Sent via the guest posting facility at bioconductor.org.
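For reference, a minimal sketch of one way to get a pseudo R-squared and a side-by-side kernel comparison from e1071 fits. It assumes `qsar` is already loaded as a data.frame with an `activity` column; note that the in-sample R-squared computed here is optimistic compared to a cross-validated figure:

    library(e1071)

    fit.lin <- svm(activity ~ ., data = qsar, kernel = "linear", type = "eps-regression")
    fit.rbf <- svm(activity ~ ., data = qsar, kernel = "radial", type = "eps-regression")

    ## pseudo R-squared on the training data: 1 - SSE/SST
    r2 <- function(model, y) 1 - sum((y - fitted(model))^2) / sum((y - mean(y))^2)
    r2(fit.lin, qsar$activity)
    r2(fit.rbf, qsar$activity)

    ## observed vs. fitted for both kernels, loosely mimicking the
    ## scikit-learn SVR comparison plot
    plot(qsar$activity, fitted(fit.lin), pch = 19, col = "blue",
         xlab = "observed activity", ylab = "fitted activity")
    points(qsar$activity, fitted(fit.rbf), pch = 17, col = "red")
    abline(0, 1, lty = 2)
    legend("topleft", legend = c("linear", "radial"),
           col = c("blue", "red"), pch = c(19, 17))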
Tags: Classification, Regression
Answer from Steve Lianoglou (@steve-lianoglou-2771):
Hi,

On Mon, Mar 31, 2014 at 9:06 AM, Paul [guest] <guest at bioconductor.org> wrote:

> How to determine now which are the best predictors (out of the 20)
> which explain the activity and get the R-squared values?

SVMs aren't the easiest models to do this with. The trained model is (or should be) sparse in *example* space, so you know which examples contribute most to your decision boundary, but you are left to reverse-engineer how the features in each example are responsible for that (given the kernel you use). Depending on the kernel, you can take the values in the W vector from the SVM as a feature-ranking type of approach (see the first sketch below), but this gets complicated fast.

You might instead try a method that enforces sparsity in the feature space: try the glmnet package (see the second sketch below). You could also try penalizedSVM, but (I believe) the last time I checked it only supported a linear kernel (although I could easily be mistaken).

Also, looking at your results, you have 66 support vectors out of a dataset of 75 examples, so the model is not sparse with respect to the number of examples you trained on. Typically you want the number of support vectors to be relatively small compared to the number of training examples as a sign of the fitness of your model. But the number of SVs isn't the *real* thing you are interested in: you'd rather do some cross-validation to ensure that the model actually generalizes, i.e., how well it predicts on held-out examples (see the last sketch below). Once you get something that looks promising, I'd then spend time figuring out how to extract features from it. I see that in your dataset you've annotated some rows as train and test, but you're not using that information just yet.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Genentech
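For a linear kernel, the W vector mentioned above can be recovered from the fitted e1071 model. A minimal sketch, assuming `fit` is the linear-kernel model from the question; note the weights are computed on e1071's internally scaled features, so treat this as a rough ranking rather than an exact importance measure:

    ## weight vector of a linear-kernel SVM: w = t(coefs) %*% SV
    w <- t(fit$coefs) %*% fit$SV            # 1 x p matrix of feature weights
    ranking <- sort(abs(drop(w)), decreasing = TRUE)
    head(ranking)                           # predictors with the largest |w| first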
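For the glmnet suggestion, a sketch of a lasso fit that selects predictors directly. It assumes `activity` is the response and that any non-predictor columns (e.g. the train/test flag) are also dropped from the predictor matrix:

    library(glmnet)

    preds <- setdiff(names(qsar), "activity")   # also drop any non-predictor columns here
    x <- as.matrix(qsar[, preds])
    y <- qsar$activity
    cvfit <- cv.glmnet(x, y, alpha = 1)         # alpha = 1: lasso penalty
    coef(cvfit, s = "lambda.min")               # non-zero rows = selected predictors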
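For the cross-validation step, e1071 can do k-fold CV directly through the `cross` argument, and `tune()` searches a parameter grid. A sketch with illustrative grid values; the train/test column name `status` is an assumption, so substitute the dataset's actual column:

    ## 10-fold cross-validation; tot.MSE is the cross-validated mean squared error
    fit.cv <- svm(activity ~ ., data = qsar, kernel = "linear",
                  type = "eps-regression", cross = 10)
    fit.cv$tot.MSE

    ## grid search over cost/epsilon with 10-fold CV
    tuned <- tune(svm, activity ~ ., data = qsar,
                  ranges = list(cost = 2^(-2:4), epsilon = c(0.01, 0.1, 0.5)),
                  tunecontrol = tune.control(cross = 10))
    summary(tuned)

    ## held-out evaluation using the dataset's train/test annotation
    train <- subset(qsar, status == "train")   # column name `status` assumed
    test  <- subset(qsar, status == "test")
    m <- svm(activity ~ . - status, data = train, kernel = "linear")
    mean((test$activity - predict(m, test))^2)   # test-set MSE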