Thursday, March 10, 2011

Learn how to start SVM

Reference: A Practical Guide to Support Vector Classification by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin

Proposed procedures for beginners:
1. Convert data into the LibSVM format
2. Scale the data
3. Try the RBF kernel first
4. Use cross-validation to find the best parameters C and gamma
5. Use the best parameters C and gamma to train on the whole training set.
6. Test the model on the testing set.
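Step 4 of the procedure can be sketched in Python as follows. The guide suggests trying exponentially growing values of C and gamma (for example C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3). The function names below are hypothetical, and cv_score stands for whatever cross-validation score the caller supplies (for instance, one obtained by running svm-train with the -v option); it is an assumption, not part of LibSVM.

```python
import itertools
import math


def parameter_grid(c_exponents=range(-5, 16, 2), gamma_exponents=range(-15, 4, 2)):
    """Yield (C, gamma) pairs on an exponentially growing grid."""
    for c_exp, g_exp in itertools.product(c_exponents, gamma_exponents):
        yield 2.0 ** c_exp, 2.0 ** g_exp


def grid_search(cv_score, grid):
    """Return the (C, gamma) pair with the highest score.

    cv_score is a caller-supplied function (hypothetical here) that takes
    C and gamma and returns a number where larger is better, e.g. the
    cross-validation accuracy, or the negative mean squared error for
    regression.
    """
    return max(grid, key=lambda pair: cv_score(*pair))


def toy_score(c, gamma):
    """Stand-in score for illustration only; it peaks at C=2, gamma=0.125."""
    return -((math.log2(c) - 1) ** 2) - (math.log2(gamma) + 3) ** 2


best_c, best_gamma = grid_search(toy_score, parameter_grid())
```

In real use, toy_score would be replaced by a function that actually trains and cross-validates a model for each pair.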

1. The format of the training and testing data files is:

<label> <index1>:<value1> <index2>:<value2> ...
.
.

Each line contains one instance and ends with a '\n' character. For classification, <label> is an integer indicating the class label. For regression, <label> is the target value, which can be any real number. For example, one line of our training data is:

0.985749058346 1:24451.96 2:198.0345 3:0.00155077697416 4:2.40490922357e-06

<index>:<value> gives a feature (attribute) value. <index> is an integer starting from 1 and <value> is a real number. Indices must be in ASCENDING order. Labels in the testing file are only used to calculate accuracy or errors. If they are unknown, fill the first column with any number.
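To make the format concrete, here is a minimal Python sketch that parses one such line. The function name parse_libsvm_line is my own, not part of LibSVM, and the check simply enforces the "ascending, starting from 1" rule described above.

```python
def parse_libsvm_line(line):
    """Split one LibSVM-format line into (label, {index: value})."""
    fields = line.split()
    label = float(fields[0])
    features = {}
    previous_index = 0
    for field in fields[1:]:
        index_text, value_text = field.split(":", 1)
        index = int(index_text)
        # Indices start from 1 and must strictly increase along the line.
        if index <= previous_index:
            raise ValueError("indices must start from 1 and be in ascending order")
        features[index] = float(value_text)
        previous_index = index
    return label, features


sample = "0.985749058346 1:24451.96 2:198.0345 3:0.00155077697416 4:2.40490922357e-06"
label, features = parse_libsvm_line(sample)
```

Features that are absent from a line are simply not listed, which is why a dictionary keyed by index is used here.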

2. Check the data format using the command:
checkdata.py libSVMformat.data
No error was reported.

3. Split the data (3026 instances in total) into a training set (2726) and a testing set (300) using random selection. Command:
subset.py -s 1 libSVMformat.data 300 test.data train.data
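What a random split of this kind amounts to can be sketched in Python. This is not the code of subset.py, only an illustration under the assumption that instances are held in a list; random_split and the placeholder instances are hypothetical.

```python
import random


def random_split(lines, subset_size, seed=None):
    """Randomly pick subset_size lines; return (subset, rest) in original order."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(lines)), subset_size))
    subset = [line for i, line in enumerate(lines) if i in chosen]
    rest = [line for i, line in enumerate(lines) if i not in chosen]
    return subset, rest


# Placeholder instances standing in for the 3026 lines of the data file.
instances = [f"{i} 1:{i}" for i in range(3026)]
test_set, train_set = random_split(instances, 300, seed=0)
```

Every instance ends up in exactly one of the two sets, so the sizes are 300 and 2726 as in the command above.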

4. Scale the data
> svm-scale -l 0 -u 1 -s range train.data > train.scale
> svm-scale -r range test.data > test.scale
Each feature of the training data is scaled to [0,1]. The scaling factors are stored in the file range and then reused to scale the test data.
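The idea of reusing the training ranges can be sketched in Python. This is an illustration of min-max scaling, not the svm-scale implementation: it assumes dense rows (lists of floats), the function names are hypothetical, and mapping a constant feature to the lower bound is my own choice.

```python
def fit_ranges(rows):
    """Record the (min, max) of every feature column of the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]


def scale_rows(rows, ranges, lower=0.0, upper=1.0):
    """Linearly map each feature to [lower, upper] using the stored ranges."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                # Constant feature: mapped to the lower bound (an assumption).
                new_row.append(lower)
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled


train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test = [[2.5, 40.0]]

ranges = fit_ranges(train)              # computed from the training data only
train_scaled = scale_rows(train, ranges)
test_scaled = scale_rows(test, ranges)  # same factors reused for the test data
```

Because the test data is scaled with the training ranges, a test value can fall outside [0,1], as the second feature of the test row does here.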

5. First try: train a model with a linear kernel
svm-train -s 3 -p 0.1 -t 0 train.scale
This solves SVM regression (-s 3, epsilon-SVR) with the linear kernel u'v (-t 0) and epsilon = 0.1 (-p 0.1) in the loss function.
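For reference, the epsilon in the loss function defines a tube around the target inside which errors are not penalized. A minimal Python sketch of this epsilon-insensitive loss (the function name is hypothetical):

```python
def epsilon_insensitive_loss(target, prediction, epsilon=0.1):
    """Return 0 inside the epsilon tube, otherwise the excess absolute error."""
    return max(0.0, abs(target - prediction) - epsilon)


inside_tube = epsilon_insensitive_loss(1.0, 1.05)   # error 0.05 is below epsilon
outside_tube = epsilon_insensitive_loss(1.0, 1.5)   # error 0.5 exceeds epsilon
```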

6. Test the model on the testing set
svm-predict test.scale train.scale.model test.predict
Results:
Mean squared error = 3.29278 (regression)
Squared correlation coefficient = 0.0463435 (regression)
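The two reported numbers can be reproduced from the targets and predictions with a short Python sketch. The function names are my own; the squared correlation coefficient below is the squared Pearson correlation between predictions and targets, which as far as I know is what svm-predict reports.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between predictions and targets."""
    n = len(targets)
    return sum((p - t) ** 2 for t, p in zip(targets, predictions)) / n


def squared_correlation(targets, predictions):
    """Squared Pearson correlation between predictions and targets."""
    n = len(targets)
    sum_t = sum(targets)
    sum_p = sum(predictions)
    sum_tt = sum(t * t for t in targets)
    sum_pp = sum(p * p for p in predictions)
    sum_pt = sum(p * t for t, p in zip(targets, predictions))
    numerator = (n * sum_pt - sum_p * sum_t) ** 2
    denominator = (n * sum_pp - sum_p ** 2) * (n * sum_tt - sum_t ** 2)
    return numerator / denominator


# Toy values for illustration only, not the data from this post.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [2.0, 4.0, 6.0, 8.0]

mse = mean_squared_error(targets, predictions)
r2 = squared_correlation(targets, predictions)
```

In the toy example the predictions are perfectly correlated with the targets but badly scaled, so the squared correlation is 1 while the mean squared error is large. A squared correlation near 0, as in the results above, means the predictions are almost uncorrelated with the targets.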

7. Second try: train and test with the RBF kernel
> svm-train -s 3 -p 0.1 -t 2 train.scale
> svm-predict test.scale train.scale.model test.predict
Results:
Mean squared error = 3.28346 (regression)
Squared correlation coefficient = 0.0498395 (regression)
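The two kernels tried above differ only in how they compare a pair of feature vectors. A minimal Python sketch of both (function names hypothetical). No -g option was given to svm-train, so LibSVM falls back to its default gamma of 1/num_features; the value 0.5 below is just for illustration.

```python
import math


def linear_kernel(u, v):
    """Linear kernel: the dot product u'v."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma):
    """RBF kernel: exp(-gamma * |u - v|^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


dot = linear_kernel([1.0, 2.0], [3.0, 4.0])
same_point = rbf_kernel([1.0, 0.0], [1.0, 0.0], gamma=0.5)
far_apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

The RBF value is 1 for identical points and decays toward 0 as the points move apart, with gamma controlling how fast. This is why C and gamma are the two parameters to tune by cross-validation.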
