SVMs (Support Vector Machines) are commonly used for data classification. Compared with neural networks, they are easier to use. For classification, we need to separate our data into training and testing sets. Each instance in the training set contains one target value (i.e. the class label) and several attributes (i.e. the features or observed variables). The goal of an SVM is to produce a model from the training set and use this model to predict the target values of the testing set, given only the attributes of the test data.
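As a minimal sketch of the training/testing split described above, the following uses only the Python standard library. The function name `train_test_split`, the 25% test fraction, and the toy rows are assumptions made for illustration; they are not part of LIBSVM.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Each row is (target value, [attributes]); the values are made up.
data = [(float(i % 2), [i * 0.1, i * 0.2]) for i in range(20)]
train, test = train_test_split(data)
print(len(train), "training rows,", len(test), "testing rows")
```

The model is fitted on `train` only; the targets in `test` are kept aside and used just to check the predictions.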
1. Download and install gnuplot: http://sourceforge.net/projects/gnuplot/files/
This is required to use the parameter selection tool grid.py in LIBSVM.
2. Download LIBSVM
LIBSVM is a simple, easy-to-use, and efficient software package for SVM classification and regression. It solves C-SVM classification, nu-SVM classification, one-class-SVM, epsilon-SVM regression, and nu-SVM regression. It also provides an automatic model selection tool for C-SVM classification.
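The LIBSVM command-line tools read a sparse text format with one example per line: the target value followed by `index:value` pairs with 1-based feature indices. A small sketch of writing such a line from Python follows; the helper name `to_libsvm_line` and the sample numbers are hypothetical.

```python
def to_libsvm_line(target, features):
    """Format one example as '<target> 1:<f1> 2:<f2> ...' (indices start at 1)."""
    pairs = " ".join(
        f"{index}:{value:.10g}" for index, value in enumerate(features, start=1)
    )
    return f"{target:.10g} {pairs}"

line = to_libsvm_line(1.5, [0.25, 2.0, 0.001])
print(line)  # 1.5 1:0.25 2:2 3:0.001
```

Writing one such line per example produces a file that can be passed to the LIBSVM tools.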
3. Materials
Spectral data: a set of fragmentation spectra from C. elegans, run on an FT-ICR instrument, whose monoisotopic masses have been determined by Hardklör/Bullseye.
PSM (peptide-spectrum match) identifications: the fragmentation data have been searched and post-processed using Crux/Percolator.
4. What to do
• Extract a set of fragmentation spectra which have PSMs with a q-value less than 1%.
• Determine the relative mass deviation (<observed mass> - <calculated mass>) / <calculated mass> for each of the PSMs in the set, and investigate the relationship to I/<observed_mass_to_charge>.
• For each PSM, extract the features: (1) total ion current of the MS/MS scan, (2) ion injection time of the MS/MS scan, (3) I/<observed_mass_to_charge>, (4) I/<observed_mass_to_charge>^2, (5) the relative mass deviation.
• Design an SVR (support vector regression) model that predicts (5) from (1)-(4).
• Use cross-validation to determine the performance of the system.
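The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference solution: the dictionary field names and the toy PSM values are hypothetical, "I/<observed_mass_to_charge>" is read here as the reciprocal 1/(m/z), and the SVR training itself is left to a tool such as LIBSVM's epsilon-SVM regression.

```python
def relative_mass_deviation(observed_mass, calculated_mass):
    """(<observed mass> - <calculated mass>) / <calculated mass>"""
    return (observed_mass - calculated_mass) / calculated_mass

def psm_to_example(psm):
    """Return (target, features) for one PSM; the field names are hypothetical."""
    mz = psm["observed_mz"]
    features = [
        psm["total_ion_current"],   # feature (1)
        psm["ion_injection_time"],  # feature (2)
        1.0 / mz,                   # feature (3), ASSUMING "I/" means 1/(m/z)
        1.0 / mz ** 2,              # feature (4), same assumption
    ]
    target = relative_mass_deviation(psm["observed_mass"], psm["calculated_mass"])  # (5)
    return target, features

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each index is held out exactly once."""
    for fold in range(k):
        test_idx = list(range(fold, n, k))  # every k-th example, offset by the fold
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        yield train_idx, test_idx

# Made-up PSMs, for illustration only.
psms = [
    {"q_value": 0.001, "observed_mass": 1000.002, "calculated_mass": 1000.0,
     "observed_mz": 500.5, "total_ion_current": 1.2e6, "ion_injection_time": 35.0},
    {"q_value": 0.200, "observed_mass": 1500.010, "calculated_mass": 1500.0,
     "observed_mz": 750.5, "total_ion_current": 8.0e5, "ion_injection_time": 50.0},
    {"q_value": 0.005, "observed_mass": 2000.001, "calculated_mass": 2000.0,
     "observed_mz": 667.3, "total_ion_current": 2.5e6, "ion_injection_time": 20.0},
]

confident = [p for p in psms if p["q_value"] < 0.01]  # q-value below 1%
examples = [psm_to_example(p) for p in confident]
folds = list(k_fold_indices(len(examples), 2))
print(len(examples), "examples in", len(folds), "folds")
```

For each fold, the SVR would be trained on the `train_idx` examples and scored on the `test_idx` examples, for instance by mean squared error. The folds here are interleaved without shuffling, so shuffling the examples first is advisable if the PSMs are ordered by scan.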