fix: formatting
+11
-11
@@ -1,20 +1,20 @@
|
|||||||
//! # Support Vector Machines
|
//! # Support Vector Machines
|
||||||
//!
|
//!
|
||||||
//! Support Vector Machines (SVM) is one of the most performant off-the-shelf machine learning algorithms.
|
//! Support Vector Machines (SVM) is one of the most performant off-the-shelf machine learning algorithms.
|
||||||
//! SVM is based on the [Vapnik–Chervonenkis theory](https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_theory) that was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis.
|
//! SVM is based on the [Vapnik–Chervonenkis theory](https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_theory) that was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis.
|
||||||
//!
|
//!
|
||||||
//! SVM splits data into two sets using a maximal-margin decision boundary, \\(f(x)\\). For regression, the algorithm uses the value of the function \\(f(x)\\) to predict a target value.
|
//! SVM splits data into two sets using a maximal-margin decision boundary, \\(f(x)\\). For regression, the algorithm uses the value of the function \\(f(x)\\) to predict a target value.
|
||||||
//! To classify a new point, the algorithm calculates the sign of the decision function to see on which side of the boundary the new point lies.
|
//! To classify a new point, the algorithm calculates the sign of the decision function to see on which side of the boundary the new point lies.
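//! As a minimal sketch of the sign rule above, assuming a linear decision function \\(f(x) = \langle w, x \rangle + b\\):
//! the names `decision_function` and `classify` are hypothetical and are not part of the SmartCore API.

```rust
/// Linear decision function f(x) = <w, x> + b (illustrative, not the SmartCore API).
fn decision_function(w: &[f64], b: f64, x: &[f64]) -> f64 {
    w.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + b
}

/// Classify a point by the sign of f(x).
/// Points exactly on the boundary (f(x) = 0) are assigned to +1 here,
/// which is an arbitrary choice made for this sketch.
fn classify(w: &[f64], b: f64, x: &[f64]) -> i32 {
    if decision_function(w, b, x) >= 0.0 {
        1
    } else {
        -1
    }
}
```

//! With `w = [1.0, -1.0]` and `b = 0.5`, the point `[2.0, 1.0]` gives \\(f(x) = 1.5\\) and is assigned to class 1, while `[0.0, 2.0]` gives \\(f(x) = -1.5\\) and is assigned to class -1.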
|
||||||
//!
|
//!
|
||||||
//! SVM is memory efficient since it uses only a subset of training data to find a decision boundary. The points in this subset are called support vectors.
|
//! SVM is memory efficient since it uses only a subset of training data to find a decision boundary. The points in this subset are called support vectors.
|
||||||
//!
|
//!
|
||||||
//! In SVM, the distance between a data point and the support vectors is defined by the kernel function.
|
//! In SVM, the distance between a data point and the support vectors is defined by the kernel function.
|
||||||
//! SmartCore supports multiple kernel functions but you can always define a new kernel function by implementing the `Kernel` trait. Not all functions can be a kernel.
|
//! SmartCore supports multiple kernel functions but you can always define a new kernel function by implementing the `Kernel` trait. Not all functions can be a kernel.
|
||||||
//! Building a new kernel requires a good mathematical understanding of the [Mercer theorem](https://en.wikipedia.org/wiki/Mercer%27s_theorem)
|
//! Building a new kernel requires a good mathematical understanding of the [Mercer theorem](https://en.wikipedia.org/wiki/Mercer%27s_theorem)
|
||||||
//! that gives a necessary and sufficient condition for a function to be a kernel function.
|
//! that gives a necessary and sufficient condition for a function to be a kernel function.
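//! One practical consequence of the Mercer condition is that every Gram matrix built from a valid kernel is symmetric and positive semi-definite.
//! The sketch below spot-checks this on a finite sample. It can only refute a candidate kernel, never prove it valid, and all names
//! (`gram_matrix`, `is_symmetric`, `quadratic_form`) are hypothetical helpers rather than SmartCore functions.

```rust
/// Gram matrix K[i][j] = k(x_i, x_j) for a candidate kernel `k`.
fn gram_matrix<K>(points: &[Vec<f64>], k: K) -> Vec<Vec<f64>>
where
    K: Fn(&[f64], &[f64]) -> f64,
{
    points
        .iter()
        .map(|xi| {
            points
                .iter()
                .map(|xj| k(xi.as_slice(), xj.as_slice()))
                .collect::<Vec<f64>>()
        })
        .collect()
}

/// True if K[i][j] and K[j][i] agree up to `tol` for all i, j.
fn is_symmetric(k: &[Vec<f64>], tol: f64) -> bool {
    (0..k.len()).all(|i| (0..k.len()).all(|j| (k[i][j] - k[j][i]).abs() <= tol))
}

/// Quadratic form z^T K z; a negative value for any z rules the kernel out.
fn quadratic_form(k: &[Vec<f64>], z: &[f64]) -> f64 {
    let mut total = 0.0;
    for (i, row) in k.iter().enumerate() {
        for (j, kij) in row.iter().enumerate() {
            total += z[i] * kij * z[j];
        }
    }
    total
}
```

//! For example, the candidate \\(K(x, x') = -\langle x, x' \rangle\\) fails this check on the single point \\(x = 1\\), because the quadratic form with \\(z = 1\\) is \\(-1\\).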
|
||||||
//!
|
//!
|
||||||
//! Pre-defined kernel functions:
|
//! Pre-defined kernel functions:
|
||||||
//!
|
//!
|
||||||
//! * *Linear*, \\( K(x, x') = \langle x, x' \rangle\\)
|
//! * *Linear*, \\( K(x, x') = \langle x, x' \rangle\\)
|
||||||
//! * *Polynomial*, \\( K(x, x') = (\gamma\langle x, x' \rangle + r)^d\\), where \\(d\\) is the polynomial degree, \\(\gamma\\) is the kernel coefficient and \\(r\\) is an independent term in the kernel function.
|
//! * *Polynomial*, \\( K(x, x') = (\gamma\langle x, x' \rangle + r)^d\\), where \\(d\\) is the polynomial degree, \\(\gamma\\) is the kernel coefficient and \\(r\\) is an independent term in the kernel function.
|
||||||
//! * *RBF (Gaussian)*, \\( K(x, x') = e^{-\gamma \lVert x - x' \rVert ^2} \\), where \\(\gamma\\) is the kernel coefficient.
|
//! * *RBF (Gaussian)*, \\( K(x, x') = e^{-\gamma \lVert x - x' \rVert ^2} \\), where \\(\gamma\\) is the kernel coefficient.
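//! The formulas in the list above can be written out directly as plain functions. This is a sketch for illustration only:
//! the function names and the use of `&[f64]` slices and an `i32` degree are assumptions, and this is not how SmartCore's own kernels are declared.

```rust
/// Inner product <x, y> of two equally sized vectors.
fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Linear kernel: K(x, x') = <x, x'>.
fn linear_kernel(x: &[f64], y: &[f64]) -> f64 {
    dot(x, y)
}

/// Polynomial kernel: K(x, x') = (gamma * <x, x'> + r)^d.
fn polynomial_kernel(x: &[f64], y: &[f64], gamma: f64, r: f64, d: i32) -> f64 {
    (gamma * dot(x, y) + r).powi(d)
}

/// RBF (Gaussian) kernel: K(x, x') = exp(-gamma * ||x - x'||^2).
fn rbf_kernel(x: &[f64], y: &[f64], gamma: f64) -> f64 {
    let squared_distance: f64 = x.iter().zip(y).map(|(a, b)| (a - b).powi(2)).sum();
    (-gamma * squared_distance).exp()
}
```

//! For \\(x = (1, 2)\\) and \\(x' = (3, 4)\\), the linear kernel is 11, the polynomial kernel with \\(\gamma = 0.5\\), \\(r = 1\\), \\(d = 2\\) is \\(6.5^2 = 42.25\\), and the RBF kernel with \\(\gamma = 0.5\\) is \\(e^{-4}\\).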
|
||||||
|
|||||||
+18
-18
@@ -1,27 +1,27 @@
|
|||||||
//! # Support Vector Classifier.
|
//! # Support Vector Classifier.
|
||||||
//!
|
//!
|
||||||
//! Support Vector Classifier (SVC) is a binary classifier that uses an optimal hyperplane to separate the points in the input variable space by their class.
|
//! Support Vector Classifier (SVC) is a binary classifier that uses an optimal hyperplane to separate the points in the input variable space by their class.
|
||||||
//!
|
//!
|
||||||
//! During training, SVC chooses a Maximal-Margin hyperplane that can separate all training instances with the largest margin.
|
//! During training, SVC chooses a Maximal-Margin hyperplane that can separate all training instances with the largest margin.
|
||||||
//! The margin is calculated as the perpendicular distance from the boundary to only the closest points. Hence, only these points are relevant in defining
|
//! The margin is calculated as the perpendicular distance from the boundary to only the closest points. Hence, only these points are relevant in defining
|
||||||
//! the hyperplane and in the construction of the classifier. These points are called the support vectors.
|
//! the hyperplane and in the construction of the classifier. These points are called the support vectors.
|
||||||
//!
|
//!
|
||||||
//! While SVC selects a hyperplane with the largest margin, it allows some points in the training data to violate the separating boundary.
|
//! While SVC selects a hyperplane with the largest margin, it allows some points in the training data to violate the separating boundary.
|
||||||
//! The parameter `C` > 0 controls how SVC handles these violating points. The larger the value of this parameter, the more heavily the algorithm is penalized
|
//! The parameter `C` > 0 controls how SVC handles these violating points. The larger the value of this parameter, the more heavily the algorithm is penalized
|
||||||
//! for incorrectly classified points. In other words, setting this parameter to a small value results in a classifier that tolerates a large number
|
//! for incorrectly classified points. In other words, setting this parameter to a small value results in a classifier that tolerates a large number
|
||||||
//! of misclassified samples. Mathematically, the SVC optimization problem can be defined as:
|
//! of misclassified samples. Mathematically, the SVC optimization problem can be defined as:
|
||||||
//!
|
//!
|
||||||
//! \\[\underset{w, \zeta}{\text{minimize}} \space \space \frac{1}{2} \lVert \vec{w} \rVert^2 + C\sum_{i=1}^m \zeta_i \\]
|
//! \\[\underset{w, \zeta}{\text{minimize}} \space \space \frac{1}{2} \lVert \vec{w} \rVert^2 + C\sum_{i=1}^m \zeta_i \\]
|
||||||
//!
|
//!
|
||||||
//! subject to:
|
//! subject to:
|
||||||
//!
|
//!
|
||||||
//! \\[y_i(\langle\vec{w}, \vec{x}_i \rangle + b) \geq 1 - \zeta_i \\]
|
//! \\[y_i(\langle\vec{w}, \vec{x}_i \rangle + b) \geq 1 - \zeta_i \\]
|
||||||
//! \\[\zeta_i \geq 0 \quad \text{for any } i = 1, \dots, m\\]
|
//! \\[\zeta_i \geq 0 \quad \text{for any } i = 1, \dots, m\\]
|
||||||
//!
|
//!
|
||||||
//! Here \\( m \\) is the number of training samples, \\( y_i \\) is a label value (either 1 or -1), and \\(\langle\vec{w}, \vec{x}_i \rangle + b\\) is the decision function.
|
//! Here \\( m \\) is the number of training samples, \\( y_i \\) is a label value (either 1 or -1), and \\(\langle\vec{w}, \vec{x}_i \rangle + b\\) is the decision function.
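//! To make the role of `C` concrete, the sketch below evaluates the objective above for a fixed \\(\vec{w}\\) and \\(b\\).
//! For fixed \\(\vec{w}\\) and \\(b\\), the smallest slack that satisfies both constraints is \\(\zeta_i = \max(0, 1 - y_i(\langle\vec{w}, \vec{x}_i \rangle + b))\\).
//! The function names are hypothetical, and this only evaluates the objective; it is not the solver SmartCore uses.

```rust
/// Inner product <x, y> of two equally sized vectors.
fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Smallest feasible slack for one sample: max(0, 1 - y * (<w, x> + b)).
/// `y` must be +1.0 or -1.0.
fn slack(w: &[f64], b: f64, x: &[f64], y: f64) -> f64 {
    (1.0 - y * (dot(w, x) + b)).max(0.0)
}

/// Soft-margin objective: 0.5 * ||w||^2 + C * sum of slacks.
fn svc_objective(w: &[f64], b: f64, c: f64, xs: &[Vec<f64>], ys: &[f64]) -> f64 {
    let slack_sum: f64 = xs.iter().zip(ys).map(|(x, &y)| slack(w, b, x, y)).sum();
    0.5 * dot(w, w) + c * slack_sum
}
```

//! With \\(\vec{w} = (1, 0)\\), \\(b = 0\\) and the samples \\(((2, 0), +1)\\), \\(((0.5, 0), +1)\\), \\(((0.5, 0), -1)\\), the slacks are 0, 0.5 and 1.5.
//! The objective is then \\(0.5 + 2C\\): 20.5 for `C = 10` but only 0.7 for `C = 0.1`, so a small `C` makes the same violations much cheaper.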
|
||||||
//!
|
//!
|
||||||
//! To solve this optimization problem, SmartCore uses an [approximate SVM solver](https://leon.bottou.org/projects/lasvm).
|
//! To solve this optimization problem, SmartCore uses an [approximate SVM solver](https://leon.bottou.org/projects/lasvm).
|
||||||
//! The optimizer reaches accuracies similar to those of a real SVM after performing two passes through the training examples. You can choose the number of passes
|
//! The optimizer reaches accuracies similar to those of a real SVM after performing two passes through the training examples. You can choose the number of passes
|
||||||
//! through the data that the algorithm takes by changing the `epoch` parameter of the classifier.
|
//! through the data that the algorithm takes by changing the `epoch` parameter of the classifier.
|
||||||
//!
|
//!
|
||||||
//! Example:
|
//! Example:
|
||||||
@@ -73,7 +73,7 @@
|
|||||||
//!
|
//!
|
||||||
//! * ["Support Vector Machines", Kowalczyk A., 2017](https://www.svm-tutorial.com/2017/10/support-vector-machines-succinctly-released/)
|
//! * ["Support Vector Machines", Kowalczyk A., 2017](https://www.svm-tutorial.com/2017/10/support-vector-machines-succinctly-released/)
|
||||||
//! * ["Fast Kernel Classifiers with Online and Active Learning", Bordes A., Ertekin S., Weston J., Bottou L., 2005](https://www.jmlr.org/papers/volume6/bordes05a/bordes05a.pdf)
|
//! * ["Fast Kernel Classifiers with Online and Active Learning", Bordes A., Ertekin S., Weston J., Bottou L., 2005](https://www.jmlr.org/papers/volume6/bordes05a/bordes05a.pdf)
|
||||||
//!
|
//!
|
||||||
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
|
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
|
||||||
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
||||||
|
|
||||||
|
|||||||
+11
-11
@@ -1,21 +1,21 @@
|
|||||||
//! # Epsilon-Support Vector Regression.
|
//! # Epsilon-Support Vector Regression.
|
||||||
//!
|
//!
|
||||||
//! Support Vector Regression (SVR) is a popular algorithm used for regression that uses the same principle as SVM.
|
//! Support Vector Regression (SVR) is a popular algorithm used for regression that uses the same principle as SVM.
|
||||||
//!
|
//!
|
||||||
//! Just like [SVC](../svc/index.html), SVR finds an optimal decision boundary, \\(f(x)\\), that separates all training instances with the largest margin.
|
//! Just like [SVC](../svc/index.html), SVR finds an optimal decision boundary, \\(f(x)\\), that separates all training instances with the largest margin.
|
||||||
//! Unlike SVC, in \\(\epsilon\\)-SVR the goal is to find a function \\(f(x)\\) that has at most \\(\epsilon\\) deviation from the
|
//! Unlike SVC, in \\(\epsilon\\)-SVR the goal is to find a function \\(f(x)\\) that has at most \\(\epsilon\\) deviation from the
|
||||||
//! known targets \\(y_i\\) for all the training data. To find this function, we need to solve this optimization problem:
|
//! known targets \\(y_i\\) for all the training data. To find this function, we need to solve this optimization problem:
|
||||||
//!
|
//!
|
||||||
//! \\[\underset{w, \zeta}{\text{minimize}} \space \space \frac{1}{2} \lVert \vec{w} \rVert^2 + C\sum_{i=1}^m \zeta_i \\]
|
//! \\[\underset{w, \zeta}{\text{minimize}} \space \space \frac{1}{2} \lVert \vec{w} \rVert^2 + C\sum_{i=1}^m \zeta_i \\]
|
||||||
//!
|
//!
|
||||||
//! subject to:
|
//! subject to:
|
||||||
//!
|
//!
|
||||||
//! \\[y_i - \langle\vec{w}, \vec{x}_i \rangle - b \leq \epsilon + \zeta_i \\]
|
//! \\[y_i - \langle\vec{w}, \vec{x}_i \rangle - b \leq \epsilon + \zeta_i \\]
|
||||||
//! \\[\langle\vec{w}, \vec{x}_i \rangle + b - y_i \leq \epsilon + \zeta_i \\]
|
//! \\[\langle\vec{w}, \vec{x}_i \rangle + b - y_i \leq \epsilon + \zeta_i \\]
|
||||||
//! \\[\zeta_i \geq 0 \quad \text{for any } i = 1, \dots, m\\]
|
//! \\[\zeta_i \geq 0 \quad \text{for any } i = 1, \dots, m\\]
|
||||||
//!
|
//!
|
||||||
//! Here \\( m \\) is the number of training samples, \\( y_i \\) is a target value, and \\(\langle\vec{w}, \vec{x}_i \rangle + b\\) is the regression function \\(f(x)\\) evaluated at \\(\vec{x}_i\\).
|
//! Here \\( m \\) is the number of training samples, \\( y_i \\) is a target value, and \\(\langle\vec{w}, \vec{x}_i \rangle + b\\) is the regression function \\(f(x)\\) evaluated at \\(\vec{x}_i\\).
|
||||||
//!
|
//!
|
||||||
//! The parameter `C` > 0 determines the trade-off between the flatness of \\(f(x)\\) and the amount up to which deviations larger than \\(\epsilon\\) are tolerated.
|
//! The parameter `C` > 0 determines the trade-off between the flatness of \\(f(x)\\) and the amount up to which deviations larger than \\(\epsilon\\) are tolerated.
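//! The trade-off can be illustrated by evaluating the \\(\epsilon\\)-SVR objective for a fixed linear \\(f(x) = \langle\vec{w}, \vec{x} \rangle + b\\).
//! For fixed \\(\vec{w}\\) and \\(b\\), the smallest feasible slack is \\(\zeta_i = \max(0, \lvert y_i - f(\vec{x}_i) \rvert - \epsilon)\\), so deviations inside the \\(\epsilon\\)-tube cost nothing.
//! The function names below are hypothetical and the sketch only evaluates the objective; it does not reproduce SmartCore's solver.

```rust
/// Inner product <x, y> of two equally sized vectors.
fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Smallest feasible slack for one sample: max(0, |y - (<w, x> + b)| - eps).
fn epsilon_slack(w: &[f64], b: f64, eps: f64, x: &[f64], y: f64) -> f64 {
    ((y - dot(w, x) - b).abs() - eps).max(0.0)
}

/// Epsilon-SVR objective: 0.5 * ||w||^2 + C * sum of slacks.
fn svr_objective(w: &[f64], b: f64, c: f64, eps: f64, xs: &[Vec<f64>], ys: &[f64]) -> f64 {
    let slack_sum: f64 = xs
        .iter()
        .zip(ys)
        .map(|(x, &y)| epsilon_slack(w, b, eps, x, y))
        .sum();
    0.5 * dot(w, w) + c * slack_sum
}
```

//! With \\(w = 2\\), \\(b = 1\\) and \\(\epsilon = 0.5\\), the sample \\((x, y) = (1, 3.2)\\) deviates by 0.2 and has zero slack, while \\((2, 7)\\) deviates by 2 and has slack 1.5.
//! For `C = 2` the objective is \\(0.5 \cdot 4 + 2 \cdot 1.5 = 5\\).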
|
||||||
//!
|
//!
|
||||||
//! Example:
|
//! Example:
|
||||||
@@ -66,7 +66,7 @@
|
|||||||
//! * ["A Fast Algorithm for Training Support Vector Machines", Platt J.C., 1998](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf)
|
//! * ["A Fast Algorithm for Training Support Vector Machines", Platt J.C., 1998](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf)
|
||||||
//! * ["Working Set Selection Using Second Order Information for Training Support Vector Machines", Rong-En Fan et al., 2005](https://www.jmlr.org/papers/volume6/fan05a/fan05a.pdf)
|
//! * ["Working Set Selection Using Second Order Information for Training Support Vector Machines", Rong-En Fan et al., 2005](https://www.jmlr.org/papers/volume6/fan05a/fan05a.pdf)
|
||||||
//! * ["A tutorial on support vector regression", Smola A.J., Schölkopf B., 2003](https://alex.smola.org/papers/2004/SmoSch04.pdf)
|
//! * ["A tutorial on support vector regression", Smola A.J., Schölkopf B., 2003](https://alex.smola.org/papers/2004/SmoSch04.pdf)
|
||||||
//!
|
//!
|
||||||
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
|
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
|
||||||
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
||||||
|
|
||||||
|
|||||||