feat: documents KNN Classifier

2020-08-27 19:47:11 -07:00
parent 762bc3d765
commit dcf636a5f1
3 changed files with 91 additions and 7 deletions
@@ -1,4 +1,35 @@
 //! # Nearest Neighbors
+//!
+//! <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS_CHTML"></script>
+//!
+//! The k-nearest neighbors (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems.
+//! KNN is a non-parametric method that assumes that similar things exist in close proximity.
+//!
+//! During training the algorithms memorizes all training samples. To make a prediction it finds a predefined set of training samples closest in distance to the new
+//! point and uses labels of found samples to calculate value of new point. The number of samples (k) is defined by user and does not change after training.
+//!
+//! The distance can be any metric measure that is defined as \\( d(x, y) \geq 0\\)
+//! and follows three conditions:
+//! 1. \\( d(x, y) = 0 \\) if and only \\( x = y \\), positive definiteness
+//! 1. \\( d(x, y) = d(y, x) \\), symmetry
+//! 1. \\( d(x, y) \leq d(x, z) + d(z, y) \\), 	subadditivity or triangle inequality
+//!
+//! for all \\(x, y, z \in Z \\)
+//!
+//! Neighbors-based methods are very simple and are known as non-generalizing machine learning methods since they simply remember all of its training data and is prone to overfitting.
+//! Despite its disadvantages, nearest neighbors algorithms has been very successful in a large number of applications because of its flexibility and speed.
+//!
+//! __Advantages__
+//! * The algorithm is simple and fast.
+//! * The algorithm is non-parametric: there’s no need to build a model, the algorithm simply stores all training samples in memory.
+//! * The algorithm is versatile. It can be used for classification, regression.
+//!
+//! __Disadvantages__
+//! * The algorithm gets significantly slower as the number of examples and/or predictors/independent variables increase.
+//!
+//! ## References:
+//! * ["Nearest Neighbor Pattern Classification" Cover, T.M., IEEE Transactions on Information Theory (1967)](http://ssg.mit.edu/cal/abs/2000_spring/np_dens/classification/cover67.pdf)
+//! * ["The Elements of Statistical Learning: Data Mining, Inference, and Prediction" Trevor et al., 2nd edition, chapter 13](https://web.stanford.edu/~hastie/ElemStatLearn/)

 use crate::algorithm::neighbour::cover_tree::CoverTree;
 use crate::algorithm::neighbour::linear_search::LinearKNNSearch;
@@ -6,13 +37,18 @@ use crate::math::distance::Distance;
 use crate::math::num::FloatExt;
 use serde::{Deserialize, Serialize};

-///
+/// K Nearest Neighbors Classifier
 pub mod knn_classifier;
+/// K Nearest Neighbors Regressor
 pub mod knn_regressor;

+/// Both, KNN classifier and regressor benefits from underlying search algorithms that helps to speed up queries.
+/// `KNNAlgorithmName` maintains a list of supported search algorithms
 #[derive(Serialize, Deserialize, Debug)]
 pub enum KNNAlgorithmName {
+    /// Heap Search algorithm, see [`LinearSearch`](../algorithm/neighbour/linear_search/index.html)
    LinearSearch,
+    /// Cover Tree Search algorithm, see [`CoverTree`](../algorithm/neighbour/cover_tree/index.html)
    CoverTree,
 }