feat: documents KNN Classifier
This commit is contained in:
+37
-1
@@ -1,4 +1,35 @@
//! # Nearest Neighbors
//!
//! <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS_CHTML"></script>
//!
//! The k-nearest neighbors (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems.
//! KNN is a non-parametric method that assumes that similar things exist in close proximity.
//!
//! During training the algorithm memorizes all training samples. To make a prediction it finds a predefined number of training samples closest in distance to the new
//! point and uses the labels of those samples to calculate the value of the new point. The number of neighbors, k, is defined by the user and does not change after training.
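The memorize-then-vote procedure described above can be sketched from scratch. This is a minimal illustration assuming Euclidean distance and majority voting, not the crate's `KNNClassifier` API:

```rust
use std::collections::HashMap;

// Euclidean distance between two points of equal dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

// "Training" is just storing `train_x`/`train_y`; prediction sorts the
// stored samples by distance to the query and takes a majority vote
// among the k closest labels.
fn knn_predict(train_x: &[Vec<f64>], train_y: &[usize], query: &[f64], k: usize) -> usize {
    let mut dist: Vec<(f64, usize)> = train_x
        .iter()
        .zip(train_y)
        .map(|(x, &y)| (euclidean(x, query), y))
        .collect();
    dist.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());

    // Count labels among the k nearest samples and return the most frequent one.
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &(_, label) in dist.iter().take(k) {
        *counts.entry(label).or_insert(0) += 1;
    }
    counts.into_iter().max_by_key(|&(_, c)| c).unwrap().0
}
```

With two clusters near (0, 0) labeled 0 and near (5, 5) labeled 1, a query at (4.5, 5.0) with k = 3 falls to label 1 by majority vote.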
//!
//! The distance can be any metric measure that is defined as \\( d(x, y) \geq 0\\)
//! and follows three conditions:
//! 1. \\( d(x, y) = 0 \\) if and only if \\( x = y \\), positive definiteness
//! 1. \\( d(x, y) = d(y, x) \\), symmetry
//! 1. \\( d(x, y) \leq d(x, z) + d(z, y) \\), subadditivity or triangle inequality
//!
//! for all \\(x, y, z \in Z \\)
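As a quick numeric sanity check (an illustration on sample points, not a proof), the familiar Euclidean distance satisfies all three axioms:

```rust
// Euclidean distance: one concrete metric satisfying the axioms above.
fn d(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

// For x = (1, 2), y = (4, 6), z = (0, 0):
//   d(x, x) = 0 and d(x, y) = 5 > 0   (positive definiteness)
//   d(x, y) = d(y, x)                  (symmetry)
//   d(x, y) <= d(x, z) + d(z, y)       (triangle inequality)
```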
//!
//! Neighbors-based methods are very simple and are known as non-generalizing machine learning methods, since they simply remember all of their training data and are prone to overfitting.
//! Despite this disadvantage, nearest neighbors algorithms have been very successful in a large number of applications because of their flexibility and speed.
//!
//! __Advantages__
//! * The algorithm is simple and fast.
//! * The algorithm is non-parametric: there’s no need to build a model; the algorithm simply stores all training samples in memory.
//! * The algorithm is versatile. It can be used for both classification and regression.
//!
//! __Disadvantages__
//! * The algorithm gets significantly slower as the number of examples and/or predictors/independent variables increases.
//!
//! ## References:
//! * ["Nearest Neighbor Pattern Classification" Cover, T.M., IEEE Transactions on Information Theory (1967)](http://ssg.mit.edu/cal/abs/2000_spring/np_dens/classification/cover67.pdf)
//! * ["The Elements of Statistical Learning: Data Mining, Inference, and Prediction" Trevor et al., 2nd edition, chapter 13](https://web.stanford.edu/~hastie/ElemStatLearn/)
use crate::algorithm::neighbour::cover_tree::CoverTree;
use crate::algorithm::neighbour::linear_search::LinearKNNSearch;
@@ -6,13 +37,18 @@ use crate::math::distance::Distance;
use crate::math::num::FloatExt;
use serde::{Deserialize, Serialize};
///
/// K Nearest Neighbors Classifier
pub mod knn_classifier;
/// K Nearest Neighbors Regressor
pub mod knn_regressor;
/// Both the KNN classifier and regressor benefit from underlying search algorithms that help speed up queries.
/// `KNNAlgorithmName` maintains a list of supported search algorithms
#[derive(Serialize, Deserialize, Debug)]
pub enum KNNAlgorithmName {
/// Heap Search algorithm, see [`LinearSearch`](../algorithm/neighbour/linear_search/index.html)
LinearSearch,
/// Cover Tree Search algorithm, see [`CoverTree`](../algorithm/neighbour/cover_tree/index.html)
CoverTree,
}
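Conceptually, the `LinearSearch` strategy answers a k-nearest query by scanning every stored point and keeping the k smallest distances, costing O(n) distance evaluations per query, while a cover tree prunes candidates using the triangle inequality. A from-scratch sketch of the brute-force query (an illustration, not the crate's implementation):

```rust
// Brute-force k-nearest query: compute the (squared) distance from the
// query to every stored point, sort the candidate indices by that
// distance, and keep the first k. Squared distance is enough for
// ranking, so the square root is skipped.
fn k_nearest(points: &[Vec<f64>], query: &[f64], k: usize) -> Vec<usize> {
    let dist = |p: &[f64]| -> f64 {
        p.iter().zip(query).map(|(a, b)| (a - b).powi(2)).sum::<f64>()
    };
    let mut idx: Vec<usize> = (0..points.len()).collect();
    idx.sort_by(|&i, &j| dist(&points[i]).partial_cmp(&dist(&points[j])).unwrap());
    idx.truncate(k);
    idx
}
```

For n stored points this does n distance evaluations per query, which is exactly the scaling the disadvantage above refers to; tree-based indexes trade extra build time for cheaper queries.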