//! # Nearest Neighbors
//!
//! The k-nearest neighbors (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems.
//! KNN is a non-parametric method that assumes that similar things exist in close proximity.
//!
//! During training the algorithm memorizes all training samples. To make a prediction it finds a predefined number of training samples closest in distance to the new
//! point and uses the labels of the found samples to calculate the value of the new point. The number of samples (k) is defined by the user and does not change after training.
//!
//! The distance can be any metric measure that is defined as \\( d(x, y) \geq 0\\)
//! and follows three conditions:
//! 1. \\( d(x, y) = 0 \\) if and only if \\( x = y \\), positive definiteness
//! 1. \\( d(x, y) = d(y, x) \\), symmetry
//! 1. \\( d(x, y) \leq d(x, z) + d(z, y) \\), subadditivity or triangle inequality
//!
//! for all \\(x, y, z \in Z \\)
//!
//! Neighbors-based methods are very simple and are known as non-generalizing machine learning methods, since they simply remember all of their training data and are therefore prone to overfitting.
//! Despite these disadvantages, nearest neighbors algorithms have been very successful in a large number of applications because of their flexibility and speed.
//!
//! __Advantages__
//! * The algorithm is simple and fast.
//! * The algorithm is non-parametric: there is no need to build a model, the algorithm simply stores all training samples in memory.
//! * The algorithm is versatile: it can be used for both classification and regression.
//!
//! __Disadvantages__
//! * The algorithm gets significantly slower as the number of examples and/or predictors (independent variables) increases.
//!
//! ## References:
//! * ["Nearest Neighbor Pattern Classification" Cover, T.M., IEEE Transactions on Information Theory (1967)](http://ssg.mit.edu/cal/abs/2000_spring/np_dens/classification/cover67.pdf)
//! * ["The Elements of Statistical Learning: Data Mining, Inference, and Prediction" Hastie et al., 2nd edition, chapter 13](https://web.stanford.edu/~hastie/ElemStatLearn/)
//!
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
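The "memorize the samples, then scan for the closest ones" procedure described above can be sketched in a few lines of plain Rust. This is an illustration of the idea, not this crate's API; the helper name `nearest_label` and the k = 1 simplification are our own:

```rust
/// Brute-force 1-nearest-neighbor prediction under squared Euclidean distance.
/// "Training" is just storing `(features, label)` pairs; prediction scans all of them
/// and returns the label of the closest stored sample.
fn nearest_label(train: &[(Vec<f64>, i32)], query: &[f64]) -> i32 {
    train
        .iter()
        .map(|(x, y)| {
            // Squared Euclidean distance between a stored sample and the query.
            let d: f64 = x.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            (d, *y)
        })
        // Pick the stored sample with the smallest distance.
        .min_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
        .map(|(_, y)| y)
        .unwrap()
}
```

A real k > 1 implementation would keep the k smallest distances (e.g. with a bounded max-heap) and combine their labels by majority vote or a weighted average, which is what the `KNNWeightFunction` below controls.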

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

/// K Nearest Neighbors Classifier
pub mod knn_classifier;
/// K Nearest Neighbors Regressor
pub mod knn_regressor;

/// `KNNAlgorithmName` maintains a list of supported search algorithms, see [KNN algorithms](../algorithm/neighbour/index.html)
#[deprecated(
    since = "0.2.0",
    note = "please use `smartcore::algorithm::neighbour::KNNAlgorithmName` instead"
)]
pub type KNNAlgorithmName = crate::algorithm::neighbour::KNNAlgorithmName;

/// Weight function that is used to determine the estimated value.
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
pub enum KNNWeightFunction {
    /// All k nearest points are weighted equally
    Uniform,
    /// k nearest points are weighted by the inverse of their distance. Closer neighbors have a greater influence than neighbors that are further away.
    Distance,
}

impl Default for KNNWeightFunction {
    fn default() -> Self {
        KNNWeightFunction::Uniform
    }
}

impl KNNWeightFunction {
    fn calc_weights(&self, distances: Vec<f64>) -> Vec<f64> {
        match *self {
            KNNWeightFunction::Distance => {
                // If the query point coincides with one or more training points (zero
                // distance), those training points are weighted as 1.0 and all other
                // points as 0.0; otherwise every point is weighted by 1 / distance.
                if distances.iter().any(|&e| e == 0f64) {
                    distances
                        .iter()
                        .map(|e| if *e == 0f64 { 1f64 } else { 0f64 })
                        .collect()
                } else {
                    distances.iter().map(|e| 1f64 / *e).collect()
                }
            }
            KNNWeightFunction::Uniform => vec![1f64; distances.len()],
        }
    }
}
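The `Distance` branch of `calc_weights` can be exercised in isolation. Below is a standalone re-statement of the same weighting rule (the free-function name `inverse_distance_weights` is our own; `calc_weights` itself is private to this module):

```rust
/// Inverse-distance weighting with the zero-distance special case:
/// an exact match takes weight 1.0 and every other neighbor gets 0.0;
/// otherwise each neighbor is weighted by 1 / distance.
fn inverse_distance_weights(distances: &[f64]) -> Vec<f64> {
    if distances.iter().any(|&d| d == 0.0) {
        distances
            .iter()
            .map(|&d| if d == 0.0 { 1.0 } else { 0.0 })
            .collect()
    } else {
        distances.iter().map(|&d| 1.0 / d).collect()
    }
}
```

For example, distances `[1.0, 2.0, 4.0]` yield weights `[1.0, 0.5, 0.25]`, while `[0.0, 3.0]` yields `[1.0, 0.0]` because the exact match dominates.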