33 Commits

Author SHA1 Message Date
Konstantin Hirschfeld
f53cb36b9d allow for sparse predictions
2026-02-09 13:25:50 +01:00
Lorenzo Mec-iS
c57a4370ba bump version to 0.4.9 2026-01-09 06:14:44 +00:00
Georeth Chow
78f18505b1 fix LASSO (#346)
* fix lasso doc typo
* fix lasso optimizer bug
2025-12-05 17:49:07 +09:00
Lorenzo
58a8624fa9 v0.4.8 (#345) 2025-11-29 02:54:35 +00:00
Georeth Chow
18de2aa244 add fit_intercept to LASSO (#344)
* add fit_intercept to LASSO
* lasso: intercept=None if fit_intercept is false
* update CHANGELOG.md to reflect lasso changes
* lasso: minor
2025-11-29 02:46:14 +00:00
Georeth Chow
2bf5f7a1a5 Fix LASSO (first two of #342) (#343)
* Fix LASSO (#342)
* change loss function in doc to match code
* allow `n == p` case
* lasso add test_full_rank_x

---------

Co-authored-by: Zhou Xiaozhou <zxz@jiweifund.com>
2025-11-28 12:15:43 +09:00
Lorenzo
0caa8306ff Modernise CI toolchain to avoid deprecation (#341)
* fix cache failing to find Cargo.toml
2025-11-24 02:25:36 +00:00
Lorenzo
2f63148de4 fix CI (#340)
* fix CI workflow
2025-11-24 02:07:49 +00:00
Lorenzo
f9e473c919 v0.4.7 (#339) 2025-11-24 01:57:25 +00:00
Charlie Martin
70d8a0f34b fix precision and recall calculations (#338)
* fix precision and recall calculations
2025-11-24 01:46:56 +00:00
Charlie Martin
0e42a97514 add serde support for XGRegressor (#337)
* add serde support for XGBoostRegressor
* add traits to dependent structs
2025-11-16 19:31:21 +09:00
Lorenzo
36efd582a5 Fix is_empty method logic in matrix.rs (#336)
* Fix is_empty method logic in matrix.rs
* bump to 0.4.6
* silence some clippy
2025-11-15 05:22:42 +00:00
Lorenzo
70212c71e0 Update Cargo.toml (#333) 2025-10-09 17:37:02 +01:00
Lorenzo
63f86f7bc9 Add with_top_k to CosineSimilarity (#332)
* Implement cosine similarity and cosinepair
* formatting
* fix clippy
* Add top k CosinePair
* fix distance computation
* set min similarity for constant zeros
* bump version to 0.4.5
2025-10-09 17:27:54 +01:00
Lorenzo
e633afa520 set min similarity for constant zeros (#331)
* set min similarity for constant zeros
* bump version
2025-10-02 15:41:18 +01:00
Lorenzo
b6e32fb328 Update README.md (#330) 2025-09-28 16:04:12 +01:00
Lorenzo
948d78a4d0 Create CITATION.cff (#329) 2025-09-28 15:50:50 +01:00
Lorenzo
448b6f77e3 Update README.md (#328) 2025-09-28 15:43:46 +01:00
Lorenzo
09be4681cf Implement cosine similarity and cosinepair (#327)
* Implement cosine similarity and cosinepair
2025-09-27 11:08:57 +01:00
Daniel Lacina
4841791b7e implemented extra trees (#320)
* implemented extra trees

* implemented extra trees
2025-07-12 18:37:11 +01:00
Daniel Lacina
9fef05ecc6 refactored random forest regressor into reusable components (#318) 2025-07-12 15:56:49 +01:00
Daniel Lacina
c5816b0e1b refactored decision tree into reusable components (#316)
* refactored decision tree into reusable components

* got rid of api code from base tree because its an implementation detail

* got rid of api code from base tree because its an implementation detail

* changed name
2025-07-12 11:25:53 +01:00
Daniel Lacina
5cc5528367 implemented xgdboost_regression (#314)
* implemented xgd_regression
2025-07-09 15:25:45 +01:00
Daniel Lacina
d459c48372 implemented single linkage clustering (#313)
* implemented single linkage clustering

---------

Co-authored-by: Lorenzo Mec-iS <tunedconsulting@gmail.com>
2025-07-03 18:05:54 +01:00
Daniel Lacina
730c0d64df implemented multiclass for svc (#308)
* implemented multiclass for svc
* modified the multiclass svc so it doesn't modify the current api
2025-06-16 11:00:11 +01:00
Lorenzo
44424807a0 Implement SVR and SVR kernels with Enum. Add tests for argsort_mut (#303)
* Add tests for argsort_mut
* Add formatting and cleaning up .github directory
* fix clippy error. suggestion to use .contains()
* define type explicitly for variable jstack
* Implement kernel as enumerator
* basic svr and svr_params implementation
* Complete enum implementation for Kernels. Implement search grid for SVR. Add documentation.
* Fix serde configuration in cargo clippy
*  Implement search parameters (#304)
* Implement SVR kernels as enumerator
* basic svr and svr_params implementation
* Implement search grid for SVR. Add documentation.
* Fix serde configuration in cargo clippy
* Fix wasm32 typetag
* fix typetag
* Bump to version 0.4.2
2025-06-02 11:01:46 +01:00
morenol
76d1ef610d Update Cargo.toml (#299)
* Update Cargo.toml

* chore: fix clippy

* chore: bump actions

* chore: fix clippy

* chore: update target name

---------

Co-authored-by: Luis Moreno <morenol@users.noreply.github.com>
2025-04-24 23:24:29 -04:00
Lorenzo
4092e24c2a Update README.md 2025-02-04 14:26:53 +00:00
Lorenzo
17dc9f3bbf Add ordered pairs for FastPair (#252)
* Add ordered_pairs method to FastPair
* add tests to fastpair
2025-01-28 00:48:08 +00:00
Lorenzo
c8ec8fec00 Fix #245: return error for NaN in naive bayes (#246)
* Fix #245: return error for NaN in naive bayes
* Implement error handling for NaN values in NBayes predict:
* general behaviour has been kept unchanged according to original tests in `mod.rs`
* aka: error is returned only if all the predicted probabilities are NaN
* Add tests
* Add test with static values
* Add test for numerical stability with numpy
2025-01-27 23:17:55 +00:00
Lorenzo
3da433f757 Implement predict_proba for DecisionTreeClassifier (#287)
* Implement predict_proba for DecisionTreeClassifier
* Some automated fixes suggested by cargo clippy --fix
2025-01-20 18:50:00 +00:00
dependabot[bot]
4523ac73ff Update itertools requirement from 0.12.0 to 0.13.0 (#280)
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.12.0...v0.13.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-25 11:47:23 -04:00
morenol
ba75f9ffad chore: fix clippy (#283)
* chore: fix clippy


Co-authored-by: Luis Moreno <morenol@users.noreply.github.com>
2024-11-25 11:34:29 -04:00
69 changed files with 6190 additions and 1444 deletions
-1
@@ -2,6 +2,5 @@
# the repo. Unless a later match takes precedence,
# Developers in this list will be requested for
# review when someone opens a pull request.
* @VolodymyrOrlov
* @morenol
* @Mec-iS
+1 -1
@@ -50,9 +50,9 @@ $ rust-code-analysis-cli -p src/algorithm/neighbour/fastpair.rs --ls 22 --le 213
1. After a PR is opened maintainers are notified
2. Probably changes will be required to comply with the workflow, these commands are run automatically and all tests shall pass:
* **Coverage** (optional): `tarpaulin` is used with command `cargo tarpaulin --out Lcov --all-features -- --test-threads 1`
* **Formatting**: run `rustfmt src/*.rs` to apply automatic formatting
* **Linting**: `clippy` is used with command `cargo clippy --all-features -- -Drust-2018-idioms -Dwarnings`
* **Coverage** (optional): `tarpaulin` is used with command `cargo tarpaulin --out Lcov --all-features -- --test-threads 1`
* **Testing**: multiple test pipelines are run for different targets
3. When everything is OK, code is merged.
+15 -45
@@ -19,59 +19,37 @@ jobs:
{ os: "ubuntu", target: "i686-unknown-linux-gnu" },
{ os: "ubuntu", target: "wasm32-unknown-unknown" },
{ os: "macos", target: "aarch64-apple-darwin" },
{ os: "ubuntu", target: "wasm32-wasi" },
]
env:
TZ: "/usr/share/zoneinfo/your/location"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Cache .cargo and target
uses: actions/cache@v2
uses: actions/cache@v4
with:
path: |
~/.cargo
./target
key: ${{ runner.os }}-cargo-${{ matrix.platform.target }}-${{ hashFiles('**/Cargo.toml') }}
restore-keys: ${{ runner.os }}-cargo-${{ matrix.platform.target }}-${{ hashFiles('**/Cargo.toml') }}
restore-keys: ${{ runner.os }}-cargo-${{ matrix.platform.target }}
- name: Install Rust toolchain
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@stable
with:
toolchain: stable
target: ${{ matrix.platform.target }}
profile: minimal
default: true
targets: ${{ matrix.platform.target }}
- name: Install test runner for wasm
if: matrix.platform.target == 'wasm32-unknown-unknown'
run: curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh
- name: Install test runner for wasi
if: matrix.platform.target == 'wasm32-wasi'
run: curl https://wasmtime.dev/install.sh -sSf | bash
- name: Stable Build with all features
uses: actions-rs/cargo@v1
with:
command: build
args: --all-features --target ${{ matrix.platform.target }}
run: cargo build --all-features --target ${{ matrix.platform.target }}
- name: Stable Build without features
uses: actions-rs/cargo@v1
with:
command: build
args: --target ${{ matrix.platform.target }}
run: cargo build --target ${{ matrix.platform.target }}
- name: Tests
if: matrix.platform.target == 'x86_64-unknown-linux-gnu' || matrix.platform.target == 'x86_64-pc-windows-msvc' || matrix.platform.target == 'aarch64-apple-darwin'
uses: actions-rs/cargo@v1
with:
command: test
args: --all-features
run: cargo test --all-features
- name: Tests in WASM
if: matrix.platform.target == 'wasm32-unknown-unknown'
run: wasm-pack test --node -- --all-features
- name: Tests in WASI
if: matrix.platform.target == 'wasm32-wasi'
run: |
export WASMTIME_HOME="$HOME/.wasmtime"
export PATH="$WASMTIME_HOME/bin:$PATH"
cargo install cargo-wasi && cargo wasi test
check_features:
runs-on: "${{ matrix.platform.os }}-latest"
strategy:
@@ -81,24 +59,16 @@ jobs:
env:
TZ: "/usr/share/zoneinfo/your/location"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Cache .cargo and target
uses: actions/cache@v2
uses: actions/cache@v4
with:
path: |
~/.cargo
./target
key: ${{ runner.os }}-cargo-features-${{ hashFiles('**/Cargo.toml') }}
restore-keys: ${{ runner.os }}-cargo-features-${{ hashFiles('**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-features-${{ hashFiles('Cargo.toml') }}
restore-keys: ${{ runner.os }}-cargo-features
- name: Install Rust toolchain
uses: actions-rs/toolchain@v1
with:
toolchain: stable
target: ${{ matrix.platform.target }}
profile: minimal
default: true
uses: dtolnay/rust-toolchain@stable
- name: Stable Build
uses: actions-rs/cargo@v1
with:
command: build
args: --no-default-features ${{ matrix.features }}
run: cargo build --no-default-features ${{ matrix.features }}
+8 -19
@@ -12,33 +12,22 @@ jobs:
env:
TZ: "/usr/share/zoneinfo/your/location"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Cache .cargo
uses: actions/cache@v2
uses: actions/cache@v4
with:
path: |
~/.cargo
./target
key: ${{ runner.os }}-coverage-cargo-${{ hashFiles('**/Cargo.toml') }}
restore-keys: ${{ runner.os }}-coverage-cargo-${{ hashFiles('**/Cargo.toml') }}
key: ${{ runner.os }}-coverage-cargo-${{ hashFiles('Cargo.toml') }}
restore-keys: ${{ runner.os }}-coverage-cargo
- name: Install Rust toolchain
uses: actions-rs/toolchain@v1
with:
toolchain: nightly
profile: minimal
default: true
uses: dtolnay/rust-toolchain@nightly
- name: Install cargo-tarpaulin
uses: actions-rs/install@v0.1
with:
crate: cargo-tarpaulin
version: latest
use-tool-cache: true
run: cargo install cargo-tarpaulin
- name: Run cargo-tarpaulin
uses: actions-rs/cargo@v1
with:
command: tarpaulin
args: --out Lcov --all-features -- --test-threads 1
run: cargo tarpaulin --out Lcov --all-features -- --test-threads 1
- name: Upload to codecov.io
uses: codecov/codecov-action@v2
uses: codecov/codecov-action@v4
with:
fail_ci_if_error: false
+10 -19
@@ -6,36 +6,27 @@ on:
pull_request:
branches: [ development ]
jobs:
lint:
runs-on: ubuntu-latest
env:
TZ: "/usr/share/zoneinfo/your/location"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Cache .cargo and target
uses: actions/cache@v2
uses: actions/cache@v4
with:
path: |
~/.cargo
./target
key: ${{ runner.os }}-lint-cargo-${{ hashFiles('**/Cargo.toml') }}
restore-keys: ${{ runner.os }}-lint-cargo-${{ hashFiles('**/Cargo.toml') }}
key: ${{ runner.os }}-lint-cargo-${{ hashFiles('Cargo.toml') }}
restore-keys: ${{ runner.os }}-lint-cargo
- name: Install Rust toolchain
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@stable
with:
toolchain: stable
profile: minimal
default: true
- run: rustup component add rustfmt
- name: Check formt
uses: actions-rs/cargo@v1
with:
command: fmt
args: --all -- --check
- run: rustup component add clippy
components: rustfmt, clippy
- name: Check format
run: cargo fmt --all -- --check
- name: Run clippy
uses: actions-rs/cargo@v1
with:
command: clippy
args: --all-features -- -Drust-2018-idioms -Dwarnings
run: cargo clippy --all-features -- -Drust-2018-idioms -Dwarnings
+5
@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.4.8] - 2025-11-29
- WARNING: Breaking changes!
- `LassoParameters` and `LassoSearchParameters` have a new field `fit_intercept`. When it is set to false, the `beta_0` term in the formula will be forced to zero, and `intercept` field in `Lasso` will be set to `None`.
## [0.4.0] - 2023-04-05
## Added
+41
@@ -0,0 +1,41 @@
cff-version: 1.2.0
message: "If this software contributes to published work, please cite smartcore."
type: software
title: "smartcore: Machine Learning in Rust"
abstract: "smartcore is a comprehensive machine learning and numerical computing library for Rust, offering supervised and unsupervised algorithms, model evaluation tools, and linear algebra abstractions, with optional ndarray integration."
repository-code: "https://github.com/smartcorelib/smartcore"
url: "https://github.com/smartcorelib"
license: "MIT"
keywords:
- Rust
- machine learning
- numerical computing
- linear algebra
- classification
- regression
- clustering
- SVM
- Random Forest
- XGBoost
authors:
- name: "smartcore Developers"
- name: "Lorenzo (contributor)"
- name: "Community contributors"
version: "0.4.2"
date-released: "2025-09-14"
preferred-citation:
type: software
title: "smartcore: Machine Learning in Rust"
authors:
- name: "smartcore Developers"
url: "https://github.com/smartcorelib"
repository-code: "https://github.com/smartcorelib/smartcore"
license: "MIT"
references:
- type: manual
title: "smartcore Documentation"
url: "https://docs.rs/smartcore"
- type: webpage
title: "smartcore Homepage"
url: "https://github.com/smartcorelib"
notes: "For development features, see the docs.rs page and the repository README; SmartCore includes algorithms such as SVM, Random Forest, K-Means, PCA, DBSCAN, and XGBoost."
+3 -2
@@ -2,7 +2,7 @@
name = "smartcore"
description = "Machine Learning in Rust."
homepage = "https://smartcorelib.org"
version = "0.4.0"
version = "0.4.9"
authors = ["smartcore Developers"]
edition = "2021"
license = "Apache-2.0"
@@ -28,6 +28,7 @@ num = "0.4"
rand = { version = "0.8.5", default-features = false, features = ["small_rng"] }
rand_distr = { version = "0.4", optional = true }
serde = { version = "1", features = ["derive"], optional = true }
ordered-float = "5.1.0"
[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
typetag = { version = "0.2", optional = true }
@@ -48,7 +49,7 @@ getrandom = { version = "0.2.8", optional = true }
wasm-bindgen-test = "0.3"
[dev-dependencies]
itertools = "0.12.0"
itertools = "0.13.0"
serde_json = "1.0"
bincode = "1.3.1"
+128 -2
@@ -16,6 +16,132 @@
</p>
-----
[![CI](https://github.com/smartcorelib/smartcore/actions/workflows/ci.yml/badge.svg)](https://github.com/smartcorelib/smartcore/actions/workflows/ci.yml)
[![CI](https://github.com/smartcorelib/smartcore/actions/workflows/ci.yml/badge.svg)](https://github.com/smartcorelib/smartcore/actions/workflows/ci.yml) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17219259.svg)](https://doi.org/10.5281/zenodo.17219259)
To start getting familiar with the new smartcore v0.3 API, there is now available a [**Jupyter Notebook environment repository**](https://github.com/smartcorelib/smartcore-jupyter). Please see instructions there, contributions welcome see [CONTRIBUTING](.github/CONTRIBUTING.md).
To start getting familiar with the new smartcore v0.4 API, a [**Jupyter Notebook environment repository**](https://github.com/smartcorelib/smartcore-jupyter) is now available. Please see the instructions there; contributions are welcome, see [CONTRIBUTING](.github/CONTRIBUTING.md).
smartcore is a fast, ergonomic machine learning library for Rust, covering classical supervised and unsupervised methods with a modular linear algebra abstraction and optional ndarray support. It aims to provide production-friendly APIs, strong typing, and good defaults while remaining flexible for research and experimentation.
## Highlights
- Broad algorithm coverage: linear models, tree-based methods, ensembles, SVMs, neighbors, clustering, decomposition, and preprocessing.
- Strong linear algebra traits with optional ndarray integration for users who prefer array-first workflows.
- WASM-first defaults with attention to portability; features such as serde and datasets are opt-in.
- Practical utilities for model selection, evaluation, readers (CSV), dataset generators, and built-in sample datasets.
## Install
Add to Cargo.toml:
```toml
[dependencies]
smartcore = "^0.4.3"
```
For the latest development branch:
```toml
[dependencies]
smartcore = { git = "https://github.com/smartcorelib/smartcore", branch = "development" }
```
Optional features (examples):
- datasets
- serde
- ndarray-bindings (deprecated in favor of ndarray-only support per recent changes)
Check Cargo.toml for available features and compatibility notes.
## Quick start
Here is a minimal example fitting a KNN classifier from native Rust vectors using DenseMatrix:
```rust
use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::neighbors::knn_classifier::KNNClassifier;
// Turn vector slices into a matrix
let x = DenseMatrix::from_2d_array(&[
&[1., 2.],
&[3., 4.],
&[5., 6.],
&[7., 8.],
&[9., 10.],
]).unwrap();
// Class labels
let y = vec![2, 2, 2, 3, 3];
// Train classifier
let knn = KNNClassifier::fit(&x, &y, Default::default()).unwrap();
// Predict
let yhat = knn.predict(&x).unwrap();
```
This example mirrors the “First Example” section of the crate docs and demonstrates smartcore's ergonomic API surface.
## Algorithms
smartcore organizes algorithms into clear modules with consistent traits:
- Clustering: K-Means, DBSCAN, agglomerative (including single-linkage), with K-Means++ initialization and utilities.
- Matrix decomposition: SVD, EVD, Cholesky, LU, QR, plus related linear algebra helpers.
- Linear models: OLS, Ridge, Lasso, ElasticNet, Logistic Regression.
- Ensemble and tree-based: Random Forest (classifier and regressor), Extra Trees, shared reusable components across trees and forests.
- SVM: SVC/SVR with kernel enum support and multiclass extensions.
- Neighbors: KNN classification and regression with distance metrics and fast selection helpers.
- Naive Bayes: Gaussian, Bernoulli, Categorical, Multinomial.
- Preprocessing: encoders, split utilities, and common transforms.
- Model selection and metrics: K-fold, search parameters, and evaluation utilities.
Recent refactors emphasize reusable components in trees/forests and expanded multiclass SVM capabilities. XGBoost-style regression and single-linkage clustering have been added. See CHANGELOG for API changes and migration notes.
## Data access and readers
- CSV readers: Read matrices from CSV with configurable delimiter and header rows, with helpful error messages and testing utilities (including non-IO reader abstractions).
- Dataset generators: make_blobs, make_circles, make_moons for quick experiments.
- Built-in datasets (feature-gated): digits, diabetes, breast cancer, boston, with serialization utilities to persist or refresh .xy bundles.
## WebAssembly and portability
smartcore adopts a WASM/WASI-first posture in defaults to ease browser and embedded deployments. Some file-system operations are restricted in wasm targets; tests and IO utilities are structured to avoid unsupported calls where possible. Enable features like serde selectively to minimize footprint. Consult module-level docs and CHANGELOG for target-specific caveats.
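For example, a lean wasm32-oriented dependency line might disable default features and opt into only what is needed (a sketch; confirm available feature names against Cargo.toml):

```toml
[dependencies]
# Keep the footprint small for wasm32 targets: no default features,
# then opt into serialization support only.
smartcore = { version = "0.4", default-features = false, features = ["serde"] }
```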
## Notebooks
A curated set of Jupyter notebooks is available via the companion repository to explore smartcore interactively. To run locally, use EVCXR to enable Rust notebooks. This is the recommended path to quickly experiment with the v0.4 API.
## Roadmap and recent changes
- Trait-system refactor, fewer structs and more object-safe traits, large codebase reorganization.
- Move to Rust 2021 edition and cleanup of duplicate code paths.
- Seeds and deterministic controls across algorithms using RNG plumbing.
- Search parameter API for hyperparameter exploration in K-Means and SVM families.
- Tree and forest components refactored for reuse; Extra Trees added.
- SVM multiclass support; SVR kernel enum and related improvements.
- XGBoost-style regression introduced; single-linkage clustering implemented.
See CHANGELOG.md for precise details, deprecations, and breaking changes. Some features like nalgebra-bindings have been dropped in favor of ndarray-only paths. Default features are tuned for WASM/WASI builds; enable serde/datasets as needed.
## Contributing
Contributions are welcome:
- Open an issue describing the change and link it in the PR.
- Keep PRs in sync with the development branch and ensure tests pass on stable Rust.
- Provide or update tests; run clippy and apply formatting. Coverage and linting are part of the workflow.
- Use the provided PR and issue templates to describe behavior changes, new features, and expectations.
If adding IO, prefer abstractions that make non-IO testing straightforward (see readers/iotesting). For datasets, keep serialization helpers in tests gated appropriately to avoid unintended file writes in wasm targets.
## License
smartcore is open source under a permissive license; see Cargo.toml and LICENSE for details. The crate metadata identifies “smartcore Developers” as authors; community contributions are credited via Git history and releases.
## Acknowledgments
smartcore's design incorporates well-known ML patterns while staying idiomatic to Rust. Thanks to all contributors who have helped expand algorithms, improve docs, modernize traits, and harden the codebase for production.
File diff suppressed because it is too large.
+4 -4
@@ -124,7 +124,7 @@ impl<T: Debug + PartialEq, D: Distance<T>> CoverTree<T, D> {
current_cover_set.push((d, &self.root));
let mut heap = HeapSelection::with_capacity(k);
heap.add(std::f64::MAX);
heap.add(f64::MAX);
let mut empty_heap = true;
if !self.identical_excluded || self.get_data_value(self.root.idx) != p {
@@ -145,7 +145,7 @@ impl<T: Debug + PartialEq, D: Distance<T>> CoverTree<T, D> {
}
let upper_bound = if empty_heap {
std::f64::INFINITY
f64::INFINITY
} else {
*heap.peek()
};
@@ -291,7 +291,7 @@ impl<T: Debug + PartialEq, D: Distance<T>> CoverTree<T, D> {
} else {
let max_dist = self.max(point_set);
let next_scale = (max_scale - 1).min(self.get_scale(max_dist));
if next_scale == std::i64::MIN {
if next_scale == i64::MIN {
let mut children: Vec<Node> = Vec::new();
let mut leaf = self.new_leaf(p);
children.push(leaf);
@@ -435,7 +435,7 @@ impl<T: Debug + PartialEq, D: Distance<T>> CoverTree<T, D> {
fn get_scale(&self, d: f64) -> i64 {
if d == 0f64 {
std::i64::MIN
i64::MIN
} else {
(self.inv_log_base * d.ln()).ceil() as i64
}
+118 -10
@@ -52,10 +52,8 @@ pub struct FastPair<'a, T: RealNumber + FloatNumber, M: Array2<T>> {
}
impl<'a, T: RealNumber + FloatNumber, M: Array2<T>> FastPair<'a, T, M> {
///
/// Constructor
/// Instantiate and inizialise the algorithm
///
/// Instantiate and initialize the algorithm
pub fn new(m: &'a M) -> Result<Self, Failed> {
if m.shape().0 < 3 {
return Err(Failed::because(
@@ -74,10 +72,8 @@ impl<'a, T: RealNumber + FloatNumber, M: Array2<T>> FastPair<'a, T, M> {
Ok(init)
}
///
/// Initialise `FastPair` by passing a `Array2`.
/// Build a FastPairs data-structure from a set of (new) points.
///
fn init(&mut self) {
// basic measures
let len = self.samples.shape().0;
@@ -158,9 +154,7 @@ impl<'a, T: RealNumber + FloatNumber, M: Array2<T>> FastPair<'a, T, M> {
self.neighbours = neighbours;
}
///
/// Find closest pair by scanning list of nearest neighbors.
///
#[allow(dead_code)]
pub fn closest_pair(&self) -> PairwiseDistance<T> {
let mut a = self.neighbours[0]; // Start with first point
@@ -179,6 +173,21 @@ impl<'a, T: RealNumber + FloatNumber, M: Array2<T>> FastPair<'a, T, M> {
}
}
///
/// Return ordered dissimilarities from closest to furthest
///
#[allow(dead_code)]
pub fn ordered_pairs(&self) -> std::vec::IntoIter<&PairwiseDistance<T>> {
// improvement: implement this to return `impl Iterator<Item = &PairwiseDistance<T>>`
// need to implement trait `Iterator` for `Vec<&PairwiseDistance<T>>`
let mut distances = self
.distances
.values()
.collect::<Vec<&PairwiseDistance<T>>>();
distances.sort_by(|a, b| a.partial_cmp(b).unwrap());
distances.into_iter()
}
//
// Compute distances from input to all other points in data-structure.
// input is the row index of the sample matrix
@@ -217,10 +226,10 @@ mod tests_fastpair {
use super::*;
use crate::linalg::basic::{arrays::Array, matrix::DenseMatrix};
///
/// Brute force algorithm, used only for comparison and testing
///
pub fn closest_pair_brute(fastpair: &FastPair<f64, DenseMatrix<f64>>) -> PairwiseDistance<f64> {
pub fn closest_pair_brute(
fastpair: &FastPair<'_, f64, DenseMatrix<f64>>,
) -> PairwiseDistance<f64> {
use itertools::Itertools;
let m = fastpair.samples.shape().0;
@@ -594,4 +603,103 @@ mod tests_fastpair {
assert_eq!(closest, min_dissimilarity);
}
#[test]
fn fastpair_ordered_pairs() {
let x = DenseMatrix::<f64>::from_2d_array(&[
&[5.1, 3.5, 1.4, 0.2],
&[4.9, 3.0, 1.4, 0.2],
&[4.7, 3.2, 1.3, 0.2],
&[4.6, 3.1, 1.5, 0.2],
&[5.0, 3.6, 1.4, 0.2],
&[5.4, 3.9, 1.7, 0.4],
&[4.9, 3.1, 1.5, 0.1],
&[7.0, 3.2, 4.7, 1.4],
&[6.4, 3.2, 4.5, 1.5],
&[6.9, 3.1, 4.9, 1.5],
&[5.5, 2.3, 4.0, 1.3],
&[6.5, 2.8, 4.6, 1.5],
&[4.6, 3.4, 1.4, 0.3],
&[5.0, 3.4, 1.5, 0.2],
&[4.4, 2.9, 1.4, 0.2],
])
.unwrap();
let fastpair = FastPair::new(&x).unwrap();
let ordered = fastpair.ordered_pairs();
let mut previous: f64 = -1.0;
for p in ordered {
if previous == -1.0 {
previous = p.distance.unwrap();
} else {
let current = p.distance.unwrap();
assert!(current >= previous);
previous = current;
}
}
}
#[test]
fn test_empty_set() {
let empty_matrix = DenseMatrix::<f64>::zeros(0, 0);
let result = FastPair::new(&empty_matrix);
assert!(result.is_err());
if let Err(e) = result {
assert_eq!(
e,
Failed::because(FailedError::FindFailed, "min number of rows should be 3")
);
}
}
#[test]
fn test_single_point() {
let single_point = DenseMatrix::from_2d_array(&[&[1.0, 2.0, 3.0]]).unwrap();
let result = FastPair::new(&single_point);
assert!(result.is_err());
if let Err(e) = result {
assert_eq!(
e,
Failed::because(FailedError::FindFailed, "min number of rows should be 3")
);
}
}
#[test]
fn test_two_points() {
let two_points = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0]]).unwrap();
let result = FastPair::new(&two_points);
assert!(result.is_err());
if let Err(e) = result {
assert_eq!(
e,
Failed::because(FailedError::FindFailed, "min number of rows should be 3")
);
}
}
#[test]
fn test_three_identical_points() {
let identical_points =
DenseMatrix::from_2d_array(&[&[1.0, 1.0], &[1.0, 1.0], &[1.0, 1.0]]).unwrap();
let result = FastPair::new(&identical_points);
assert!(result.is_ok());
let fastpair = result.unwrap();
let closest_pair = fastpair.closest_pair();
assert_eq!(closest_pair.distance, Some(0.0));
}
#[test]
fn test_result_unwrapping() {
let valid_matrix =
DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0], &[5.0, 6.0], &[7.0, 8.0]])
.unwrap();
let result = FastPair::new(&valid_matrix);
assert!(result.is_ok());
// This should not panic
let _fastpair = result.unwrap();
}
}
+2 -2
@@ -61,7 +61,7 @@ impl<T, D: Distance<T>> LinearKNNSearch<T, D> {
for _ in 0..k {
heap.add(KNNPoint {
distance: std::f64::INFINITY,
distance: f64::INFINITY,
index: None,
});
}
@@ -215,7 +215,7 @@ mod tests {
};
let point_inf = KNNPoint {
distance: std::f64::INFINITY,
distance: f64::INFINITY,
index: Some(3),
};
+3 -1
@@ -1,4 +1,4 @@
#![allow(clippy::ptr_arg)]
#![allow(clippy::ptr_arg, clippy::needless_range_loop)]
//! # Nearest Neighbors Search Algorithms and Data Structures
//!
//! Nearest neighbor search is a basic computational tool that is particularly relevant to machine learning,
@@ -39,6 +39,8 @@ use crate::numbers::basenum::Number;
use serde::{Deserialize, Serialize};
pub(crate) mod bbd_tree;
/// a variant of fastpair using cosine distance
pub mod cosinepair;
/// tree data structure for fast nearest neighbor search
pub mod cover_tree;
/// fastpair closest neighbour algorithm
+2 -2
@@ -133,7 +133,7 @@ mod tests {
#[test]
fn test_add1() {
let mut heap = HeapSelection::with_capacity(3);
heap.add(std::f64::INFINITY);
heap.add(f64::INFINITY);
heap.add(-5f64);
heap.add(4f64);
heap.add(-1f64);
@@ -151,7 +151,7 @@ mod tests {
#[test]
fn test_add2() {
let mut heap = HeapSelection::with_capacity(3);
heap.add(std::f64::INFINITY);
heap.add(f64::INFINITY);
heap.add(0.0);
heap.add(8.4852);
heap.add(5.6568);
+2
@@ -1,8 +1,10 @@
use num_traits::Num;
pub trait QuickArgSort {
#[allow(dead_code)]
fn quick_argsort_mut(&mut self) -> Vec<usize>;
#[allow(dead_code)]
fn quick_argsort(&self) -> Vec<usize>;
}
+317
@@ -0,0 +1,317 @@
//! # Agglomerative Hierarchical Clustering
//!
//! Agglomerative clustering is a "bottom-up" hierarchical clustering method. It works by placing each data point in its own cluster and then successively merging the two most similar clusters until a stopping criterion is met. This process creates a tree-based hierarchy of clusters known as a dendrogram.
//!
//! The similarity of two clusters is determined by a **linkage criterion**. This implementation uses **single-linkage**, where the distance between two clusters is defined as the minimum distance between any single point in the first cluster and any single point in the second cluster. The distance between points is the standard Euclidean distance.
//!
//! The algorithm first builds the full hierarchy of `N-1` merges. To obtain a specific number of clusters, `n_clusters`, the algorithm then effectively "cuts" the dendrogram at the point where `n_clusters` remain.
//!
//! ## Example:
//!
//! ```
//! use smartcore::linalg::basic::matrix::DenseMatrix;
//! use smartcore::cluster::agglomerative::{AgglomerativeClustering, AgglomerativeClusteringParameters};
//!
//! // A dataset with 2 distinct groups of points.
//! let x = DenseMatrix::from_2d_array(&[
//! &[0.0, 0.0], &[1.0, 1.0], &[0.5, 0.5], // Cluster A
//! &[10.0, 10.0], &[11.0, 11.0], &[10.5, 10.5], // Cluster B
//! ]).unwrap();
//!
//! // Set parameters to find 2 clusters.
//! let parameters = AgglomerativeClusteringParameters::default().with_n_clusters(2);
//!
//! // Fit the model to the data.
//! let clustering = AgglomerativeClustering::<f64, usize, DenseMatrix<f64>, Vec<usize>>::fit(&x, parameters).unwrap();
//!
//! // Get the cluster assignments.
//! let labels = clustering.labels; // e.g., [0, 0, 0, 1, 1, 1]
//! ```
//!
//! ## References:
//!
//! * ["An Introduction to Statistical Learning", James G., Witten D., Hastie T., Tibshirani R., 10.3.2 Hierarchical Clustering](http://faculty.marshall.usc.edu/gareth-james/ISL/)
//! * ["The Elements of Statistical Learning", Hastie T., Tibshirani R., Friedman J., 14.3.12 Hierarchical Clustering](https://hastie.su.domains/ElemStatLearn/)
use std::collections::HashMap;
use std::marker::PhantomData;
use crate::api::UnsupervisedEstimator;
use crate::error::{Failed, FailedError};
use crate::linalg::basic::arrays::{Array1, Array2};
use crate::numbers::basenum::Number;
/// Parameters for the Agglomerative Clustering algorithm.
#[derive(Debug, Clone, Copy)]
pub struct AgglomerativeClusteringParameters {
/// The number of clusters to find.
pub n_clusters: usize,
}
impl AgglomerativeClusteringParameters {
/// Sets the number of clusters.
///
/// # Arguments
/// * `n_clusters` - The desired number of clusters.
pub fn with_n_clusters(mut self, n_clusters: usize) -> Self {
self.n_clusters = n_clusters;
self
}
}
impl Default for AgglomerativeClusteringParameters {
fn default() -> Self {
AgglomerativeClusteringParameters { n_clusters: 2 }
}
}
/// Agglomerative Clustering model.
///
/// This implementation uses single-linkage clustering, which is mathematically
/// equivalent to finding the Minimum Spanning Tree (MST) of the data points.
/// The core logic is an efficient implementation of Kruskal's algorithm, which
/// processes all pairwise distances in increasing order and uses a Disjoint
/// Set Union (DSU) data structure to track cluster membership.
#[derive(Debug)]
pub struct AgglomerativeClustering<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>> {
/// The cluster label assigned to each sample.
pub labels: Vec<usize>,
_phantom_tx: PhantomData<TX>,
_phantom_ty: PhantomData<TY>,
_phantom_x: PhantomData<X>,
_phantom_y: PhantomData<Y>,
}
impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>> AgglomerativeClustering<TX, TY, X, Y> {
/// Fits the agglomerative clustering model to the data.
///
/// # Arguments
/// * `data` - A reference to the input data matrix.
/// * `parameters` - The parameters for the clustering algorithm, including `n_clusters`.
///
/// # Returns
    /// A `Result` containing the fitted model with cluster labels, or an error if fitting fails (e.g. `n_clusters` exceeds the number of samples).
pub fn fit(data: &X, parameters: AgglomerativeClusteringParameters) -> Result<Self, Failed> {
let (num_samples, _) = data.shape();
let n_clusters = parameters.n_clusters;
if n_clusters > num_samples {
return Err(Failed::because(
FailedError::ParametersError,
&format!(
"n_clusters: {n_clusters} cannot be greater than n_samples: {num_samples}"
),
));
}
let mut distance_pairs = Vec::new();
for i in 0..num_samples {
for j in (i + 1)..num_samples {
let distance: f64 = data
.get_row(i)
.iterator(0)
.zip(data.get_row(j).iterator(0))
.map(|(&a, &b)| (a.to_f64().unwrap() - b.to_f64().unwrap()).powi(2))
.sum::<f64>();
distance_pairs.push((distance, i, j));
}
}
        // Sorted in descending order so pop() removes the smallest distance first.
        distance_pairs.sort_unstable_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
let mut parent = HashMap::new();
let mut children = HashMap::new();
for i in 0..num_samples {
parent.insert(i, i);
children.insert(i, vec![i]);
}
let mut merge_history = Vec::new();
let num_merges_needed = num_samples - 1;
while merge_history.len() < num_merges_needed {
let (_, p1, p2) = distance_pairs.pop().unwrap();
let root1 = parent[&p1];
let root2 = parent[&p2];
if root1 != root2 {
let root2_children = children.remove(&root2).unwrap();
for child in root2_children.iter() {
parent.insert(*child, root1);
}
let root1_children = children.get_mut(&root1).unwrap();
root1_children.extend(root2_children);
merge_history.push((root1, root2));
}
}
let mut clusters = HashMap::new();
let mut assignments = HashMap::new();
for i in 0..num_samples {
clusters.insert(i, vec![i]);
assignments.insert(i, i);
}
let merges_to_apply = num_samples - n_clusters;
for (root1, root2) in merge_history[0..merges_to_apply].iter() {
let root1_cluster = assignments[root1];
let root2_cluster = assignments[root2];
let root2_assignments = clusters.remove(&root2_cluster).unwrap();
for assignment in root2_assignments.iter() {
assignments.insert(*assignment, root1_cluster);
}
let root1_assignments = clusters.get_mut(&root1_cluster).unwrap();
root1_assignments.extend(root2_assignments);
}
let mut labels: Vec<usize> = (0..num_samples).map(|_| 0).collect();
let mut cluster_keys: Vec<&usize> = clusters.keys().collect();
cluster_keys.sort();
for (i, key) in cluster_keys.into_iter().enumerate() {
for index in clusters[key].iter() {
labels[*index] = i;
}
}
Ok(AgglomerativeClustering {
labels,
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
_phantom_x: PhantomData,
_phantom_y: PhantomData,
})
}
}
impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>>
UnsupervisedEstimator<X, AgglomerativeClusteringParameters>
for AgglomerativeClustering<TX, TY, X, Y>
{
fn fit(x: &X, parameters: AgglomerativeClusteringParameters) -> Result<Self, Failed> {
AgglomerativeClustering::fit(x, parameters)
}
}
#[cfg(test)]
mod tests {
use crate::linalg::basic::matrix::DenseMatrix;
use std::collections::HashSet;
use super::*;
#[test]
fn test_simple_clustering() {
// Two distinct clusters, far apart.
let data = vec![
0.0, 0.0, 1.0, 1.0, 0.5, 0.5, // Cluster A
10.0, 10.0, 11.0, 11.0, 10.5, 10.5, // Cluster B
];
let matrix = DenseMatrix::new(6, 2, data, false).unwrap();
let parameters = AgglomerativeClusteringParameters::default().with_n_clusters(2);
// Using f64 for TY as usize doesn't satisfy the Number trait bound.
let clustering = AgglomerativeClustering::<f64, f64, DenseMatrix<f64>, Vec<f64>>::fit(
&matrix, parameters,
)
.unwrap();
let labels = clustering.labels;
// Check that all points in the first group have the same label.
let first_group_label = labels[0];
assert!(labels[0..3].iter().all(|&l| l == first_group_label));
// Check that all points in the second group have the same label.
let second_group_label = labels[3];
assert!(labels[3..6].iter().all(|&l| l == second_group_label));
// Check that the two groups have different labels.
assert_ne!(first_group_label, second_group_label);
}
#[test]
fn test_four_clusters() {
// Four distinct clusters in the corners of a square.
let data = vec![
0.0, 0.0, 1.0, 1.0, // Cluster A
100.0, 100.0, 101.0, 101.0, // Cluster B
0.0, 100.0, 1.0, 101.0, // Cluster C
100.0, 0.0, 101.0, 1.0, // Cluster D
];
let matrix = DenseMatrix::new(8, 2, data, false).unwrap();
let parameters = AgglomerativeClusteringParameters::default().with_n_clusters(4);
let clustering = AgglomerativeClustering::<f64, f64, DenseMatrix<f64>, Vec<f64>>::fit(
&matrix, parameters,
)
.unwrap();
let labels = clustering.labels;
// Verify that there are exactly 4 unique labels produced.
let unique_labels: HashSet<usize> = labels.iter().cloned().collect();
assert_eq!(unique_labels.len(), 4);
// Verify that points within each original group were assigned the same cluster label.
let label_a = labels[0];
assert_eq!(label_a, labels[1]);
let label_b = labels[2];
assert_eq!(label_b, labels[3]);
let label_c = labels[4];
assert_eq!(label_c, labels[5]);
let label_d = labels[6];
assert_eq!(label_d, labels[7]);
// Verify that all four groups received different labels.
assert_ne!(label_a, label_b);
assert_ne!(label_a, label_c);
assert_ne!(label_a, label_d);
assert_ne!(label_b, label_c);
assert_ne!(label_b, label_d);
assert_ne!(label_c, label_d);
}
#[test]
fn test_n_clusters_equal_to_samples() {
let data = vec![0.0, 0.0, 5.0, 5.0, 10.0, 10.0];
let matrix = DenseMatrix::new(3, 2, data, false).unwrap();
let parameters = AgglomerativeClusteringParameters::default().with_n_clusters(3);
let clustering = AgglomerativeClustering::<f64, f64, DenseMatrix<f64>, Vec<f64>>::fit(
&matrix, parameters,
)
.unwrap();
// Each point should be its own cluster. Sorting makes the test deterministic.
let mut labels = clustering.labels;
labels.sort();
assert_eq!(labels, vec![0, 1, 2]);
}
#[test]
fn test_one_cluster() {
let data = vec![0.0, 0.0, 5.0, 5.0, 10.0, 10.0];
let matrix = DenseMatrix::new(3, 2, data, false).unwrap();
let parameters = AgglomerativeClusteringParameters::default().with_n_clusters(1);
let clustering = AgglomerativeClustering::<f64, f64, DenseMatrix<f64>, Vec<f64>>::fit(
&matrix, parameters,
)
.unwrap();
// All points should be in the same cluster.
assert_eq!(clustering.labels, vec![0, 0, 0]);
}
#[test]
fn test_error_on_too_many_clusters() {
let data = vec![0.0, 0.0, 5.0, 5.0];
let matrix = DenseMatrix::new(2, 2, data, false).unwrap();
let parameters = AgglomerativeClusteringParameters::default().with_n_clusters(3);
let result = AgglomerativeClustering::<f64, f64, DenseMatrix<f64>, Vec<f64>>::fit(
&matrix, parameters,
);
assert!(result.is_err());
}
}
+4 -4
@@ -96,7 +96,7 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>> PartialEq for KMeans<
return false;
}
for j in 0..self.centroids[i].len() {
-if (self.centroids[i][j] - other.centroids[i][j]).abs() > std::f64::EPSILON {
+if (self.centroids[i][j] - other.centroids[i][j]).abs() > f64::EPSILON {
return false;
}
}
@@ -270,7 +270,7 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>> KMeans<TX, TY, X, Y>
let (n, d) = data.shape();
-let mut distortion = std::f64::MAX;
+let mut distortion = f64::MAX;
let mut y = KMeans::<TX, TY, X, Y>::kmeans_plus_plus(data, parameters.k, parameters.seed);
let mut size = vec![0; parameters.k];
let mut centroids = vec![vec![0f64; d]; parameters.k];
@@ -331,7 +331,7 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>> KMeans<TX, TY, X, Y>
let mut row = vec![0f64; x.shape().1];
for i in 0..n {
-let mut min_dist = std::f64::MAX;
+let mut min_dist = f64::MAX;
let mut best_cluster = 0;
for j in 0..self.k {
@@ -361,7 +361,7 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>> KMeans<TX, TY, X, Y>
.cloned()
.collect();
-let mut d = vec![std::f64::MAX; n];
+let mut d = vec![f64::MAX; n];
let mut row = vec![TX::zero(); data.shape().1];
for j in 1..k {
+2
@@ -1,8 +1,10 @@
#![allow(clippy::ptr_arg, clippy::needless_range_loop)]
//! # Clustering
//!
//! Clustering is the type of unsupervised learning where you divide the population or data points into a number of groups such that data points in the same groups
//! are more similar to other data points in the same group than those in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.
pub mod agglomerative;
pub mod dbscan;
/// An iterative clustering algorithm that aims to find local maxima in each iteration.
pub mod kmeans;
+1
@@ -1,3 +1,4 @@
#![allow(clippy::ptr_arg, clippy::needless_range_loop)]
//! Datasets
//!
//! In this module you will find small datasets that are used in `smartcore` mostly for demonstration purposes.
+214
@@ -0,0 +1,214 @@
use rand::Rng;
use std::fmt::Debug;
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
use crate::error::{Failed, FailedError};
use crate::linalg::basic::arrays::{Array1, Array2};
use crate::numbers::basenum::Number;
use crate::numbers::floatnum::FloatNumber;
use crate::rand_custom::get_rng_impl;
use crate::tree::base_tree_regressor::{BaseTreeRegressor, BaseTreeRegressorParameters, Splitter};
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
/// Parameters of the Forest Regressor
/// Some parameters here are passed directly into the base estimator.
pub struct BaseForestRegressorParameters {
#[cfg_attr(feature = "serde", serde(default))]
/// Tree max depth. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub max_depth: Option<u16>,
#[cfg_attr(feature = "serde", serde(default))]
/// The minimum number of samples required to be at a leaf node. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub min_samples_leaf: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// The minimum number of samples required to split an internal node. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub min_samples_split: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// The number of trees in the forest.
pub n_trees: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// The number of randomly sampled predictors to use as split candidates.
pub m: Option<usize>,
#[cfg_attr(feature = "serde", serde(default))]
/// Whether to keep samples used for tree generation. This is required for OOB prediction.
pub keep_samples: bool,
#[cfg_attr(feature = "serde", serde(default))]
/// Seed used for bootstrap sampling and feature selection for each tree.
pub seed: u64,
#[cfg_attr(feature = "serde", serde(default))]
/// Whether to draw bootstrap samples when building each tree.
pub bootstrap: bool,
#[cfg_attr(feature = "serde", serde(default))]
/// The split strategy used at each node (best or random).
pub splitter: Splitter,
}
impl<TX: Number + FloatNumber + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> PartialEq
for BaseForestRegressor<TX, TY, X, Y>
{
fn eq(&self, other: &Self) -> bool {
if self.trees.as_ref().unwrap().len() != other.trees.as_ref().unwrap().len() {
false
} else {
self.trees
.iter()
.zip(other.trees.iter())
.all(|(a, b)| a == b)
}
}
}
/// Forest Regressor
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
pub struct BaseForestRegressor<
TX: Number + FloatNumber + PartialOrd,
TY: Number,
X: Array2<TX>,
Y: Array1<TY>,
> {
trees: Option<Vec<BaseTreeRegressor<TX, TY, X, Y>>>,
samples: Option<Vec<Vec<bool>>>,
}
impl<TX: Number + FloatNumber + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
BaseForestRegressor<TX, TY, X, Y>
{
/// Build a forest of trees from the training set.
/// * `x` - _NxM_ matrix with _N_ observations and _M_ features in each observation.
/// * `y` - the target class values
pub fn fit(
x: &X,
y: &Y,
parameters: BaseForestRegressorParameters,
) -> Result<BaseForestRegressor<TX, TY, X, Y>, Failed> {
let (n_rows, num_attributes) = x.shape();
if n_rows != y.shape() {
return Err(Failed::fit("Number of rows in X should = len(y)"));
}
let mtry = parameters
.m
.unwrap_or((num_attributes as f64).sqrt().floor() as usize);
let mut rng = get_rng_impl(Some(parameters.seed));
let mut trees: Vec<BaseTreeRegressor<TX, TY, X, Y>> = Vec::new();
let mut maybe_all_samples: Option<Vec<Vec<bool>>> = Option::None;
if parameters.keep_samples {
// TODO: use with_capacity here
maybe_all_samples = Some(Vec::new());
}
let mut samples: Vec<usize> = (0..n_rows).map(|_| 1).collect();
for _ in 0..parameters.n_trees {
if parameters.bootstrap {
samples =
BaseForestRegressor::<TX, TY, X, Y>::sample_with_replacement(n_rows, &mut rng);
}
// keep samples if the flag is on
if let Some(ref mut all_samples) = maybe_all_samples {
all_samples.push(samples.iter().map(|x| *x != 0).collect())
}
let params = BaseTreeRegressorParameters {
max_depth: parameters.max_depth,
min_samples_leaf: parameters.min_samples_leaf,
min_samples_split: parameters.min_samples_split,
seed: Some(parameters.seed),
splitter: parameters.splitter.clone(),
};
let tree = BaseTreeRegressor::fit_weak_learner(x, y, samples.clone(), mtry, params)?;
trees.push(tree);
}
Ok(BaseForestRegressor {
trees: Some(trees),
samples: maybe_all_samples,
})
}
    /// Predict target values for `x`
/// * `x` - _KxM_ data where _K_ is number of observations and _M_ is number of features.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
let mut result = Y::zeros(x.shape().0);
let (n, _) = x.shape();
for i in 0..n {
result.set(i, self.predict_for_row(x, i));
}
Ok(result)
}
fn predict_for_row(&self, x: &X, row: usize) -> TY {
let n_trees = self.trees.as_ref().unwrap().len();
let mut result = TY::zero();
for tree in self.trees.as_ref().unwrap().iter() {
result += tree.predict_for_row(x, row);
}
result / TY::from_usize(n_trees).unwrap()
}
    /// Predict out-of-bag (OOB) target values for `x`. `x` is expected to be the same dataset that was used in training.
pub fn predict_oob(&self, x: &X) -> Result<Y, Failed> {
let (n, _) = x.shape();
if self.samples.is_none() {
Err(Failed::because(
FailedError::PredictFailed,
"Need samples=true for OOB predictions.",
))
} else if self.samples.as_ref().unwrap()[0].len() != n {
Err(Failed::because(
FailedError::PredictFailed,
"Prediction matrix must match matrix used in training for OOB predictions.",
))
} else {
let mut result = Y::zeros(n);
for i in 0..n {
result.set(i, self.predict_for_row_oob(x, i));
}
Ok(result)
}
}
fn predict_for_row_oob(&self, x: &X, row: usize) -> TY {
let mut n_trees = 0;
let mut result = TY::zero();
for (tree, samples) in self
.trees
.as_ref()
.unwrap()
.iter()
.zip(self.samples.as_ref().unwrap())
{
if !samples[row] {
result += tree.predict_for_row(x, row);
n_trees += 1;
}
}
// TODO: What to do if there are no oob trees?
result / TY::from(n_trees).unwrap()
}
fn sample_with_replacement(nrows: usize, rng: &mut impl Rng) -> Vec<usize> {
let mut samples = vec![0; nrows];
for _ in 0..nrows {
let xi = rng.gen_range(0..nrows);
samples[xi] += 1;
}
samples
}
}
+318
@@ -0,0 +1,318 @@
//! # Extra Trees Regressor
//! An Extra-Trees (Extremely Randomized Trees) regressor is an ensemble learning method that fits multiple randomized
//! decision trees on the dataset and averages their predictions to improve accuracy and control over-fitting.
//!
//! It is similar to a standard Random Forest, but introduces more randomness in the way splits are chosen, which can
//! reduce the variance of the model and often make the training process faster.
//!
//! The two key differences from a standard Random Forest are:
//! 1. It uses the whole original dataset to build each tree instead of bootstrap samples.
//! 2. When splitting a node, it chooses a random split point for each feature, rather than the most optimal one.
//!
//! See [ensemble models](../index.html) for more details.
//!
//! A larger number of estimators generally improves the performance of the algorithm, at the cost of increased training time.
//! The random sample of _m_ predictors is typically set to be \\(\sqrt{p}\\) from the full set of _p_ predictors.
//!
//! Example:
//!
//! ```
//! use smartcore::linalg::basic::matrix::DenseMatrix;
//! use smartcore::ensemble::extra_trees_regressor::*;
//!
//! // Longley dataset ([https://www.statsmodels.org/stable/datasets/generated/longley.html](https://www.statsmodels.org/stable/datasets/generated/longley.html))
//! let x = DenseMatrix::from_2d_array(&[
//! &[234.289, 235.6, 159., 107.608, 1947., 60.323],
//! &[259.426, 232.5, 145.6, 108.632, 1948., 61.122],
//! &[258.054, 368.2, 161.6, 109.773, 1949., 60.171],
//! &[284.599, 335.1, 165., 110.929, 1950., 61.187],
//! &[328.975, 209.9, 309.9, 112.075, 1951., 63.221],
//! &[346.999, 193.2, 359.4, 113.27, 1952., 63.639],
//! &[365.385, 187., 354.7, 115.094, 1953., 64.989],
//! &[363.112, 357.8, 335., 116.219, 1954., 63.761],
//! &[397.469, 290.4, 304.8, 117.388, 1955., 66.019],
//! &[419.18, 282.2, 285.7, 118.734, 1956., 67.857],
//! &[442.769, 293.6, 279.8, 120.445, 1957., 68.169],
//! &[444.546, 468.1, 263.7, 121.95, 1958., 66.513],
//! &[482.704, 381.3, 255.2, 123.366, 1959., 68.655],
//! &[502.601, 393.1, 251.4, 125.368, 1960., 69.564],
//! &[518.173, 480.6, 257.2, 127.852, 1961., 69.331],
//! &[554.894, 400.7, 282.7, 130.081, 1962., 70.551],
//! ]).unwrap();
//! let y = vec![
//! 83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0, 101.2,
//! 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9
//! ];
//!
//! let regressor = ExtraTreesRegressor::fit(&x, &y, Default::default()).unwrap();
//!
//! let y_hat = regressor.predict(&x).unwrap(); // use the same data for prediction
//! ```
//!
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
use std::default::Default;
use std::fmt::Debug;
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
use crate::api::{Predictor, SupervisedEstimator};
use crate::ensemble::base_forest_regressor::{BaseForestRegressor, BaseForestRegressorParameters};
use crate::error::Failed;
use crate::linalg::basic::arrays::{Array1, Array2};
use crate::numbers::basenum::Number;
use crate::numbers::floatnum::FloatNumber;
use crate::tree::base_tree_regressor::Splitter;
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
/// Parameters of the Extra Trees Regressor
/// Some parameters here are passed directly into the base estimator.
pub struct ExtraTreesRegressorParameters {
#[cfg_attr(feature = "serde", serde(default))]
/// Tree max depth. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub max_depth: Option<u16>,
#[cfg_attr(feature = "serde", serde(default))]
/// The minimum number of samples required to be at a leaf node. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub min_samples_leaf: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// The minimum number of samples required to split an internal node. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub min_samples_split: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// The number of trees in the forest.
pub n_trees: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// The number of randomly sampled predictors to use as split candidates.
pub m: Option<usize>,
#[cfg_attr(feature = "serde", serde(default))]
/// Whether to keep samples used for tree generation. This is required for OOB prediction.
pub keep_samples: bool,
#[cfg_attr(feature = "serde", serde(default))]
/// Seed used for bootstrap sampling and feature selection for each tree.
pub seed: u64,
}
/// Extra Trees Regressor
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
pub struct ExtraTreesRegressor<
TX: Number + FloatNumber + PartialOrd,
TY: Number,
X: Array2<TX>,
Y: Array1<TY>,
> {
forest_regressor: Option<BaseForestRegressor<TX, TY, X, Y>>,
}
impl ExtraTreesRegressorParameters {
    /// Tree max depth. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub fn with_max_depth(mut self, max_depth: u16) -> Self {
self.max_depth = Some(max_depth);
self
}
    /// The minimum number of samples required to be at a leaf node. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub fn with_min_samples_leaf(mut self, min_samples_leaf: usize) -> Self {
self.min_samples_leaf = min_samples_leaf;
self
}
    /// The minimum number of samples required to split an internal node. See [Decision Tree Regressor](../../tree/decision_tree_regressor/index.html)
pub fn with_min_samples_split(mut self, min_samples_split: usize) -> Self {
self.min_samples_split = min_samples_split;
self
}
/// The number of trees in the forest.
pub fn with_n_trees(mut self, n_trees: usize) -> Self {
self.n_trees = n_trees;
self
}
    /// The number of randomly sampled predictors to use as split candidates.
pub fn with_m(mut self, m: usize) -> Self {
self.m = Some(m);
self
}
/// Whether to keep samples used for tree generation. This is required for OOB prediction.
pub fn with_keep_samples(mut self, keep_samples: bool) -> Self {
self.keep_samples = keep_samples;
self
}
/// Seed used for bootstrap sampling and feature selection for each tree.
pub fn with_seed(mut self, seed: u64) -> Self {
self.seed = seed;
self
}
}
impl Default for ExtraTreesRegressorParameters {
fn default() -> Self {
ExtraTreesRegressorParameters {
max_depth: Option::None,
min_samples_leaf: 1,
min_samples_split: 2,
n_trees: 10,
m: Option::None,
keep_samples: false,
seed: 0,
}
}
}
impl<TX: Number + FloatNumber + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
SupervisedEstimator<X, Y, ExtraTreesRegressorParameters> for ExtraTreesRegressor<TX, TY, X, Y>
{
fn new() -> Self {
Self {
forest_regressor: Option::None,
}
}
fn fit(x: &X, y: &Y, parameters: ExtraTreesRegressorParameters) -> Result<Self, Failed> {
ExtraTreesRegressor::fit(x, y, parameters)
}
}
impl<TX: Number + FloatNumber + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
Predictor<X, Y> for ExtraTreesRegressor<TX, TY, X, Y>
{
fn predict(&self, x: &X) -> Result<Y, Failed> {
self.predict(x)
}
}
impl<TX: Number + FloatNumber + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
ExtraTreesRegressor<TX, TY, X, Y>
{
/// Build a forest of trees from the training set.
/// * `x` - _NxM_ matrix with _N_ observations and _M_ features in each observation.
/// * `y` - the target class values
pub fn fit(
x: &X,
y: &Y,
parameters: ExtraTreesRegressorParameters,
) -> Result<ExtraTreesRegressor<TX, TY, X, Y>, Failed> {
let regressor_params = BaseForestRegressorParameters {
max_depth: parameters.max_depth,
min_samples_leaf: parameters.min_samples_leaf,
min_samples_split: parameters.min_samples_split,
n_trees: parameters.n_trees,
m: parameters.m,
keep_samples: parameters.keep_samples,
seed: parameters.seed,
bootstrap: false,
splitter: Splitter::Random,
};
let forest_regressor = BaseForestRegressor::fit(x, y, regressor_params)?;
Ok(ExtraTreesRegressor {
forest_regressor: Some(forest_regressor),
})
}
    /// Predict target values for `x`
/// * `x` - _KxM_ data where _K_ is number of observations and _M_ is number of features.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
let forest_regressor = self.forest_regressor.as_ref().unwrap();
forest_regressor.predict(x)
}
    /// Predict out-of-bag (OOB) target values for `x`. `x` is expected to be the same dataset that was used in training.
pub fn predict_oob(&self, x: &X) -> Result<Y, Failed> {
let forest_regressor = self.forest_regressor.as_ref().unwrap();
forest_regressor.predict_oob(x)
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::linalg::basic::matrix::DenseMatrix;
use crate::metrics::mean_squared_error;
#[test]
fn test_extra_trees_regressor_fit_predict() {
// Use a simpler, more predictable dataset for unit testing.
let x = DenseMatrix::from_2d_array(&[
&[1., 2.],
&[3., 4.],
&[5., 6.],
&[7., 8.],
&[9., 10.],
&[11., 12.],
&[13., 14.],
&[15., 16.],
])
.unwrap();
let y = vec![1., 2., 3., 4., 5., 6., 7., 8.];
let parameters = ExtraTreesRegressorParameters::default()
.with_n_trees(100)
.with_seed(42);
let regressor = ExtraTreesRegressor::fit(&x, &y, parameters).unwrap();
let y_hat = regressor.predict(&x).unwrap();
assert_eq!(y_hat.len(), y.len());
// A basic check to ensure the model is learning something.
// The error should be significantly less than the variance of y.
let mse = mean_squared_error(&y, &y_hat);
// With this simple dataset, the error should be very low.
assert!(mse < 1.0);
}
#[test]
fn test_fit_predict_higher_dims() {
// Dataset with 10 features, but y is only dependent on the 3rd feature (index 2).
let x = DenseMatrix::from_2d_array(&[
// The 3rd column is the important one. The rest are noise.
&[0., 0., 10., 5., 8., 1., 4., 9., 2., 7.],
&[0., 0., 20., 1., 2., 3., 4., 5., 6., 7.],
&[0., 0., 30., 7., 6., 5., 4., 3., 2., 1.],
&[0., 0., 40., 9., 2., 4., 6., 8., 1., 3.],
&[0., 0., 55., 3., 1., 8., 6., 4., 2., 9.],
&[0., 0., 65., 2., 4., 7., 5., 3., 1., 8.],
])
.unwrap();
let y = vec![10., 20., 30., 40., 55., 65.];
let parameters = ExtraTreesRegressorParameters::default()
.with_n_trees(100)
.with_seed(42);
let regressor = ExtraTreesRegressor::fit(&x, &y, parameters).unwrap();
let y_hat = regressor.predict(&x).unwrap();
assert_eq!(y_hat.len(), y.len());
let mse = mean_squared_error(&y, &y_hat);
// The model should be able to learn this simple relationship perfectly,
// ignoring the noise features. The MSE should be very low.
assert!(mse < 1.0);
}
#[test]
fn test_reproducibility() {
let x = DenseMatrix::from_2d_array(&[
&[1., 2.],
&[3., 4.],
&[5., 6.],
&[7., 8.],
&[9., 10.],
&[11., 12.],
])
.unwrap();
let y = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
let params = ExtraTreesRegressorParameters::default().with_seed(42);
let regressor1 = ExtraTreesRegressor::fit(&x, &y, params.clone()).unwrap();
let y_hat1 = regressor1.predict(&x).unwrap();
let regressor2 = ExtraTreesRegressor::fit(&x, &y, params.clone()).unwrap();
let y_hat2 = regressor2.predict(&x).unwrap();
assert_eq!(y_hat1, y_hat2);
}
}
+2
@@ -16,6 +16,8 @@
//!
//! * ["An Introduction to Statistical Learning", James G., Witten D., Hastie T., Tibshirani R., 8.2 Bagging, Random Forests, Boosting](http://faculty.marshall.usc.edu/gareth-james/ISL/)
mod base_forest_regressor;
pub mod extra_trees_regressor;
/// Random forest classifier
pub mod random_forest_classifier;
/// Random forest regressor
+23 -129
@@ -43,7 +43,6 @@
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
-use rand::Rng;
use std::default::Default;
use std::fmt::Debug;
@@ -51,15 +50,12 @@ use std::fmt::Debug;
use serde::{Deserialize, Serialize};
use crate::api::{Predictor, SupervisedEstimator};
-use crate::error::{Failed, FailedError};
+use crate::ensemble::base_forest_regressor::{BaseForestRegressor, BaseForestRegressorParameters};
+use crate::error::Failed;
use crate::linalg::basic::arrays::{Array1, Array2};
use crate::numbers::basenum::Number;
use crate::numbers::floatnum::FloatNumber;
-use crate::rand_custom::get_rng_impl;
-use crate::tree::decision_tree_regressor::{
-DecisionTreeRegressor, DecisionTreeRegressorParameters,
-};
+use crate::tree::base_tree_regressor::Splitter;
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
@@ -98,8 +94,7 @@ pub struct RandomForestRegressor<
X: Array2<TX>,
Y: Array1<TY>,
> {
-trees: Option<Vec<DecisionTreeRegressor<TX, TY, X, Y>>>,
-samples: Option<Vec<Vec<bool>>>,
+forest_regressor: Option<BaseForestRegressor<TX, TY, X, Y>>,
}
impl RandomForestRegressorParameters {
@@ -159,14 +154,7 @@ impl<TX: Number + FloatNumber + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1
for RandomForestRegressor<TX, TY, X, Y>
{
fn eq(&self, other: &Self) -> bool {
-if self.trees.as_ref().unwrap().len() != other.trees.as_ref().unwrap().len() {
-false
-} else {
-self.trees
-.iter()
-.zip(other.trees.iter())
-.all(|(a, b)| a == b)
-}
+self.forest_regressor == other.forest_regressor
}
}
@@ -176,8 +164,7 @@ impl<TX: Number + FloatNumber + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1
{
fn new() -> Self {
Self {
-trees: Option::None,
-samples: Option::None,
+forest_regressor: Option::None,
}
}
@@ -397,128 +384,35 @@ impl<TX: Number + FloatNumber + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1
y: &Y,
parameters: RandomForestRegressorParameters,
) -> Result<RandomForestRegressor<TX, TY, X, Y>, Failed> {
-let (n_rows, num_attributes) = x.shape();
-if n_rows != y.shape() {
-return Err(Failed::fit("Number of rows in X should = len(y)"));
-}
-let mtry = parameters
-.m
-.unwrap_or((num_attributes as f64).sqrt().floor() as usize);
-let mut rng = get_rng_impl(Some(parameters.seed));
-let mut trees: Vec<DecisionTreeRegressor<TX, TY, X, Y>> = Vec::new();
-let mut maybe_all_samples: Option<Vec<Vec<bool>>> = Option::None;
-if parameters.keep_samples {
-// TODO: use with_capacity here
-maybe_all_samples = Some(Vec::new());
-}
-for _ in 0..parameters.n_trees {
-let samples: Vec<usize> =
-RandomForestRegressor::<TX, TY, X, Y>::sample_with_replacement(n_rows, &mut rng);
-// keep samples is flag is on
-if let Some(ref mut all_samples) = maybe_all_samples {
-all_samples.push(samples.iter().map(|x| *x != 0).collect())
-}
-let params = DecisionTreeRegressorParameters {
-max_depth: parameters.max_depth,
-min_samples_leaf: parameters.min_samples_leaf,
-min_samples_split: parameters.min_samples_split,
-seed: Some(parameters.seed),
-};
-let tree = DecisionTreeRegressor::fit_weak_learner(x, y, samples, mtry, params)?;
-trees.push(tree);
-}
+let regressor_params = BaseForestRegressorParameters {
+max_depth: parameters.max_depth,
+min_samples_leaf: parameters.min_samples_leaf,
+min_samples_split: parameters.min_samples_split,
+n_trees: parameters.n_trees,
+m: parameters.m,
+keep_samples: parameters.keep_samples,
+seed: parameters.seed,
+bootstrap: true,
+splitter: Splitter::Best,
+};
+let forest_regressor = BaseForestRegressor::fit(x, y, regressor_params)?;
Ok(RandomForestRegressor {
-trees: Some(trees),
-samples: maybe_all_samples,
+forest_regressor: Some(forest_regressor),
})
}
    /// Predict regression value for `x`
/// * `x` - _KxM_ data where _K_ is number of observations and _M_ is number of features.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
let mut result = Y::zeros(x.shape().0);
let (n, _) = x.shape();
for i in 0..n {
result.set(i, self.predict_for_row(x, i));
}
Ok(result)
}
fn predict_for_row(&self, x: &X, row: usize) -> TY {
let n_trees = self.trees.as_ref().unwrap().len();
let mut result = TY::zero();
for tree in self.trees.as_ref().unwrap().iter() {
result += tree.predict_for_row(x, row);
}
result / TY::from_usize(n_trees).unwrap()
let forest_regressor = self.forest_regressor.as_ref().unwrap();
forest_regressor.predict(x)
}
    /// Predict out-of-bag (OOB) values for `x`. `x` is expected to be equal to the dataset used in training.
pub fn predict_oob(&self, x: &X) -> Result<Y, Failed> {
let (n, _) = x.shape();
if self.samples.is_none() {
Err(Failed::because(
FailedError::PredictFailed,
"Need samples=true for OOB predictions.",
))
} else if self.samples.as_ref().unwrap()[0].len() != n {
Err(Failed::because(
FailedError::PredictFailed,
"Prediction matrix must match matrix used in training for OOB predictions.",
))
} else {
let mut result = Y::zeros(n);
for i in 0..n {
result.set(i, self.predict_for_row_oob(x, i));
}
Ok(result)
}
}
fn predict_for_row_oob(&self, x: &X, row: usize) -> TY {
let mut n_trees = 0;
let mut result = TY::zero();
for (tree, samples) in self
.trees
.as_ref()
.unwrap()
.iter()
.zip(self.samples.as_ref().unwrap())
{
if !samples[row] {
result += tree.predict_for_row(x, row);
n_trees += 1;
}
}
// TODO: What to do if there are no oob trees?
result / TY::from(n_trees).unwrap()
}
fn sample_with_replacement(nrows: usize, rng: &mut impl Rng) -> Vec<usize> {
let mut samples = vec![0; nrows];
for _ in 0..nrows {
let xi = rng.gen_range(0..nrows);
samples[xi] += 1;
}
samples
let forest_regressor = self.forest_regressor.as_ref().unwrap();
forest_regressor.predict_oob(x)
}
}
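The bootstrap bookkeeping that `sample_with_replacement` and `predict_for_row_oob` rely on can be sketched standalone. This is a hypothetical stand-in, not smartcore code: a tiny LCG replaces the crate's seeded `get_rng_impl`, and plain `Vec`s replace the array traits.

```rust
// Hypothetical sketch: each tree draws nrows samples with replacement;
// rows drawn zero times are out-of-bag (OOB) for that tree.
fn sample_with_replacement(nrows: usize, state: &mut u64) -> Vec<usize> {
    let mut samples = vec![0usize; nrows];
    for _ in 0..nrows {
        // minimal LCG, a stand-in for the crate's RNG (assumption)
        *state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        samples[(*state >> 33) as usize % nrows] += 1;
    }
    samples
}

fn main() {
    let mut state = 42u64;
    let counts = sample_with_replacement(8, &mut state);
    // every draw lands on some row, so the counts always sum to nrows
    assert_eq!(counts.iter().sum::<usize>(), 8);
    // OOB mask: rows this tree never saw (the inverse of `*x != 0` above)
    let oob: Vec<bool> = counts.iter().map(|&c| c == 0).collect();
    println!("counts = {counts:?}, oob = {oob:?}");
}
```

On average roughly e⁻¹ ≈ 37% of rows end up out-of-bag per tree, which is the pool that `predict_oob` averages over.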
+1 -1
@@ -7,7 +7,6 @@
clippy::approx_constant
)]
#![warn(missing_docs)]
#![warn(rustdoc::missing_doc_code_examples)]
//! # smartcore
//!
@@ -131,5 +130,6 @@ pub mod readers;
pub mod svm;
/// Supervised tree-based learning methods
pub mod tree;
pub mod xgboost;
pub(crate) mod rand_custom;
+102 -77
@@ -265,11 +265,11 @@ pub trait ArrayView1<T: Debug + Display + Copy + Sized>: Array<T, usize> {
if p.is_infinite() && p.is_sign_positive() {
self.iterator(0)
.map(|x| x.to_f64().unwrap().abs())
.fold(std::f64::NEG_INFINITY, |a, b| a.max(b))
.fold(f64::NEG_INFINITY, |a, b| a.max(b))
} else if p.is_infinite() && p.is_sign_negative() {
self.iterator(0)
.map(|x| x.to_f64().unwrap().abs())
.fold(std::f64::INFINITY, |a, b| a.min(b))
.fold(f64::INFINITY, |a, b| a.min(b))
} else {
let mut norm = 0f64;
@@ -558,11 +558,11 @@ pub trait ArrayView2<T: Debug + Display + Copy + Sized>: Array<T, (usize, usize)
if p.is_infinite() && p.is_sign_positive() {
self.iterator(0)
.map(|x| x.to_f64().unwrap().abs())
.fold(std::f64::NEG_INFINITY, |a, b| a.max(b))
.fold(f64::NEG_INFINITY, |a, b| a.max(b))
} else if p.is_infinite() && p.is_sign_negative() {
self.iterator(0)
.map(|x| x.to_f64().unwrap().abs())
.fold(std::f64::INFINITY, |a, b| a.min(b))
.fold(f64::INFINITY, |a, b| a.min(b))
} else {
let mut norm = 0f64;
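The ±∞ branches of `norm` above can be exercised with a plain-slice sketch (assumption: `&[f64]` in place of the `ArrayView1`/`ArrayView2` traits); the expected values mirror the crate's own `norm` test.

```rust
// Standalone sketch of the p-norm logic: +inf gives the max absolute
// value, -inf the min absolute value, otherwise (sum |x|^p)^(1/p).
fn norm(v: &[f64], p: f64) -> f64 {
    if p.is_infinite() && p.is_sign_positive() {
        v.iter().map(|x| x.abs()).fold(f64::NEG_INFINITY, f64::max)
    } else if p.is_infinite() && p.is_sign_negative() {
        v.iter().map(|x| x.abs()).fold(f64::INFINITY, f64::min)
    } else {
        v.iter().map(|x| x.abs().powf(p)).sum::<f64>().powf(1.0 / p)
    }
}

fn main() {
    let v = [3.0, -2.0, 6.0];
    assert_eq!(norm(&v, 1.0), 11.0);
    assert_eq!(norm(&v, 2.0), 7.0);
    assert_eq!(norm(&v, f64::INFINITY), 6.0);
    assert_eq!(norm(&v, f64::NEG_INFINITY), 2.0);
}
```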
@@ -619,7 +619,7 @@ pub trait MutArrayView1<T: Debug + Display + Copy + Sized>:
T: Number + PartialOrd,
{
let stack_size = 64;
let mut jstack = -1;
let mut jstack: i32 = -1;
let mut l = 0;
let mut istack = vec![0; stack_size];
let mut ir = self.shape() - 1;
@@ -731,34 +731,34 @@ pub trait MutArrayView1<T: Debug + Display + Copy + Sized>:
pub trait MutArrayView2<T: Debug + Display + Copy + Sized>:
MutArray<T, (usize, usize)> + ArrayView2<T>
{
///
/// copy values from another array
fn copy_from(&mut self, other: &dyn Array<T, (usize, usize)>) {
self.iterator_mut(0)
.zip(other.iterator(0))
.for_each(|(s, o)| *s = *o);
}
///
/// update view with absolute values
fn abs_mut(&mut self)
where
T: Number + Signed,
{
self.iterator_mut(0).for_each(|v| *v = v.abs());
}
///
/// update view values with opposite sign
fn neg_mut(&mut self)
where
T: Number + Neg<Output = T>,
{
self.iterator_mut(0).for_each(|v| *v = -*v);
}
///
/// update view values at power `p`
fn pow_mut(&mut self, p: T)
where
T: RealNumber,
{
self.iterator_mut(0).for_each(|v| *v = v.powf(p));
}
///
/// scale view values
fn scale_mut(&mut self, mean: &[T], std: &[T], axis: u8)
where
T: Number,
@@ -784,27 +784,27 @@ pub trait MutArrayView2<T: Debug + Display + Copy + Sized>:
/// Trait for mutable 1D-array view
pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized + Clone {
///
/// return a view of the array
fn slice<'a>(&'a self, range: Range<usize>) -> Box<dyn ArrayView1<T> + 'a>;
///
/// return a mutable view of the array
fn slice_mut<'a>(&'a mut self, range: Range<usize>) -> Box<dyn MutArrayView1<T> + 'a>;
///
/// fill array with a given value
fn fill(len: usize, value: T) -> Self
where
Self: Sized;
///
/// create array from iterator
fn from_iterator<I: Iterator<Item = T>>(iter: I, len: usize) -> Self
where
Self: Sized;
///
/// create array from vector
fn from_vec_slice(slice: &[T]) -> Self
where
Self: Sized;
///
/// create array from slice
fn from_slice(slice: &'_ dyn ArrayView1<T>) -> Self
where
Self: Sized;
///
/// create a zero array
fn zeros(len: usize) -> Self
where
T: Number,
@@ -812,7 +812,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
{
Self::fill(len, T::zero())
}
///
/// create an array of ones
fn ones(len: usize) -> Self
where
T: Number,
@@ -820,7 +820,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
{
Self::fill(len, T::one())
}
///
/// create an array of random values
fn rand(len: usize) -> Self
where
T: RealNumber,
@@ -828,7 +828,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
{
Self::from_iterator((0..len).map(|_| T::rand()), len)
}
///
/// add a scalar to the array
fn add_scalar(&self, x: T) -> Self
where
T: Number,
@@ -838,7 +838,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.add_scalar_mut(x);
result
}
///
/// subtract a scalar from the array
fn sub_scalar(&self, x: T) -> Self
where
T: Number,
@@ -848,7 +848,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.sub_scalar_mut(x);
result
}
///
    /// divide the array by a scalar
fn div_scalar(&self, x: T) -> Self
where
T: Number,
@@ -858,7 +858,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.div_scalar_mut(x);
result
}
///
    /// multiply the array by a scalar
fn mul_scalar(&self, x: T) -> Self
where
T: Number,
@@ -868,7 +868,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.mul_scalar_mut(x);
result
}
///
/// sum of two arrays
fn add(&self, other: &dyn Array<T, usize>) -> Self
where
T: Number,
@@ -878,7 +878,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.add_mut(other);
result
}
///
/// subtract two arrays
fn sub(&self, other: &impl Array1<T>) -> Self
where
T: Number,
@@ -888,7 +888,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.sub_mut(other);
result
}
///
/// multiply two arrays
fn mul(&self, other: &dyn Array<T, usize>) -> Self
where
T: Number,
@@ -898,7 +898,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.mul_mut(other);
result
}
///
/// divide two arrays
fn div(&self, other: &dyn Array<T, usize>) -> Self
where
T: Number,
@@ -908,7 +908,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.div_mut(other);
result
}
///
    /// take elements at the given indices
fn take(&self, index: &[usize]) -> Self
where
Self: Sized,
@@ -920,7 +920,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
);
Self::from_iterator(index.iter().map(move |&i| *self.get(i)), index.len())
}
///
    /// create a copy of the array with absolute values
fn abs(&self) -> Self
where
T: Number + Signed,
@@ -930,7 +930,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.abs_mut();
result
}
///
    /// create a copy of the array with opposite sign
fn neg(&self) -> Self
where
T: Number + Neg<Output = T>,
@@ -940,7 +940,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.neg_mut();
result
}
///
    /// create a copy of the array with values at power `p`
fn pow(&self, p: T) -> Self
where
T: RealNumber,
@@ -950,7 +950,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.pow_mut(p);
result
}
///
/// apply argsort to the array
fn argsort(&self) -> Vec<usize>
where
T: Number + PartialOrd,
@@ -958,12 +958,12 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
let mut v = self.clone();
v.argsort_mut()
}
///
/// map values of the array
fn map<O: Debug + Display + Copy + Sized, A: Array1<O>, F: FnMut(&T) -> O>(self, f: F) -> A {
let len = self.shape();
A::from_iterator(self.iterator(0).map(f), len)
}
///
/// apply softmax to the array
fn softmax(&self) -> Self
where
T: RealNumber,
@@ -973,7 +973,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result.softmax_mut();
result
}
///
/// multiply array by matrix
fn xa(&self, a_transpose: bool, a: &dyn ArrayView2<T>) -> Self
where
T: Number,
@@ -1003,7 +1003,7 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
result
}
///
/// check if two arrays are approximately equal
fn approximate_eq(&self, other: &Self, error: T) -> bool
where
T: Number + RealNumber,
@@ -1015,13 +1015,13 @@ pub trait Array1<T: Debug + Display + Copy + Sized>: MutArrayView1<T> + Sized +
/// Trait for mutable 2D-array view
pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized + Clone {
///
/// fill 2d array with a given value
fn fill(nrows: usize, ncols: usize, value: T) -> Self;
///
/// get a view of the 2d array
fn slice<'a>(&'a self, rows: Range<usize>, cols: Range<usize>) -> Box<dyn ArrayView2<T> + 'a>
where
Self: Sized;
///
/// get a mutable view of the 2d array
fn slice_mut<'a>(
&'a mut self,
rows: Range<usize>,
@@ -1029,31 +1029,31 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
) -> Box<dyn MutArrayView2<T> + 'a>
where
Self: Sized;
///
/// create 2d array from iterator
fn from_iterator<I: Iterator<Item = T>>(iter: I, nrows: usize, ncols: usize, axis: u8) -> Self;
///
/// get row from 2d array
fn get_row<'a>(&'a self, row: usize) -> Box<dyn ArrayView1<T> + 'a>
where
Self: Sized;
///
/// get column from 2d array
fn get_col<'a>(&'a self, col: usize) -> Box<dyn ArrayView1<T> + 'a>
where
Self: Sized;
///
/// create a zero 2d array
fn zeros(nrows: usize, ncols: usize) -> Self
where
T: Number,
{
Self::fill(nrows, ncols, T::zero())
}
///
/// create a 2d array of ones
fn ones(nrows: usize, ncols: usize) -> Self
where
T: Number,
{
Self::fill(nrows, ncols, T::one())
}
///
/// create an identity matrix
fn eye(size: usize) -> Self
where
T: Number,
@@ -1066,29 +1066,29 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
matrix
}
///
/// create a 2d array of random values
fn rand(nrows: usize, ncols: usize) -> Self
where
T: RealNumber,
{
Self::from_iterator((0..nrows * ncols).map(|_| T::rand()), nrows, ncols, 0)
}
///
    /// create from 2d slice
fn from_slice(slice: &dyn ArrayView2<T>) -> Self {
let (nrows, ncols) = slice.shape();
Self::from_iterator(slice.iterator(0).cloned(), nrows, ncols, 0)
}
///
/// create from row
fn from_row(slice: &dyn ArrayView1<T>) -> Self {
let ncols = slice.shape();
Self::from_iterator(slice.iterator(0).cloned(), 1, ncols, 0)
}
///
/// create from column
fn from_column(slice: &dyn ArrayView1<T>) -> Self {
let nrows = slice.shape();
Self::from_iterator(slice.iterator(0).cloned(), nrows, 1, 0)
}
///
/// transpose 2d array
fn transpose(&self) -> Self {
let (nrows, ncols) = self.shape();
let mut m = Self::fill(ncols, nrows, *self.get((0, 0)));
@@ -1099,7 +1099,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
}
m
}
///
/// change shape of 2d array
fn reshape(&self, nrows: usize, ncols: usize, axis: u8) -> Self {
let (onrows, oncols) = self.shape();
@@ -1110,7 +1110,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
Self::from_iterator(self.iterator(0).cloned(), nrows, ncols, axis)
}
///
/// multiply two 2d arrays
fn matmul(&self, other: &dyn ArrayView2<T>) -> Self
where
T: Number,
@@ -1136,7 +1136,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result
}
///
/// matrix multiplication
fn ab(&self, a_transpose: bool, b: &dyn ArrayView2<T>, b_transpose: bool) -> Self
where
T: Number,
@@ -1171,7 +1171,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result
}
}
///
/// matrix vector multiplication
fn ax(&self, a_transpose: bool, x: &dyn ArrayView1<T>) -> Self
where
T: Number,
@@ -1199,7 +1199,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
}
result
}
///
    /// concatenate 1d arrays
fn concatenate_1d<'a>(arrays: &'a [&'a dyn ArrayView1<T>], axis: u8) -> Self {
assert!(
axis == 1 || axis == 0,
@@ -1237,7 +1237,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
),
}
}
///
    /// concatenate 2d arrays
fn concatenate_2d<'a>(arrays: &'a [&'a dyn ArrayView2<T>], axis: u8) -> Self {
assert!(
axis == 1 || axis == 0,
@@ -1294,7 +1294,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
}
}
}
///
/// merge 1d arrays
fn merge_1d<'a>(&'a self, arrays: &'a [&'a dyn ArrayView1<T>], axis: u8, append: bool) -> Self {
assert!(
axis == 1 || axis == 0,
@@ -1362,7 +1362,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
}
}
}
///
/// Stack arrays in sequence vertically
fn v_stack(&self, other: &dyn ArrayView2<T>) -> Self {
let (nrows, ncols) = self.shape();
let (other_nrows, other_ncols) = other.shape();
@@ -1378,7 +1378,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
0,
)
}
///
/// Stack arrays in sequence horizontally
fn h_stack(&self, other: &dyn ArrayView2<T>) -> Self {
let (nrows, ncols) = self.shape();
let (other_nrows, other_ncols) = other.shape();
@@ -1394,20 +1394,20 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
1,
)
}
///
/// map array values
fn map<O: Debug + Display + Copy + Sized, A: Array2<O>, F: FnMut(&T) -> O>(self, f: F) -> A {
let (nrows, ncols) = self.shape();
A::from_iterator(self.iterator(0).map(f), nrows, ncols, 0)
}
///
/// iter rows
fn row_iter<'a>(&'a self) -> Box<dyn Iterator<Item = Box<dyn ArrayView1<T> + 'a>> + 'a> {
Box::new((0..self.shape().0).map(move |r| self.get_row(r)))
}
///
/// iter cols
fn col_iter<'a>(&'a self) -> Box<dyn Iterator<Item = Box<dyn ArrayView1<T> + 'a>> + 'a> {
Box::new((0..self.shape().1).map(move |r| self.get_col(r)))
}
///
/// take elements from 2d array
fn take(&self, index: &[usize], axis: u8) -> Self {
let (nrows, ncols) = self.shape();
@@ -1447,7 +1447,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
fn take_column(&self, column_index: usize) -> Self {
self.take(&[column_index], 1)
}
///
/// add a scalar to the array
fn add_scalar(&self, x: T) -> Self
where
T: Number,
@@ -1456,7 +1456,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.add_scalar_mut(x);
result
}
///
/// subtract a scalar from the array
fn sub_scalar(&self, x: T) -> Self
where
T: Number,
@@ -1465,7 +1465,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.sub_scalar_mut(x);
result
}
///
    /// divide the array by a scalar
fn div_scalar(&self, x: T) -> Self
where
T: Number,
@@ -1474,7 +1474,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.div_scalar_mut(x);
result
}
///
    /// multiply the array by a scalar
fn mul_scalar(&self, x: T) -> Self
where
T: Number,
@@ -1483,7 +1483,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.mul_scalar_mut(x);
result
}
///
/// sum of two arrays
fn add(&self, other: &dyn Array<T, (usize, usize)>) -> Self
where
T: Number,
@@ -1492,7 +1492,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.add_mut(other);
result
}
///
/// subtract two arrays
fn sub(&self, other: &dyn Array<T, (usize, usize)>) -> Self
where
T: Number,
@@ -1501,7 +1501,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.sub_mut(other);
result
}
///
/// multiply two arrays
fn mul(&self, other: &dyn Array<T, (usize, usize)>) -> Self
where
T: Number,
@@ -1510,7 +1510,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.mul_mut(other);
result
}
///
/// divide two arrays
fn div(&self, other: &dyn Array<T, (usize, usize)>) -> Self
where
T: Number,
@@ -1519,7 +1519,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.div_mut(other);
result
}
///
/// absolute values of the array
fn abs(&self) -> Self
where
T: Number + Signed,
@@ -1528,7 +1528,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.abs_mut();
result
}
///
/// negation of the array
fn neg(&self) -> Self
where
T: Number + Neg<Output = T>,
@@ -1537,7 +1537,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
result.neg_mut();
result
}
///
/// values at power `p`
fn pow(&self, p: T) -> Self
where
T: RealNumber,
@@ -1575,7 +1575,7 @@ pub trait Array2<T: Debug + Display + Copy + Sized>: MutArrayView2<T> + Sized +
}
}
/// appriximate equality of the elements of a matrix according to a given error
/// approximate equality of the elements of a matrix according to a given error
fn approximate_eq(&self, other: &Self, error: T) -> bool
where
T: Number + RealNumber,
@@ -1631,8 +1631,8 @@ mod tests {
let v = vec![3., -2., 6.];
assert_eq!(v.norm(1.), 11.);
assert_eq!(v.norm(2.), 7.);
assert_eq!(v.norm(std::f64::INFINITY), 6.);
assert_eq!(v.norm(std::f64::NEG_INFINITY), 2.);
assert_eq!(v.norm(f64::INFINITY), 6.);
assert_eq!(v.norm(f64::NEG_INFINITY), 2.);
}
#[test]
@@ -2190,4 +2190,29 @@ mod tests {
assert_eq!(result, [65, 581, 30])
}
#[test]
fn test_argsort_mut_exact_boundary() {
// Test index == length - 1 case
let boundary =
DenseMatrix::from_2d_array(&[&[1.0, 2.0, 3.0, f64::MAX], &[3.0, f64::MAX, 0.0, 2.0]])
.unwrap();
let mut view0: Vec<f64> = boundary.get_col(0).iterator(0).copied().collect();
let indices = view0.argsort_mut();
assert_eq!(indices.last(), Some(&1));
assert_eq!(indices.first(), Some(&0));
let mut view1: Vec<f64> = boundary.get_col(3).iterator(0).copied().collect();
let indices = view1.argsort_mut();
assert_eq!(indices.last(), Some(&0));
assert_eq!(indices.first(), Some(&1));
}
#[test]
fn test_argsort_mut_filled_array() {
let matrix = DenseMatrix::<f64>::rand(1000, 1000);
let mut view: Vec<f64> = matrix.get_col(0).iterator(0).copied().collect();
let sorted = view.argsort_mut();
assert_eq!(sorted.len(), 1000);
}
}
+14 -14
@@ -91,7 +91,7 @@ impl<'a, T: Debug + Display + Copy + Sized> DenseMatrixView<'a, T> {
}
}
impl<'a, T: Debug + Display + Copy + Sized> fmt::Display for DenseMatrixView<'a, T> {
impl<T: Debug + Display + Copy + Sized> fmt::Display for DenseMatrixView<'_, T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
writeln!(
f,
@@ -142,7 +142,7 @@ impl<'a, T: Debug + Display + Copy + Sized> DenseMatrixMutView<'a, T> {
}
}
fn iter_mut<'b>(&'b mut self, axis: u8) -> Box<dyn Iterator<Item = &mut T> + 'b> {
fn iter_mut<'b>(&'b mut self, axis: u8) -> Box<dyn Iterator<Item = &'b mut T> + 'b> {
let column_major = self.column_major;
let stride = self.stride;
let ptr = self.values.as_mut_ptr();
@@ -169,7 +169,7 @@ impl<'a, T: Debug + Display + Copy + Sized> DenseMatrixMutView<'a, T> {
}
}
impl<'a, T: Debug + Display + Copy + Sized> fmt::Display for DenseMatrixMutView<'a, T> {
impl<T: Debug + Display + Copy + Sized> fmt::Display for DenseMatrixMutView<'_, T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
writeln!(
f,
@@ -385,7 +385,7 @@ impl<T: Debug + Display + Copy + Sized> Array<T, (usize, usize)> for DenseMatrix
}
fn is_empty(&self) -> bool {
self.ncols > 0 && self.nrows > 0
self.ncols < 1 || self.nrows < 1
}
fn iterator<'b>(&'b self, axis: u8) -> Box<dyn Iterator<Item = &'b T> + 'b> {
@@ -493,7 +493,7 @@ impl<T: Number + RealNumber> EVDDecomposable<T> for DenseMatrix<T> {}
impl<T: Number + RealNumber> LUDecomposable<T> for DenseMatrix<T> {}
impl<T: Number + RealNumber> SVDDecomposable<T> for DenseMatrix<T> {}
impl<'a, T: Debug + Display + Copy + Sized> Array<T, (usize, usize)> for DenseMatrixView<'a, T> {
impl<T: Debug + Display + Copy + Sized> Array<T, (usize, usize)> for DenseMatrixView<'_, T> {
fn get(&self, pos: (usize, usize)) -> &T {
if self.column_major {
&self.values[pos.0 + pos.1 * self.stride]
@@ -515,7 +515,7 @@ impl<'a, T: Debug + Display + Copy + Sized> Array<T, (usize, usize)> for DenseMa
}
}
impl<'a, T: Debug + Display + Copy + Sized> Array<T, usize> for DenseMatrixView<'a, T> {
impl<T: Debug + Display + Copy + Sized> Array<T, usize> for DenseMatrixView<'_, T> {
fn get(&self, i: usize) -> &T {
if self.nrows == 1 {
if self.column_major {
@@ -553,11 +553,11 @@ impl<'a, T: Debug + Display + Copy + Sized> Array<T, usize> for DenseMatrixView<
}
}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView2<T> for DenseMatrixView<'a, T> {}
impl<T: Debug + Display + Copy + Sized> ArrayView2<T> for DenseMatrixView<'_, T> {}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView1<T> for DenseMatrixView<'a, T> {}
impl<T: Debug + Display + Copy + Sized> ArrayView1<T> for DenseMatrixView<'_, T> {}
impl<'a, T: Debug + Display + Copy + Sized> Array<T, (usize, usize)> for DenseMatrixMutView<'a, T> {
impl<T: Debug + Display + Copy + Sized> Array<T, (usize, usize)> for DenseMatrixMutView<'_, T> {
fn get(&self, pos: (usize, usize)) -> &T {
if self.column_major {
&self.values[pos.0 + pos.1 * self.stride]
@@ -579,9 +579,7 @@ impl<'a, T: Debug + Display + Copy + Sized> Array<T, (usize, usize)> for DenseMa
}
}
impl<'a, T: Debug + Display + Copy + Sized> MutArray<T, (usize, usize)>
for DenseMatrixMutView<'a, T>
{
impl<T: Debug + Display + Copy + Sized> MutArray<T, (usize, usize)> for DenseMatrixMutView<'_, T> {
fn set(&mut self, pos: (usize, usize), x: T) {
if self.column_major {
self.values[pos.0 + pos.1 * self.stride] = x;
@@ -595,15 +593,16 @@ impl<'a, T: Debug + Display + Copy + Sized> MutArray<T, (usize, usize)>
}
}
impl<'a, T: Debug + Display + Copy + Sized> MutArrayView2<T> for DenseMatrixMutView<'a, T> {}
impl<T: Debug + Display + Copy + Sized> MutArrayView2<T> for DenseMatrixMutView<'_, T> {}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView2<T> for DenseMatrixMutView<'a, T> {}
impl<T: Debug + Display + Copy + Sized> ArrayView2<T> for DenseMatrixMutView<'_, T> {}
impl<T: RealNumber> MatrixStats<T> for DenseMatrix<T> {}
impl<T: RealNumber> MatrixPreprocessing<T> for DenseMatrix<T> {}
#[cfg(test)]
#[warn(clippy::reversed_empty_ranges)]
mod tests {
use super::*;
use approx::relative_eq;
@@ -664,6 +663,7 @@ mod tests {
#[test]
fn test_instantiate_err_view3() {
let x = DenseMatrix::from_2d_array(&[&[1., 2., 3.], &[4., 5., 6.], &[7., 8., 9.]]).unwrap();
#[allow(clippy::reversed_empty_ranges)]
let v = DenseMatrixView::new(&x, 0..3, 4..3);
assert!(v.is_err());
}
+6 -6
@@ -119,7 +119,7 @@ impl<T: Debug + Display + Copy + Sized> Array1<T> for Vec<T> {
}
}
impl<'a, T: Debug + Display + Copy + Sized> Array<T, usize> for VecMutView<'a, T> {
impl<T: Debug + Display + Copy + Sized> Array<T, usize> for VecMutView<'_, T> {
fn get(&self, i: usize) -> &T {
&self.ptr[i]
}
@@ -138,7 +138,7 @@ impl<'a, T: Debug + Display + Copy + Sized> Array<T, usize> for VecMutView<'a, T
}
}
impl<'a, T: Debug + Display + Copy + Sized> MutArray<T, usize> for VecMutView<'a, T> {
impl<T: Debug + Display + Copy + Sized> MutArray<T, usize> for VecMutView<'_, T> {
fn set(&mut self, i: usize, x: T) {
self.ptr[i] = x;
}
@@ -149,10 +149,10 @@ impl<'a, T: Debug + Display + Copy + Sized> MutArray<T, usize> for VecMutView<'a
}
}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView1<T> for VecMutView<'a, T> {}
impl<'a, T: Debug + Display + Copy + Sized> MutArrayView1<T> for VecMutView<'a, T> {}
impl<T: Debug + Display + Copy + Sized> ArrayView1<T> for VecMutView<'_, T> {}
impl<T: Debug + Display + Copy + Sized> MutArrayView1<T> for VecMutView<'_, T> {}
impl<'a, T: Debug + Display + Copy + Sized> Array<T, usize> for VecView<'a, T> {
impl<T: Debug + Display + Copy + Sized> Array<T, usize> for VecView<'_, T> {
fn get(&self, i: usize) -> &T {
&self.ptr[i]
}
@@ -171,7 +171,7 @@ impl<'a, T: Debug + Display + Copy + Sized> Array<T, usize> for VecView<'a, T> {
}
}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView1<T> for VecView<'a, T> {}
impl<T: Debug + Display + Copy + Sized> ArrayView1<T> for VecView<'_, T> {}
#[cfg(test)]
mod tests {
+6 -10
@@ -68,7 +68,7 @@ impl<T: Debug + Display + Copy + Sized> ArrayView2<T> for ArrayBase<OwnedRepr<T>
impl<T: Debug + Display + Copy + Sized> MutArrayView2<T> for ArrayBase<OwnedRepr<T>, Ix2> {}
impl<'a, T: Debug + Display + Copy + Sized> BaseArray<T, (usize, usize)> for ArrayView<'a, T, Ix2> {
impl<T: Debug + Display + Copy + Sized> BaseArray<T, (usize, usize)> for ArrayView<'_, T, Ix2> {
fn get(&self, pos: (usize, usize)) -> &T {
&self[[pos.0, pos.1]]
}
@@ -144,11 +144,9 @@ impl<T: Number + RealNumber> EVDDecomposable<T> for ArrayBase<OwnedRepr<T>, Ix2>
impl<T: Number + RealNumber> LUDecomposable<T> for ArrayBase<OwnedRepr<T>, Ix2> {}
impl<T: Number + RealNumber> SVDDecomposable<T> for ArrayBase<OwnedRepr<T>, Ix2> {}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView2<T> for ArrayView<'a, T, Ix2> {}
impl<T: Debug + Display + Copy + Sized> ArrayView2<T> for ArrayView<'_, T, Ix2> {}
impl<'a, T: Debug + Display + Copy + Sized> BaseArray<T, (usize, usize)>
for ArrayViewMut<'a, T, Ix2>
{
impl<T: Debug + Display + Copy + Sized> BaseArray<T, (usize, usize)> for ArrayViewMut<'_, T, Ix2> {
fn get(&self, pos: (usize, usize)) -> &T {
&self[[pos.0, pos.1]]
}
@@ -175,9 +173,7 @@ impl<'a, T: Debug + Display + Copy + Sized> BaseArray<T, (usize, usize)>
}
}
impl<'a, T: Debug + Display + Copy + Sized> MutArray<T, (usize, usize)>
for ArrayViewMut<'a, T, Ix2>
{
impl<T: Debug + Display + Copy + Sized> MutArray<T, (usize, usize)> for ArrayViewMut<'_, T, Ix2> {
fn set(&mut self, pos: (usize, usize), x: T) {
self[[pos.0, pos.1]] = x
}
@@ -195,9 +191,9 @@ impl<'a, T: Debug + Display + Copy + Sized> MutArray<T, (usize, usize)>
}
}
impl<'a, T: Debug + Display + Copy + Sized> MutArrayView2<T> for ArrayViewMut<'a, T, Ix2> {}
impl<T: Debug + Display + Copy + Sized> MutArrayView2<T> for ArrayViewMut<'_, T, Ix2> {}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView2<T> for ArrayViewMut<'a, T, Ix2> {}
impl<T: Debug + Display + Copy + Sized> ArrayView2<T> for ArrayViewMut<'_, T, Ix2> {}
#[cfg(test)]
mod tests {
+6 -6
@@ -41,7 +41,7 @@ impl<T: Debug + Display + Copy + Sized> ArrayView1<T> for ArrayBase<OwnedRepr<T>
impl<T: Debug + Display + Copy + Sized> MutArrayView1<T> for ArrayBase<OwnedRepr<T>, Ix1> {}
impl<'a, T: Debug + Display + Copy + Sized> BaseArray<T, usize> for ArrayView<'a, T, Ix1> {
impl<T: Debug + Display + Copy + Sized> BaseArray<T, usize> for ArrayView<'_, T, Ix1> {
fn get(&self, i: usize) -> &T {
&self[i]
}
@@ -60,9 +60,9 @@ impl<'a, T: Debug + Display + Copy + Sized> BaseArray<T, usize> for ArrayView<'a
}
}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView1<T> for ArrayView<'a, T, Ix1> {}
impl<T: Debug + Display + Copy + Sized> ArrayView1<T> for ArrayView<'_, T, Ix1> {}
impl<'a, T: Debug + Display + Copy + Sized> BaseArray<T, usize> for ArrayViewMut<'a, T, Ix1> {
impl<T: Debug + Display + Copy + Sized> BaseArray<T, usize> for ArrayViewMut<'_, T, Ix1> {
fn get(&self, i: usize) -> &T {
&self[i]
}
@@ -81,7 +81,7 @@ impl<'a, T: Debug + Display + Copy + Sized> BaseArray<T, usize> for ArrayViewMut
}
}
impl<'a, T: Debug + Display + Copy + Sized> MutArray<T, usize> for ArrayViewMut<'a, T, Ix1> {
impl<T: Debug + Display + Copy + Sized> MutArray<T, usize> for ArrayViewMut<'_, T, Ix1> {
fn set(&mut self, i: usize, x: T) {
self[i] = x;
}
@@ -92,8 +92,8 @@ impl<'a, T: Debug + Display + Copy + Sized> MutArray<T, usize> for ArrayViewMut<
}
}
impl<'a, T: Debug + Display + Copy + Sized> ArrayView1<T> for ArrayViewMut<'a, T, Ix1> {}
impl<'a, T: Debug + Display + Copy + Sized> MutArrayView1<T> for ArrayViewMut<'a, T, Ix1> {}
impl<T: Debug + Display + Copy + Sized> ArrayView1<T> for ArrayViewMut<'_, T, Ix1> {}
impl<T: Debug + Display + Copy + Sized> MutArrayView1<T> for ArrayViewMut<'_, T, Ix1> {}
impl<T: Debug + Display + Copy + Sized> Array1<T> for ArrayBase<OwnedRepr<T>, Ix1> {
fn slice<'a>(&'a self, range: Range<usize>) -> Box<dyn ArrayView1<T> + 'a> {
+2 -2
@@ -841,7 +841,7 @@ mod tests {
));
for (i, eigen_values_i) in eigen_values.iter().enumerate() {
assert!((eigen_values_i - evd.d[i]).abs() < 1e-4);
assert!((0f64 - evd.e[i]).abs() < std::f64::EPSILON);
assert!((0f64 - evd.e[i]).abs() < f64::EPSILON);
}
}
#[cfg_attr(
@@ -875,7 +875,7 @@ mod tests {
));
for (i, eigen_values_i) in eigen_values.iter().enumerate() {
assert!((eigen_values_i - evd.d[i]).abs() < 1e-4);
assert!((0f64 - evd.e[i]).abs() < std::f64::EPSILON);
assert!((0f64 - evd.e[i]).abs() < f64::EPSILON);
}
}
#[cfg_attr(
+2 -3
@@ -142,7 +142,6 @@ pub trait MatrixPreprocessing<T: RealNumber>: MutArrayView2<T> + Clone {
///
/// assert_eq!(a, expected);
/// ```
fn binarize_mut(&mut self, threshold: T) {
let (nrows, ncols) = self.shape();
for row in 0..nrows {
@@ -217,8 +216,8 @@ mod tests {
let expected_0 = vec![0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
let expected_1 = vec![1.25, 1.25];
assert!(m.var(0).approximate_eq(&expected_0, std::f64::EPSILON));
assert!(m.var(1).approximate_eq(&expected_1, std::f64::EPSILON));
assert!(m.var(0).approximate_eq(&expected_0, f64::EPSILON));
assert!(m.var(1).approximate_eq(&expected_1, f64::EPSILON));
assert_eq!(
m.mean(0),
vec![0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
+1 -3
@@ -48,11 +48,9 @@ pub struct SVD<T: Number + RealNumber, M: SVDDecomposable<T>> {
pub V: M,
/// Singular values of the original matrix
pub s: Vec<T>,
///
m: usize,
///
n: usize,
///
/// Tolerance
tol: T,
}
+4 -4
@@ -27,9 +27,9 @@ use crate::error::Failed;
use crate::linalg::basic::arrays::{Array, Array1, Array2, ArrayView1, MutArrayView1};
use crate::numbers::floatnum::FloatNumber;
///
/// Trait for Biconjugate Gradient Solver
pub trait BiconjugateGradientSolver<'a, T: FloatNumber, X: Array2<T>> {
///
/// Solve Ax = b
fn solve_mut(
&self,
a: &'a X,
@@ -109,7 +109,7 @@ pub trait BiconjugateGradientSolver<'a, T: FloatNumber, X: Array2<T>> {
Ok(err)
}
///
/// solve preconditioner
fn solve_preconditioner(&self, a: &'a X, b: &[T], x: &mut [T]) {
let diag = Self::diag(a);
let n = diag.len();
@@ -133,7 +133,7 @@ pub trait BiconjugateGradientSolver<'a, T: FloatNumber, X: Array2<T>> {
y.copy_from(&x.xa(true, a));
}
///
/// Extract the diagonal from a matrix
fn diag(a: &X) -> Vec<T> {
let (nrows, ncols) = a.shape();
let n = nrows.min(ncols);
+2
@@ -345,6 +345,7 @@ impl<TX: FloatNumber + RealNumber, TY: Number, X: Array2<TX>, Y: Array1<TY>>
l1_reg * gamma,
parameters.max_iter,
TX::from_f64(parameters.tol).unwrap(),
true,
)?;
for i in 0..p {
@@ -371,6 +372,7 @@ impl<TX: FloatNumber + RealNumber, TY: Number, X: Array2<TX>, Y: Array1<TY>>
l1_reg * gamma,
parameters.max_iter,
TX::from_f64(parameters.tol).unwrap(),
true,
)?;
for i in 0..p {
+145 -55
@@ -9,7 +9,7 @@
//!
//! Lasso coefficient estimates solve the problem:
//!
//! \\[\underset{\beta}{minimize} \space \space \sum_{i=1}^n \left( y_i - \beta_0 - \sum_{j=1}^p \beta_jx_{ij} \right)^2 + \alpha \sum_{j=1}^p \lVert \beta_j \rVert_1\\]
//! \\[\underset{\beta}{minimize} \space \space \frac{1}{n} \sum_{i=1}^n \left( y_i - \beta_0 - \sum_{j=1}^p \beta_jx_{ij} \right)^2 + \alpha \sum_{j=1}^p \lVert \beta_j \rVert_1\\]
//!
//! This problem is solved with an interior-point method that is comparable to coordinate descent in solving large problems with modest accuracy,
//! but is able to solve them with high accuracy with relatively small additional computational cost.
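The doc comment above compares the interior-point solver against coordinate descent on the 1/n-scaled objective. As a purely didactic sketch (not the crate's solver), the one-feature case of that objective has a closed form via the soft-thresholding operator, which makes the role of `alpha` easy to see:

```rust
/// Soft-thresholding operator: S(x, t) = sign(x) * max(|x| - t, 0).
fn soft_threshold(x: f64, t: f64) -> f64 {
    if x > t {
        x - t
    } else if x < -t {
        x + t
    } else {
        0.0
    }
}

/// One-feature lasso: minimize (1/n) * Σ(y_i - w*x_i)^2 + alpha*|w|.
/// Setting the subgradient to zero gives w = S(rho, alpha) / z with
/// rho = (2/n) * Σ x_i*y_i and z = (2/n) * Σ x_i^2.
fn lasso_1d(x: &[f64], y: &[f64], alpha: f64) -> f64 {
    let n = x.len() as f64;
    let rho: f64 = x.iter().zip(y).map(|(xi, yi)| xi * yi).sum::<f64>() * 2.0 / n;
    let z: f64 = x.iter().map(|xi| xi * xi).sum::<f64>() * 2.0 / n;
    soft_threshold(rho, alpha) / z
}

fn main() {
    let x = [1.0, 2.0, -1.0, -2.0];
    let y = [2.0, 4.0, -2.0, -4.0]; // y = 2x exactly
    // alpha = 0 recovers the least-squares slope.
    assert!((lasso_1d(&x, &y, 0.0) - 2.0).abs() < 1e-12);
    // A large enough alpha shrinks the coefficient all the way to zero.
    assert_eq!(lasso_1d(&x, &y, 100.0), 0.0);
    // Intermediate alpha shrinks without zeroing.
    let w = lasso_1d(&x, &y, 1.0);
    assert!(w > 0.0 && w < 2.0);
}
```

The function names here are illustrative only; the crate itself solves the multi-feature problem with `InteriorPointOptimizer`.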
@@ -53,6 +53,9 @@ pub struct LassoParameters {
#[cfg_attr(feature = "serde", serde(default))]
/// The maximum number of iterations
pub max_iter: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// If false, force the intercept parameter (beta_0) to be zero.
pub fit_intercept: bool,
}
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
@@ -86,6 +89,12 @@ impl LassoParameters {
self.max_iter = max_iter;
self
}
/// If false, force the intercept parameter (beta_0) to be zero.
pub fn with_fit_intercept(mut self, fit_intercept: bool) -> Self {
self.fit_intercept = fit_intercept;
self
}
}
impl Default for LassoParameters {
@@ -95,6 +104,7 @@ impl Default for LassoParameters {
normalize: true,
tol: 1e-4,
max_iter: 1000,
fit_intercept: true,
}
}
}
@@ -118,8 +128,8 @@ impl<TX: FloatNumber + RealNumber, TY: Number, X: Array2<TX>, Y: Array1<TY>>
{
fn new() -> Self {
Self {
coefficients: Option::None,
intercept: Option::None,
coefficients: None,
intercept: None,
_phantom_ty: PhantomData,
_phantom_y: PhantomData,
}
@@ -155,6 +165,9 @@ pub struct LassoSearchParameters {
#[cfg_attr(feature = "serde", serde(default))]
/// The maximum number of iterations
pub max_iter: Vec<usize>,
#[cfg_attr(feature = "serde", serde(default))]
/// If false, force the intercept parameter (beta_0) to be zero.
pub fit_intercept: Vec<bool>,
}
/// Lasso grid search iterator
@@ -164,6 +177,7 @@ pub struct LassoSearchParametersIterator {
current_normalize: usize,
current_tol: usize,
current_max_iter: usize,
current_fit_intercept: usize,
}
impl IntoIterator for LassoSearchParameters {
@@ -177,6 +191,7 @@ impl IntoIterator for LassoSearchParameters {
current_normalize: 0,
current_tol: 0,
current_max_iter: 0,
current_fit_intercept: 0,
}
}
}
@@ -189,6 +204,7 @@ impl Iterator for LassoSearchParametersIterator {
&& self.current_normalize == self.lasso_search_parameters.normalize.len()
&& self.current_tol == self.lasso_search_parameters.tol.len()
&& self.current_max_iter == self.lasso_search_parameters.max_iter.len()
&& self.current_fit_intercept == self.lasso_search_parameters.fit_intercept.len()
{
return None;
}
@@ -198,6 +214,7 @@ impl Iterator for LassoSearchParametersIterator {
normalize: self.lasso_search_parameters.normalize[self.current_normalize],
tol: self.lasso_search_parameters.tol[self.current_tol],
max_iter: self.lasso_search_parameters.max_iter[self.current_max_iter],
fit_intercept: self.lasso_search_parameters.fit_intercept[self.current_fit_intercept],
};
if self.current_alpha + 1 < self.lasso_search_parameters.alpha.len() {
@@ -214,11 +231,19 @@ impl Iterator for LassoSearchParametersIterator {
self.current_normalize = 0;
self.current_tol = 0;
self.current_max_iter += 1;
} else if self.current_fit_intercept + 1 < self.lasso_search_parameters.fit_intercept.len()
{
self.current_alpha = 0;
self.current_normalize = 0;
self.current_tol = 0;
self.current_max_iter = 0;
self.current_fit_intercept += 1;
} else {
self.current_alpha += 1;
self.current_normalize += 1;
self.current_tol += 1;
self.current_max_iter += 1;
self.current_fit_intercept += 1;
}
Some(next)
@@ -234,6 +259,7 @@ impl Default for LassoSearchParameters {
normalize: vec![default_params.normalize],
tol: vec![default_params.tol],
max_iter: vec![default_params.max_iter],
fit_intercept: vec![default_params.fit_intercept],
}
}
}
@@ -246,7 +272,7 @@ impl<TX: FloatNumber + RealNumber, TY: Number, X: Array2<TX>, Y: Array1<TY>> Las
pub fn fit(x: &X, y: &Y, parameters: LassoParameters) -> Result<Lasso<TX, TY, X, Y>, Failed> {
let (n, p) = x.shape();
if n <= p {
if n < p {
return Err(Failed::fit(
"Number of rows in X should be >= number of columns in X",
));
@@ -283,19 +309,23 @@ impl<TX: FloatNumber + RealNumber, TY: Number, X: Array2<TX>, Y: Array1<TY>> Las
l1_reg,
parameters.max_iter,
TX::from_f64(parameters.tol).unwrap(),
parameters.fit_intercept,
)?;
for (j, col_std_j) in col_std.iter().enumerate().take(p) {
w[j] /= *col_std_j;
}
let mut b = TX::zero();
let b = if parameters.fit_intercept {
let mut xw_mean = TX::zero();
for (i, col_mean_i) in col_mean.iter().enumerate().take(p) {
xw_mean += w[i] * *col_mean_i;
}
for (i, col_mean_i) in col_mean.iter().enumerate().take(p) {
b += w[i] * *col_mean_i;
}
b = TX::from_f64(y.mean_by()).unwrap() - b;
Some(TX::from_f64(y.mean_by()).unwrap() - xw_mean)
} else {
None
};
(X::from_column(&w), b)
} else {
let mut optimizer = InteriorPointOptimizer::new(x, p);
@@ -306,13 +336,21 @@ impl<TX: FloatNumber + RealNumber, TY: Number, X: Array2<TX>, Y: Array1<TY>> Las
l1_reg,
parameters.max_iter,
TX::from_f64(parameters.tol).unwrap(),
parameters.fit_intercept,
)?;
(X::from_column(&w), TX::from_f64(y.mean_by()).unwrap())
(
X::from_column(&w),
if parameters.fit_intercept {
Some(TX::from_f64(y.mean_by()).unwrap())
} else {
None
},
)
};
Ok(Lasso {
intercept: Some(b),
intercept: b,
coefficients: Some(w),
_phantom_ty: PhantomData,
_phantom_y: PhantomData,
@@ -369,6 +407,7 @@ impl<TX: FloatNumber + RealNumber, TY: Number, X: Array2<TX>, Y: Array1<TY>> Las
#[cfg(test)]
mod tests {
use super::*;
use crate::linalg::basic::arrays::Array;
use crate::linalg::basic::matrix::DenseMatrix;
use crate::metrics::mean_absolute_error;
@@ -377,30 +416,28 @@ mod tests {
let parameters = LassoSearchParameters {
alpha: vec![0., 1.],
max_iter: vec![10, 100],
fit_intercept: vec![false, true],
..Default::default()
};
let mut iter = parameters.into_iter();
let next = iter.next().unwrap();
assert_eq!(next.alpha, 0.);
assert_eq!(next.max_iter, 10);
let next = iter.next().unwrap();
assert_eq!(next.alpha, 1.);
assert_eq!(next.max_iter, 10);
let next = iter.next().unwrap();
assert_eq!(next.alpha, 0.);
assert_eq!(next.max_iter, 100);
let next = iter.next().unwrap();
assert_eq!(next.alpha, 1.);
assert_eq!(next.max_iter, 100);
let mut iter = parameters.clone().into_iter();
for current_fit_intercept in 0..parameters.fit_intercept.len() {
for current_max_iter in 0..parameters.max_iter.len() {
for current_alpha in 0..parameters.alpha.len() {
let next = iter.next().unwrap();
assert_eq!(next.alpha, parameters.alpha[current_alpha]);
assert_eq!(next.max_iter, parameters.max_iter[current_max_iter]);
assert_eq!(
next.fit_intercept,
parameters.fit_intercept[current_fit_intercept]
);
}
}
}
assert!(iter.next().is_none());
}
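The rewritten test above checks that `LassoSearchParametersIterator` enumerates the full Cartesian product with `alpha` ticking fastest. That pattern is an odometer: increment the least-significant index and carry into the next field when it wraps. A minimal standalone sketch over index vectors (names illustrative, not the crate's API):

```rust
/// Enumerate all index combinations for fields of the given lengths,
/// first field fastest — the same order the grid-search iterator produces.
fn odometer(lens: &[usize]) -> Vec<Vec<usize>> {
    let mut out = Vec::new();
    if lens.iter().any(|&l| l == 0) {
        return out; // an empty field means an empty product
    }
    let mut idx = vec![0usize; lens.len()];
    loop {
        out.push(idx.clone());
        // Advance the least-significant position, carrying into the next.
        let mut pos = 0;
        loop {
            idx[pos] += 1;
            if idx[pos] < lens[pos] {
                break;
            }
            idx[pos] = 0;
            pos += 1;
            if pos == lens.len() {
                return out; // every field wrapped: enumeration complete
            }
        }
    }
}

fn main() {
    // Two alphas x two max_iter values -> four combinations, alpha fastest.
    let combos = odometer(&[2, 2]);
    assert_eq!(combos, vec![vec![0, 0], vec![1, 0], vec![0, 1], vec![1, 1]]);
    // Adding fit_intercept with two values doubles the count.
    assert_eq!(odometer(&[2, 2, 2]).len(), 8);
}
```

The hand-rolled `else if` chain in the iterator implements exactly this carry logic, one branch per field.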
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn lasso_fit_predict() {
fn get_example_x_y() -> (DenseMatrix<f64>, Vec<f64>) {
let x = DenseMatrix::from_2d_array(&[
&[234.289, 235.6, 159.0, 107.608, 1947., 60.323],
&[259.426, 232.5, 145.6, 108.632, 1948., 61.122],
@@ -426,6 +463,17 @@ mod tests {
114.2, 115.7, 116.9,
];
(x, y)
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn lasso_fit_predict() {
let (x, y) = get_example_x_y();
let y_hat = Lasso::fit(&x, &y, Default::default())
.and_then(|lr| lr.predict(&x))
.unwrap();
@@ -440,6 +488,7 @@ mod tests {
normalize: false,
tol: 1e-4,
max_iter: 1000,
fit_intercept: true,
},
)
.and_then(|lr| lr.predict(&x))
@@ -448,35 +497,76 @@ mod tests {
assert!(mean_absolute_error(&y_hat, &y) < 2.0);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn test_full_rank_x() {
// x: randn(3,3) * 10, demean, then round to 2 decimal points
// y = x @ [10.0, 0.2, -3.0], round to 2 decimal points
let param = LassoParameters::default()
.with_normalize(false)
.with_alpha(200.0);
let x = DenseMatrix::from_2d_array(&[
&[-8.9, -2.24, 8.89],
&[-4.02, 8.89, 12.33],
&[12.92, -6.65, -21.22],
])
.unwrap();
let y = vec![-116.12, -75.41, 191.53];
let w = Lasso::fit(&x, &y, param)
.unwrap()
.coefficients()
.iterator(0)
.copied()
.collect();
let expected_w = vec![5.20289531, 0., -5.32823882]; // by coordinate descent
assert!(mean_absolute_error(&w, &expected_w) < 1e-3); // actual mean_absolute_error is about 2e-4
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn test_fit_intercept() {
let (x, y) = get_example_x_y();
let fit_result = Lasso::fit(
&x,
&y,
LassoParameters {
alpha: 0.1,
normalize: false,
tol: 1e-8,
max_iter: 1000,
fit_intercept: false,
},
)
.unwrap();
let w = fit_result.coefficients().iterator(0).copied().collect();
// by sklearn LassoLars. coordinate descent doesn't converge well
let expected_w = vec![
0.18335684,
0.02106526,
0.00703214,
-1.35952542,
0.09295222,
0.,
];
assert!(mean_absolute_error(&w, &expected_w) < 1e-6);
assert_eq!(fit_result.intercept, None);
}
// TODO: serialization for the new DenseMatrix needs to be implemented
// #[cfg_attr(all(target_arch = "wasm32", not(target_os = "wasi")), wasm_bindgen_test::wasm_bindgen_test)]
// #[test]
// #[cfg(feature = "serde")]
// fn serde() {
// let x = DenseMatrix::from_2d_array(&[
// &[234.289, 235.6, 159.0, 107.608, 1947., 60.323],
// &[259.426, 232.5, 145.6, 108.632, 1948., 61.122],
// &[258.054, 368.2, 161.6, 109.773, 1949., 60.171],
// &[284.599, 335.1, 165.0, 110.929, 1950., 61.187],
// &[328.975, 209.9, 309.9, 112.075, 1951., 63.221],
// &[346.999, 193.2, 359.4, 113.270, 1952., 63.639],
// &[365.385, 187.0, 354.7, 115.094, 1953., 64.989],
// &[363.112, 357.8, 335.0, 116.219, 1954., 63.761],
// &[397.469, 290.4, 304.8, 117.388, 1955., 66.019],
// &[419.180, 282.2, 285.7, 118.734, 1956., 67.857],
// &[442.769, 293.6, 279.8, 120.445, 1957., 68.169],
// &[444.546, 468.1, 263.7, 121.950, 1958., 66.513],
// &[482.704, 381.3, 255.2, 123.366, 1959., 68.655],
// &[502.601, 393.1, 251.4, 125.368, 1960., 69.564],
// &[518.173, 480.6, 257.2, 127.852, 1961., 69.331],
// &[554.894, 400.7, 282.7, 130.081, 1962., 70.551],
// ]);
// let y = vec![
// 83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0, 101.2, 104.6, 108.4, 110.8, 112.6,
// 114.2, 115.7, 116.9,
// ];
// let (x, y) = get_lasso_sample_x_y();
// let lr = Lasso::fit(&x, &y, Default::default()).unwrap();
// let deserialized_lr: Lasso<f64, f64, DenseMatrix<f64>, Vec<f64>> =
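The `fit_intercept` change above relies on a standard centering identity: when the columns of `X` are mean-centered before optimization, the intercept is recovered afterwards as `beta_0 = mean(y) - Σ_j w_j * mean(x_j)` (the `xw_mean` computation in the diff). A minimal sketch of that identity, independent of the crate's types:

```rust
/// Recover the intercept after fitting coefficients on centered data:
/// beta_0 = mean(y) - sum_j(w_j * mean(x_j)).
/// Illustrative helper, not the crate's API.
fn intercept(w: &[f64], col_means: &[f64], y_mean: f64) -> f64 {
    let xw_mean: f64 = w.iter().zip(col_means).map(|(wi, mi)| wi * mi).sum();
    y_mean - xw_mean
}

fn main() {
    // Data generated by y = 3 + 2*x0 - 1*x1 exactly, so with the true
    // coefficients w = [2, -1] the recovered intercept must be 3.
    let col_means = [1.5, 4.0];
    let y_mean = 3.0 + 2.0 * 1.5 - 1.0 * 4.0; // = 2.0
    let b = intercept(&[2.0, -1.0], &col_means, y_mean);
    assert!((b - 3.0).abs() < 1e-12);
}
```

When `fit_intercept` is false, the diff skips this step entirely and leaves `intercept` as `None`, forcing `beta_0 = 0`.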
+13 -14
@@ -16,7 +16,7 @@ use crate::linalg::basic::arrays::{Array1, Array2, ArrayView1, MutArray, MutArra
use crate::linear::bg_solver::BiconjugateGradientSolver;
use crate::numbers::floatnum::FloatNumber;
///
/// Interior Point Optimizer
pub struct InteriorPointOptimizer<T: FloatNumber, X: Array2<T>> {
ata: X,
d1: Vec<T>,
@@ -25,9 +25,8 @@ pub struct InteriorPointOptimizer<T: FloatNumber, X: Array2<T>> {
prs: Vec<T>,
}
///
impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
///
/// Initialize a new Interior Point Optimizer
pub fn new(a: &X, n: usize) -> InteriorPointOptimizer<T, X> {
InteriorPointOptimizer {
ata: a.ab(true, a, false),
@@ -38,7 +37,7 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
}
}
///
/// Run the optimization
pub fn optimize(
&mut self,
x: &X,
@@ -46,6 +45,7 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
lambda: T,
max_iter: usize,
tol: T,
fit_intercept: bool,
) -> Result<Vec<T>, Failed> {
let (n, p) = x.shape();
let p_f64 = T::from_usize(p).unwrap();
@@ -53,6 +53,7 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
let lambda = lambda.max(T::epsilon());
//parameters
let max_ls_iter = 100;
let pcgmaxi = 5000;
let min_pcgtol = T::from_f64(0.1).unwrap();
let eta = T::from_f64(1E-3).unwrap();
@@ -62,9 +63,12 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
let mu = T::two();
// let y = M::from_row_vector(y.sub_scalar(y.mean_by())).transpose();
let y = y.sub_scalar(T::from_f64(y.mean_by()).unwrap());
let y = if fit_intercept {
y.sub_scalar(T::from_f64(y.mean_by()).unwrap())
} else {
y.to_owned()
};
let mut max_ls_iter = 100;
let mut pitr = 0;
let mut w = Vec::zeros(p);
let mut neww = w.clone();
@@ -101,7 +105,7 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
// CALCULATE DUALITY GAP
let xnu = nu.xa(false, x);
let max_xnu = xnu.norm(std::f64::INFINITY);
let max_xnu = xnu.norm(f64::INFINITY);
if max_xnu > lambda_f64 {
let lnu = T::from_f64(lambda_f64 / max_xnu).unwrap();
nu.mul_scalar_mut(lnu);
@@ -166,7 +170,7 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
s = T::one();
let gdx = grad.dot(&dxu);
let lsiter = 0;
let mut lsiter = 0;
while lsiter < max_ls_iter {
for i in 0..p {
neww[i] = w[i] + s * dx[i];
@@ -191,7 +195,7 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
}
}
s = beta * s;
max_ls_iter += 1;
lsiter += 1;
}
if lsiter == max_ls_iter {
@@ -208,7 +212,6 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
Ok(w)
}
///
fn sumlogneg(f: &X) -> T {
let (n, _) = f.shape();
let mut sum = T::zero();
@@ -220,11 +223,9 @@ impl<T: FloatNumber, X: Array2<T>> InteriorPointOptimizer<T, X> {
}
}
///
impl<'a, T: FloatNumber, X: Array2<T>> BiconjugateGradientSolver<'a, T, X>
for InteriorPointOptimizer<T, X>
{
///
fn solve_preconditioner(&self, a: &'a X, b: &[T], x: &mut [T]) {
let (_, p) = a.shape();
@@ -234,7 +235,6 @@ impl<'a, T: FloatNumber, X: Array2<T>> BiconjugateGradientSolver<'a, T, X>
}
}
///
fn mat_vec_mul(&self, _: &X, x: &Vec<T>, y: &mut Vec<T>) {
let (_, p) = self.ata.shape();
let x_slice = Vec::from_slice(x.slice(0..p).as_ref());
@@ -246,7 +246,6 @@ impl<'a, T: FloatNumber, X: Array2<T>> BiconjugateGradientSolver<'a, T, X>
}
}
///
fn mat_t_vec_mul(&self, a: &X, x: &Vec<T>, y: &mut Vec<T>) {
self.mat_vec_mul(a, x, y);
}
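The `solve_preconditioner` method above applies Jacobi (diagonal) preconditioning: approximate `M⁻¹r` by dividing each component of the residual by the matching diagonal entry of `A`. A standalone sketch over plain slices (illustrative names; the zero-diagonal guard is an addition here, not necessarily present in the crate):

```rust
/// Jacobi preconditioning: z[i] = r[i] / diag[i], with a guard so a zero
/// diagonal entry passes the residual through unchanged instead of
/// producing infinities.
fn jacobi_precondition(diag: &[f64], r: &[f64], z: &mut [f64]) {
    for i in 0..diag.len() {
        z[i] = if diag[i] != 0.0 { r[i] / diag[i] } else { r[i] };
    }
}

fn main() {
    let diag = [2.0, 4.0, 0.0];
    let r = [2.0, 2.0, 5.0];
    let mut z = [0.0; 3];
    jacobi_precondition(&diag, &r, &mut z);
    assert_eq!(z, [1.0, 0.5, 5.0]);
}
```

This is the cheapest useful preconditioner for the biconjugate gradient loop: it costs O(n) per application and only needs the diagonal extracted once by `diag`.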
+13 -16
@@ -183,14 +183,11 @@ pub struct LogisticRegression<
}
trait ObjectiveFunction<T: Number + FloatNumber, X: Array2<T>> {
///
fn f(&self, w_bias: &[T]) -> T;
///
#[allow(clippy::ptr_arg)]
fn df(&self, g: &mut Vec<T>, w_bias: &Vec<T>);
///
#[allow(clippy::ptr_arg)]
fn partial_dot(w: &[T], x: &X, v_col: usize, m_row: usize) -> T {
let mut sum = T::zero();
@@ -261,8 +258,8 @@ impl<TX: Number + FloatNumber + RealNumber, TY: Number + Ord, X: Array2<TX>, Y:
}
}
impl<'a, T: Number + FloatNumber, X: Array2<T>> ObjectiveFunction<T, X>
for BinaryObjectiveFunction<'a, T, X>
impl<T: Number + FloatNumber, X: Array2<T>> ObjectiveFunction<T, X>
for BinaryObjectiveFunction<'_, T, X>
{
fn f(&self, w_bias: &[T]) -> T {
let mut f = T::zero();
@@ -316,8 +313,8 @@ struct MultiClassObjectiveFunction<'a, T: Number + FloatNumber, X: Array2<T>> {
_phantom_t: PhantomData<T>,
}
impl<'a, T: Number + FloatNumber + RealNumber, X: Array2<T>> ObjectiveFunction<T, X>
for MultiClassObjectiveFunction<'a, T, X>
impl<T: Number + FloatNumber + RealNumber, X: Array2<T>> ObjectiveFunction<T, X>
for MultiClassObjectiveFunction<'_, T, X>
{
fn f(&self, w_bias: &[T]) -> T {
let mut f = T::zero();
@@ -629,11 +626,11 @@ mod tests {
objective.df(&mut g, &vec![1., 2., 3., 4., 5., 6., 7., 8., 9.]);
objective.df(&mut g, &vec![1., 2., 3., 4., 5., 6., 7., 8., 9.]);
assert!((g[0] + 33.000068218163484).abs() < std::f64::EPSILON);
assert!((g[0] + 33.000068218163484).abs() < f64::EPSILON);
let f = objective.f(&[1., 2., 3., 4., 5., 6., 7., 8., 9.]);
assert!((f - 408.0052230582765).abs() < std::f64::EPSILON);
assert!((f - 408.0052230582765).abs() < f64::EPSILON);
let objective_reg = MultiClassObjectiveFunction {
x: &x,
@@ -689,13 +686,13 @@ mod tests {
objective.df(&mut g, &vec![1., 2., 3.]);
objective.df(&mut g, &vec![1., 2., 3.]);
assert!((g[0] - 26.051064349381285).abs() < std::f64::EPSILON);
assert!((g[1] - 10.239000702928523).abs() < std::f64::EPSILON);
assert!((g[2] - 3.869294270156324).abs() < std::f64::EPSILON);
assert!((g[0] - 26.051064349381285).abs() < f64::EPSILON);
assert!((g[1] - 10.239000702928523).abs() < f64::EPSILON);
assert!((g[2] - 3.869294270156324).abs() < f64::EPSILON);
let f = objective.f(&[1., 2., 3.]);
assert!((f - 59.76994756647412).abs() < std::f64::EPSILON);
assert!((f - 59.76994756647412).abs() < f64::EPSILON);
let objective_reg = BinaryObjectiveFunction {
x: &x,
@@ -916,7 +913,7 @@ mod tests {
let x: DenseMatrix<f32> = DenseMatrix::rand(52181, 94);
let y1: Vec<i32> = vec![1; 2181];
let y2: Vec<i32> = vec![0; 50000];
let y: Vec<i32> = y1.into_iter().chain(y2.into_iter()).collect();
let y: Vec<i32> = y1.into_iter().chain(y2).collect();
let lr = LogisticRegression::fit(&x, &y, Default::default()).unwrap();
let lr_reg = LogisticRegression::fit(
@@ -938,12 +935,12 @@ mod tests {
let x: &DenseMatrix<f64> = &DenseMatrix::rand(52181, 94);
let y1: Vec<u32> = vec![1; 2181];
let y2: Vec<u32> = vec![0; 50000];
let y: &Vec<u32> = &(y1.into_iter().chain(y2.into_iter()).collect());
let y: &Vec<u32> = &(y1.into_iter().chain(y2).collect());
println!("y vec height: {:?}", y.len());
println!("x matrix shape: {:?}", x.shape());
let lr = LogisticRegression::fit(x, y, Default::default()).unwrap();
let y_hat = lr.predict(&x).unwrap();
let y_hat = lr.predict(x).unwrap();
println!("y_hat shape: {:?}", y_hat.shape());
+219
@@ -0,0 +1,219 @@
//! # Cosine Distance Metric
//!
//! The cosine distance between two points \\( x \\) and \\( y \\) in n-space is defined as:
//!
//! \\[ d(x, y) = 1 - \frac{x \cdot y}{||x|| ||y||} \\]
//!
//! where \\( x \cdot y \\) is the dot product of the vectors, and \\( ||x|| \\) and \\( ||y|| \\)
//! are their respective magnitudes (Euclidean norms).
//!
//! Cosine distance measures the angular dissimilarity between vectors, ranging from 0 to 2.
//! A value of 0 indicates identical direction (parallel vectors), while larger values indicate
//! greater angular separation.
//!
//! Example:
//!
//! ```
//! use smartcore::metrics::distance::Distance;
//! use smartcore::metrics::distance::cosine::Cosine;
//!
//! let x = vec![1., 1.];
//! let y = vec![2., 2.];
//!
//! let cosine_dist: f64 = Cosine::new().distance(&x, &y);
//! ```
//!
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
use std::marker::PhantomData;
use crate::linalg::basic::arrays::ArrayView1;
use crate::numbers::basenum::Number;
use super::Distance;
/// Cosine distance is a measure of the angular dissimilarity between two non-zero vectors in n-space.
/// It is defined as 1 minus the cosine similarity of the vectors.
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
pub struct Cosine<T> {
_t: PhantomData<T>,
}
impl<T: Number> Default for Cosine<T> {
fn default() -> Self {
Self::new()
}
}
impl<T: Number> Cosine<T> {
/// Instantiate the initial structure
pub fn new() -> Cosine<T> {
Cosine { _t: PhantomData }
}
/// Calculate the dot product of two vectors using smartcore's ArrayView1 trait
#[inline]
pub(crate) fn dot_product<A: ArrayView1<T>>(x: &A, y: &A) -> f64 {
if x.shape() != y.shape() {
panic!("Input vector sizes are different.");
}
// Use the built-in dot product method from ArrayView1 trait
x.dot(y).to_f64().unwrap()
}
/// Calculate the squared magnitude (norm squared) of a vector
#[inline]
#[allow(dead_code)]
pub(crate) fn squared_magnitude<A: ArrayView1<T>>(x: &A) -> f64 {
x.iterator(0)
.map(|&a| {
let val = a.to_f64().unwrap();
val * val
})
.sum()
}
/// Calculate the magnitude (Euclidean norm) of a vector using smartcore's norm2 method
#[inline]
pub(crate) fn magnitude<A: ArrayView1<T>>(x: &A) -> f64 {
// Use the built-in norm2 method from ArrayView1 trait
x.norm2()
}
/// Calculate cosine similarity between two vectors
#[inline]
pub(crate) fn cosine_similarity<A: ArrayView1<T>>(x: &A, y: &A) -> f64 {
let dot_product = Self::dot_product(x, y);
let magnitude_x = Self::magnitude(x);
let magnitude_y = Self::magnitude(y);
if magnitude_x == 0.0 || magnitude_y == 0.0 {
return f64::MIN;
}
dot_product / (magnitude_x * magnitude_y)
}
}
impl<T: Number, A: ArrayView1<T>> Distance<A> for Cosine<T> {
fn distance(&self, x: &A, y: &A) -> f64 {
let similarity = Cosine::cosine_similarity(x, y);
1.0 - similarity
}
}
#[cfg(test)]
mod tests {
use super::*;
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn cosine_distance_identical_vectors() {
let a = vec![1, 2, 3];
let b = vec![1, 2, 3];
let dist: f64 = Cosine::new().distance(&a, &b);
assert!((dist - 0.0).abs() < 1e-8);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn cosine_distance_orthogonal_vectors() {
let a = vec![1, 0];
let b = vec![0, 1];
let dist: f64 = Cosine::new().distance(&a, &b);
assert!((dist - 1.0).abs() < 1e-8);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn cosine_distance_opposite_vectors() {
let a = vec![1, 2, 3];
let b = vec![-1, -2, -3];
let dist: f64 = Cosine::new().distance(&a, &b);
assert!((dist - 2.0).abs() < 1e-8);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn cosine_distance_general_case() {
let a = vec![1.0, 2.0, 3.0];
let b = vec![2.0, 1.0, 3.0];
let dist: f64 = Cosine::new().distance(&a, &b);
// Expected cosine similarity: (1*2 + 2*1 + 3*3) / (sqrt(1+4+9) * sqrt(4+1+9))
// = (2 + 2 + 9) / (sqrt(14) * sqrt(14)) = 13/14 ≈ 0.9286
// So cosine distance = 1 - 13/14 = 1/14 ≈ 0.0714
let expected_dist = 1.0 - (13.0 / 14.0);
assert!((dist - expected_dist).abs() < 1e-8);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
#[should_panic(expected = "Input vector sizes are different.")]
fn cosine_distance_different_sizes() {
let a = vec![1, 2];
let b = vec![1, 2, 3];
let _dist: f64 = Cosine::new().distance(&a, &b);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn cosine_distance_zero_vector() {
let a = vec![0, 0, 0];
let b = vec![1, 2, 3];
let dist: f64 = Cosine::new().distance(&a, &b);
assert!(dist > 1e300)
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn cosine_distance_float_precision() {
let a = vec![1.0f32, 2.0, 3.0];
let b = vec![4.0f32, 5.0, 6.0];
let dist: f64 = Cosine::new().distance(&a, &b);
// Calculate expected value manually
let dot_product = 1.0 * 4.0 + 2.0 * 5.0 + 3.0 * 6.0; // = 32
let mag_a = (1.0 * 1.0 + 2.0 * 2.0 + 3.0 * 3.0_f64).sqrt(); // = sqrt(14)
let mag_b = (4.0 * 4.0 + 5.0 * 5.0 + 6.0 * 6.0_f64).sqrt(); // = sqrt(77)
let expected_similarity = dot_product / (mag_a * mag_b);
let expected_distance = 1.0 - expected_similarity;
assert!((dist - expected_distance).abs() < 1e-6);
}
}
+2
@@ -13,6 +13,8 @@
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
/// Cosine distance
pub mod cosine;
/// Euclidean Distance is the straight-line distance between two points in Euclidean space and represents the shortest distance between these points.
pub mod euclidian;
/// Hamming Distance between two strings is the number of positions at which the corresponding symbols are different.
+88 -23
@@ -4,7 +4,9 @@
//!
//! \\[precision = \frac{tp}{tp + fp}\\]
//!
//! where tp (true positive) - correct result, fp (false positive) - unexpected result
//! where tp (true positive) - correct result, fp (false positive) - unexpected result.
//! For binary classification, this is precision for the positive class (assumed to be 1.0).
//! For multiclass, this is macro-averaged precision (average of per-class precisions).
//!
//! Example:
//!
@@ -19,7 +21,8 @@
//!
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
use std::collections::HashSet;
use std::collections::{HashMap, HashSet};
use std::marker::PhantomData;
#[cfg(feature = "serde")]
@@ -61,33 +64,63 @@ impl<T: RealNumber> Metrics<T> for Precision<T> {
);
}
let mut classes = HashSet::new();
for i in 0..y_true.shape() {
classes.insert(y_true.get(i).to_f64_bits());
}
let classes = classes.len();
let n = y_true.shape();
let mut tp = 0;
let mut fp = 0;
for i in 0..y_true.shape() {
if y_pred.get(i) == y_true.get(i) {
if classes == 2 {
if *y_true.get(i) == T::one() {
let mut classes_set: HashSet<u64> = HashSet::new();
for i in 0..n {
classes_set.insert(y_true.get(i).to_f64_bits());
}
let classes: usize = classes_set.len();
if classes == 2 {
// Binary case: precision for positive class (assumed T::one())
let positive = T::one();
let mut tp: usize = 0;
let mut fp_count: usize = 0;
for i in 0..n {
let t = *y_true.get(i);
let p = *y_pred.get(i);
if p == t {
if t == positive {
tp += 1;
}
} else {
tp += 1;
}
} else if classes == 2 {
if *y_true.get(i) == T::one() {
fp += 1;
} else if t != positive {
fp_count += 1;
}
}
if tp + fp_count == 0 {
0.0
} else {
fp += 1;
tp as f64 / (tp + fp_count) as f64
}
} else {
// Multiclass case: macro-averaged precision
let mut predicted: HashMap<u64, usize> = HashMap::new();
let mut tp_map: HashMap<u64, usize> = HashMap::new();
for i in 0..n {
let p_bits = y_pred.get(i).to_f64_bits();
*predicted.entry(p_bits).or_insert(0) += 1;
if *y_true.get(i) == *y_pred.get(i) {
*tp_map.entry(p_bits).or_insert(0) += 1;
}
}
let mut precision_sum = 0.0;
for &bits in &classes_set {
let pred_count = *predicted.get(&bits).unwrap_or(&0);
let tp = *tp_map.get(&bits).unwrap_or(&0);
let prec = if pred_count > 0 {
tp as f64 / pred_count as f64
} else {
0.0
};
precision_sum += prec;
}
if classes == 0 {
0.0
} else {
precision_sum / classes as f64
}
}
tp as f64 / (tp as f64 + fp as f64)
}
}
@@ -114,7 +147,7 @@ mod tests {
let y_pred: Vec<f64> = vec![0., 0., 1., 1., 1., 1.];
let score3: f64 = Precision::new().get_score(&y_true, &y_pred);
assert!((score3 - 0.6666666666).abs() < 1e-8);
assert!((score3 - 0.5).abs() < 1e-8);
}
#[cfg_attr(
@@ -132,4 +165,36 @@ mod tests {
assert!((score1 - 0.333333333).abs() < 1e-8);
assert!((score2 - 1.0).abs() < 1e-8);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn precision_multiclass_imbalanced() {
let y_true: Vec<f64> = vec![0., 0., 1., 2., 2., 2.];
let y_pred: Vec<f64> = vec![0., 1., 1., 2., 0., 2.];
let score: f64 = Precision::new().get_score(&y_true, &y_pred);
let expected = (0.5 + 0.5 + 1.0) / 3.0;
assert!((score - expected).abs() < 1e-8);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn precision_multiclass_unpredicted_class() {
let y_true: Vec<f64> = vec![0., 0., 1., 2., 2., 2., 3.];
let y_pred: Vec<f64> = vec![0., 1., 1., 2., 0., 2., 0.];
let score: f64 = Precision::new().get_score(&y_true, &y_pred);
// Class 0: pred=3, tp=1 -> 1/3 ≈0.333
// Class 1: pred=2, tp=1 -> 0.5
// Class 2: pred=2, tp=2 -> 1.0
// Class 3: pred=0, tp=0 -> 0.0
let expected = (1.0 / 3.0 + 0.5 + 1.0 + 0.0) / 4.0;
assert!((score - expected).abs() < 1e-8);
}
}
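The multiclass branch of the fix above computes macro-averaged precision: per-class `tp / predicted_count`, averaged over the classes seen in `y_true`, with a never-predicted class contributing zero. A standalone sketch over integer labels (the crate works over `RealNumber` bit patterns instead):

```rust
use std::collections::{HashMap, HashSet};

/// Macro-averaged precision over the classes present in y_true.
/// Illustrative sketch; not the crate's `Precision` metric itself.
fn macro_precision(y_true: &[i64], y_pred: &[i64]) -> f64 {
    let classes: HashSet<i64> = y_true.iter().copied().collect();
    let mut predicted: HashMap<i64, usize> = HashMap::new();
    let mut tp: HashMap<i64, usize> = HashMap::new();
    for (t, p) in y_true.iter().zip(y_pred) {
        *predicted.entry(*p).or_insert(0) += 1;
        if t == p {
            *tp.entry(*p).or_insert(0) += 1;
        }
    }
    let sum: f64 = classes
        .iter()
        .map(|c| {
            let pred = *predicted.get(c).unwrap_or(&0);
            if pred > 0 {
                *tp.get(c).unwrap_or(&0) as f64 / pred as f64
            } else {
                0.0 // class never predicted contributes zero precision
            }
        })
        .sum();
    sum / classes.len() as f64
}

fn main() {
    // Same data as the `precision_multiclass_imbalanced` test in the diff:
    // class 0 -> 1/2, class 1 -> 1/2, class 2 -> 2/2.
    let y_true = [0, 0, 1, 2, 2, 2];
    let y_pred = [0, 1, 1, 2, 0, 2];
    let expected = (0.5 + 0.5 + 1.0) / 3.0;
    assert!((macro_precision(&y_true, &y_pred) - expected).abs() < 1e-12);
}
```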
+64 -24
@@ -4,7 +4,9 @@
//!
//! \\[recall = \frac{tp}{tp + fn}\\]
//!
//! where tp (true positive) - correct result, fn (false negative) - missing result
//! where tp (true positive) - correct result, fn (false negative) - missing result.
//! For binary classification, this is recall for the positive class (assumed to be 1.0).
//! For multiclass, this is macro-averaged recall (average of per-class recalls).
//!
//! Example:
//!
@@ -20,8 +22,7 @@
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
use std::collections::HashSet;
use std::convert::TryInto;
use std::collections::{HashMap, HashSet};
use std::marker::PhantomData;
#[cfg(feature = "serde")]
@@ -52,7 +53,7 @@ impl<T: RealNumber> Metrics<T> for Recall<T> {
}
}
/// Calculated recall score
/// * `y_true` - cround truth (correct) labels.
/// * `y_true` - ground truth (correct) labels.
/// * `y_pred` - predicted labels, as returned by a classifier.
fn get_score(&self, y_true: &dyn ArrayView1<T>, y_pred: &dyn ArrayView1<T>) -> f64 {
if y_true.shape() != y_pred.shape() {
@@ -63,32 +64,57 @@ impl<T: RealNumber> Metrics<T> for Recall<T> {
);
}
let mut classes = HashSet::new();
for i in 0..y_true.shape() {
classes.insert(y_true.get(i).to_f64_bits());
}
let classes: i64 = classes.len().try_into().unwrap();
let n = y_true.shape();
let mut tp = 0;
let mut fne = 0;
for i in 0..y_true.shape() {
if y_pred.get(i) == y_true.get(i) {
if classes == 2 {
if *y_true.get(i) == T::one() {
let mut classes_set = HashSet::new();
for i in 0..n {
classes_set.insert(y_true.get(i).to_f64_bits());
}
let classes: usize = classes_set.len();
if classes == 2 {
// Binary case: recall for positive class (assumed T::one())
let positive = T::one();
let mut tp: usize = 0;
let mut fn_count: usize = 0;
for i in 0..n {
let t = *y_true.get(i);
let p = *y_pred.get(i);
if p == t {
if t == positive {
tp += 1;
}
} else {
tp += 1;
}
} else if classes == 2 {
if *y_true.get(i) != T::one() {
fne += 1;
} else if t == positive {
fn_count += 1;
}
}
if tp + fn_count == 0 {
0.0
} else {
fne += 1;
tp as f64 / (tp + fn_count) as f64
}
} else {
// Multiclass case: macro-averaged recall
let mut support: HashMap<u64, usize> = HashMap::new();
let mut tp_map: HashMap<u64, usize> = HashMap::new();
for i in 0..n {
let t_bits = y_true.get(i).to_f64_bits();
*support.entry(t_bits).or_insert(0) += 1;
if *y_true.get(i) == *y_pred.get(i) {
*tp_map.entry(t_bits).or_insert(0) += 1;
}
}
let mut recall_sum = 0.0;
for (&bits, &sup) in &support {
let tp = *tp_map.get(&bits).unwrap_or(&0);
recall_sum += tp as f64 / sup as f64;
}
if support.is_empty() {
0.0
} else {
recall_sum / support.len() as f64
}
}
tp as f64 / (tp as f64 + fne as f64)
}
}
@@ -115,7 +141,7 @@ mod tests {
let y_pred: Vec<f64> = vec![0., 0., 1., 1., 1., 1.];
let score3: f64 = Recall::new().get_score(&y_true, &y_pred);
assert!((score3 - 0.5).abs() < 1e-8);
assert!((score3 - (2.0 / 3.0)).abs() < 1e-8);
}
#[cfg_attr(
@@ -133,4 +159,18 @@ mod tests {
assert!((score1 - 0.333333333).abs() < 1e-8);
assert!((score2 - 1.0).abs() < 1e-8);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn recall_multiclass_imbalanced() {
let y_true: Vec<f64> = vec![0., 0., 1., 2., 2., 2.];
let y_pred: Vec<f64> = vec![0., 1., 1., 2., 0., 2.];
let score: f64 = Recall::new().get_score(&y_true, &y_pred);
let expected = (0.5 + 1.0 + (2.0 / 3.0)) / 3.0;
assert!((score - expected).abs() < 1e-8);
}
}
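Macro-averaged recall, as fixed above, mirrors precision but divides per-class true positives by the class support (its count in `y_true`) rather than the predicted count. A compact sketch over integer labels (illustrative, not the crate's `Recall` metric):

```rust
use std::collections::HashMap;

/// Macro-averaged recall: per-class tp / support, averaged over classes.
fn macro_recall(y_true: &[i64], y_pred: &[i64]) -> f64 {
    let mut support: HashMap<i64, usize> = HashMap::new();
    let mut tp: HashMap<i64, usize> = HashMap::new();
    for (t, p) in y_true.iter().zip(y_pred) {
        *support.entry(*t).or_insert(0) += 1;
        if t == p {
            *tp.entry(*t).or_insert(0) += 1;
        }
    }
    // Every class in support has at least one true sample, so the
    // denominator is never zero here.
    let sum: f64 = support
        .iter()
        .map(|(c, s)| *tp.get(c).unwrap_or(&0) as f64 / *s as f64)
        .sum();
    sum / support.len() as f64
}

fn main() {
    // Same data as the `recall_multiclass_imbalanced` test in the diff:
    // class 0 -> 1/2, class 1 -> 1/1, class 2 -> 2/3.
    let y_true = [0, 0, 1, 2, 2, 2];
    let y_pred = [0, 1, 1, 2, 0, 2];
    let expected = (0.5 + 1.0 + 2.0 / 3.0) / 3.0;
    assert!((macro_recall(&y_true, &y_pred) - expected).abs() < 1e-12);
}
```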
+4 -4
@@ -257,8 +257,7 @@ impl<TY: Number + Ord + Unsigned> BernoulliNBDistribution<TY> {
/// Fits the distribution to a NxM matrix where N is number of samples and M is number of features.
/// * `x` - training data.
/// * `y` - vector with target values (classes) of length N.
/// * `priors` - Optional vector with prior probabilities of the classes. If not defined, priors are adjusted according to the data.
/// * `alpha` - Additive (Laplace/Lidstone) smoothing parameter.
/// * `binarize` - Threshold for binarizing.
fn fit<TX: Number + PartialOrd, X: Array2<TX>, Y: Array1<TY>>(
@@ -402,10 +401,10 @@ impl<TX: Number + PartialOrd, TY: Number + Ord + Unsigned, X: Array2<TX>, Y: Arr
{
/// Fits BernoulliNB with given data
/// * `x` - training data of size NxM where N is the number of samples and M is the number of
/// features.
/// * `y` - vector with target values (classes) of length N.
/// * `parameters` - additional parameters like class priors, alpha for smoothing and
/// binarizing threshold.
pub fn fit(x: &X, y: &Y, parameters: BernoulliNBParameters<TX>) -> Result<Self, Failed> {
let distribution = if let Some(threshold) = parameters.binarize {
BernoulliNBDistribution::fit(
@@ -427,6 +426,7 @@ impl<TX: Number + PartialOrd, TY: Number + Ord + Unsigned, X: Array2<TX>, Y: Arr
/// Estimates the class labels for the provided data.
/// * `x` - data of shape NxM where N is number of data points to estimate and M is number of features.
///
/// Returns a vector of size N with class estimates.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
if let Some(threshold) = self.binarize {
+3 -2
View File
@@ -95,7 +95,7 @@ impl<T: Number + Unsigned> PartialEq for CategoricalNBDistribution<T> {
return false;
}
for (a_i_j, b_i_j) in a_i.iter().zip(b_i.iter()) {
if (*a_i_j - *b_i_j).abs() > f64::EPSILON {
return false;
}
}
@@ -363,7 +363,7 @@ impl<T: Number + Unsigned, X: Array2<T>, Y: Array1<T>> Predictor<X, Y> for Categ
impl<T: Number + Unsigned, X: Array2<T>, Y: Array1<T>> CategoricalNB<T, X, Y> {
/// Fits CategoricalNB with given data
/// * `x` - training data of size NxM where N is the number of samples and M is the number of
/// features.
/// * `y` - vector with target values (classes) of length N.
/// * `parameters` - additional parameters like alpha for smoothing
pub fn fit(x: &X, y: &Y, parameters: CategoricalNBParameters) -> Result<Self, Failed> {
@@ -375,6 +375,7 @@ impl<T: Number + Unsigned, X: Array2<T>, Y: Array1<T>> CategoricalNB<T, X, Y> {
/// Estimates the class labels for the provided data.
/// * `x` - data of shape NxM where N is number of data points to estimate and M is number of features.
///
/// Returns a vector of size N with class estimates.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
self.inner.as_ref().unwrap().predict(x)
+3 -3
View File
@@ -174,8 +174,7 @@ impl<TY: Number + Ord + Unsigned> GaussianNBDistribution<TY> {
/// Fits the distribution to a NxM matrix where N is number of samples and M is number of features.
/// * `x` - training data.
/// * `y` - vector with target values (classes) of length N.
/// * `priors` - Optional vector with prior probabilities of the classes. If not defined, priors are adjusted according to the data.
pub fn fit<TX: Number + RealNumber, X: Array2<TX>, Y: Array1<TY>>(
x: &X,
y: &Y,
@@ -317,7 +316,7 @@ impl<TX: Number + RealNumber, TY: Number + Ord + Unsigned, X: Array2<TX>, Y: Arr
{
/// Fits GaussianNB with given data
/// * `x` - training data of size NxM where N is the number of samples and M is the number of
/// features.
/// * `y` - vector with target values (classes) of length N.
/// * `parameters` - additional parameters like class priors.
pub fn fit(x: &X, y: &Y, parameters: GaussianNBParameters) -> Result<Self, Failed> {
@@ -328,6 +327,7 @@ impl<TX: Number + RealNumber, TY: Number + Ord + Unsigned, X: Array2<TX>, Y: Arr
/// Estimates the class labels for the provided data.
/// * `x` - data of shape NxM where N is number of data points to estimate and M is number of features.
///
/// Returns a vector of size N with class estimates.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
self.inner.as_ref().unwrap().predict(x)
+477 -39
View File
@@ -40,7 +40,7 @@ use crate::linalg::basic::arrays::{Array1, Array2, ArrayView1};
use crate::numbers::basenum::Number;
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
use std::marker::PhantomData;
/// Distribution used in the Naive Bayes classifier.
pub(crate) trait NBDistribution<X: Number, Y: Number>: Clone {
@@ -89,44 +89,45 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>, D: NBDistribution<TX,
/// Estimates the class labels for the provided data.
/// * `x` - data of shape NxM where N is number of data points to estimate and M is number of features.
///
/// Returns a vector of size N with class estimates.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
let y_classes = self.distribution.classes();
if y_classes.is_empty() {
return Err(Failed::predict("Failed to predict, no classes available"));
}
let (rows, _) = x.shape();
let mut predictions = Vec::with_capacity(rows);
let mut all_probs_nan = true;
for row_index in 0..rows {
let row = x.get_row(row_index);
let mut max_log_prob = f64::NEG_INFINITY;
let mut max_class = None;
for (class_index, class) in y_classes.iter().enumerate() {
let log_likelihood = self.distribution.log_likelihood(class_index, &row);
let log_prob = log_likelihood + self.distribution.prior(class_index).ln();
if !log_prob.is_nan() && log_prob > max_log_prob {
max_log_prob = log_prob;
max_class = Some(*class);
all_probs_nan = false;
}
}
predictions.push(max_class.unwrap_or(y_classes[0]));
}
if all_probs_nan {
Err(Failed::predict(
"Failed to predict, all probabilities were NaN",
))
} else {
Ok(Y::from_vec_slice(&predictions))
}
}
}
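The rewritten `predict` loop above is essentially an argmax over per-class log-probabilities that skips NaN candidates and errors out only when nothing valid is found. A standalone sketch of that selection rule (hypothetical helper, outside the crate; unlike the crate's loop it also accepts a negative-infinity maximum):

```rust
// Pick the index of the largest non-NaN value.
// Returns None when every candidate is NaN (mirrors the
// "all probabilities were NaN" error path) or the slice is empty.
fn argmax_skip_nan(log_probs: &[f64]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, &p) in log_probs.iter().enumerate() {
        if p.is_nan() {
            continue;
        }
        match best {
            // Keep the earlier index on ties, like a first-max scan.
            Some((_, b)) if b >= p => {}
            _ => best = Some((i, p)),
        }
    }
    best.map(|(i, _)| i)
}

fn main() {
    assert_eq!(argmax_skip_nan(&[f64::NAN, 0.5f64.ln()]), Some(1));
    assert_eq!(argmax_skip_nan(&[f64::NEG_INFINITY, 1.0, 2.0]), Some(2));
    assert_eq!(argmax_skip_nan(&[f64::NAN, f64::NAN]), None);
}
```

This avoids the `partial_cmp`/`max_by` pitfall the old code worked around: NaN has no total order with other floats, so a comparator-based max either panics or silently misorders unless NaNs are filtered first.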
pub mod bernoulli;
@@ -146,7 +147,7 @@ mod tests {
#[derive(Debug, PartialEq, Clone)]
struct TestDistribution<'d>(&'d Vec<i32>);
impl NBDistribution<i32, i32> for TestDistribution<'_> {
fn prior(&self, _class_index: usize) -> f64 {
1.
}
@@ -163,7 +164,7 @@ mod tests {
}
fn classes(&self) -> &Vec<i32> {
self.0
}
}
@@ -176,7 +177,7 @@ mod tests {
Ok(_) => panic!("Should return error in case of empty classes"),
Err(err) => assert_eq!(
err.to_string(),
"Predict failed: Failed to predict, no classes available"
),
}
@@ -192,4 +193,441 @@ mod tests {
Err(_) => panic!("Should success in normal case without NaNs"),
}
}
// A simple test distribution using float
#[derive(Debug, PartialEq, Clone)]
struct TestDistributionAgain {
classes: Vec<u32>,
probs: Vec<f64>,
}
impl NBDistribution<f64, u32> for TestDistributionAgain {
fn classes(&self) -> &Vec<u32> {
&self.classes
}
fn prior(&self, class_index: usize) -> f64 {
self.probs[class_index]
}
fn log_likelihood<'a>(
&'a self,
class_index: usize,
_j: &'a Box<dyn ArrayView1<f64> + 'a>,
) -> f64 {
self.probs[class_index].ln()
}
}
type TestNB = BaseNaiveBayes<f64, u32, DenseMatrix<f64>, Vec<u32>, TestDistributionAgain>;
#[test]
fn test_predict_empty_classes() {
let dist = TestDistributionAgain {
classes: vec![],
probs: vec![],
};
let nb = TestNB::fit(dist).unwrap();
let x = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0]]).unwrap();
assert!(nb.predict(&x).is_err());
}
#[test]
fn test_predict_single_class() {
let dist = TestDistributionAgain {
classes: vec![1],
probs: vec![1.0],
};
let nb = TestNB::fit(dist).unwrap();
let x = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0]]).unwrap();
let result = nb.predict(&x).unwrap();
assert_eq!(result, vec![1, 1]);
}
#[test]
fn test_predict_multiple_classes() {
let dist = TestDistributionAgain {
classes: vec![1, 2, 3],
probs: vec![0.2, 0.5, 0.3],
};
let nb = TestNB::fit(dist).unwrap();
let x = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0], &[5.0, 6.0]]).unwrap();
let result = nb.predict(&x).unwrap();
assert_eq!(result, vec![2, 2, 2]);
}
#[test]
fn test_predict_with_nans() {
let dist = TestDistributionAgain {
classes: vec![1, 2],
probs: vec![f64::NAN, 0.5],
};
let nb = TestNB::fit(dist).unwrap();
let x = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0]]).unwrap();
let result = nb.predict(&x).unwrap();
assert_eq!(result, vec![2, 2]);
}
#[test]
fn test_predict_all_nans() {
let dist = TestDistributionAgain {
classes: vec![1, 2],
probs: vec![f64::NAN, f64::NAN],
};
let nb = TestNB::fit(dist).unwrap();
let x = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0]]).unwrap();
assert!(nb.predict(&x).is_err());
}
#[test]
fn test_predict_extreme_probabilities() {
let dist = TestDistributionAgain {
classes: vec![1, 2],
probs: vec![1e-300, 1e-301],
};
let nb = TestNB::fit(dist).unwrap();
let x = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0]]).unwrap();
let result = nb.predict(&x).unwrap();
assert_eq!(result, vec![1, 1]);
}
#[test]
fn test_predict_with_infinity() {
let dist = TestDistributionAgain {
classes: vec![1, 2, 3],
probs: vec![f64::INFINITY, 1.0, 2.0],
};
let nb = TestNB::fit(dist).unwrap();
let x = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0]]).unwrap();
let result = nb.predict(&x).unwrap();
assert_eq!(result, vec![1, 1]);
}
#[test]
fn test_predict_with_negative_infinity() {
let dist = TestDistributionAgain {
classes: vec![1, 2, 3],
probs: vec![f64::NEG_INFINITY, 1.0, 2.0],
};
let nb = TestNB::fit(dist).unwrap();
let x = DenseMatrix::from_2d_array(&[&[1.0, 2.0], &[3.0, 4.0]]).unwrap();
let result = nb.predict(&x).unwrap();
assert_eq!(result, vec![3, 3]);
}
#[test]
fn test_gaussian_naive_bayes_numerical_stability() {
#[derive(Debug, PartialEq, Clone)]
struct GaussianTestDistribution {
classes: Vec<u32>,
means: Vec<Vec<f64>>,
variances: Vec<Vec<f64>>,
priors: Vec<f64>,
}
impl NBDistribution<f64, u32> for GaussianTestDistribution {
fn classes(&self) -> &Vec<u32> {
&self.classes
}
fn prior(&self, class_index: usize) -> f64 {
self.priors[class_index]
}
fn log_likelihood<'a>(
&'a self,
class_index: usize,
j: &'a Box<dyn ArrayView1<f64> + 'a>,
) -> f64 {
let means = &self.means[class_index];
let variances = &self.variances[class_index];
j.iterator(0)
.enumerate()
.map(|(i, &xi)| {
let mean = means[i];
let var = variances[i] + 1e-9; // Small smoothing for numerical stability
let coeff = -0.5 * (2.0 * std::f64::consts::PI * var).ln();
let exponent = -(xi - mean).powi(2) / (2.0 * var);
coeff + exponent
})
.sum()
}
}
fn train_distribution(x: &DenseMatrix<f64>, y: &[u32]) -> GaussianTestDistribution {
let mut classes: Vec<u32> = y
.iter()
.cloned()
.collect::<std::collections::HashSet<u32>>()
.into_iter()
.collect();
classes.sort();
let n_classes = classes.len();
let n_features = x.shape().1;
let mut means = vec![vec![0.0; n_features]; n_classes];
let mut variances = vec![vec![0.0; n_features]; n_classes];
let mut class_counts = vec![0; n_classes];
// Calculate means and count samples per class
for (sample, &class) in x.row_iter().zip(y.iter()) {
let class_idx = classes.iter().position(|&c| c == class).unwrap();
class_counts[class_idx] += 1;
for (i, &value) in sample.iterator(0).enumerate() {
means[class_idx][i] += value;
}
}
// Normalize means
for (class_idx, mean) in means.iter_mut().enumerate() {
for value in mean.iter_mut() {
*value /= class_counts[class_idx] as f64;
}
}
// Calculate variances
for (sample, &class) in x.row_iter().zip(y.iter()) {
let class_idx = classes.iter().position(|&c| c == class).unwrap();
for (i, &value) in sample.iterator(0).enumerate() {
let diff = value - means[class_idx][i];
variances[class_idx][i] += diff * diff;
}
}
// Normalize variances and add small epsilon to avoid zero variance
let epsilon = 1e-9;
for (class_idx, variance) in variances.iter_mut().enumerate() {
for value in variance.iter_mut() {
*value = *value / class_counts[class_idx] as f64 + epsilon;
}
}
// Calculate priors
let total_samples = y.len() as f64;
let priors: Vec<f64> = class_counts
.iter()
.map(|&count| count as f64 / total_samples)
.collect();
GaussianTestDistribution {
classes,
means,
variances,
priors,
}
}
type TestNBGaussian =
BaseNaiveBayes<f64, u32, DenseMatrix<f64>, Vec<u32>, GaussianTestDistribution>;
// Create a constant training dataset
let n_samples = 1000;
let n_features = 5;
let n_classes = 4;
let mut x_data = Vec::with_capacity(n_samples * n_features);
let mut y_data = Vec::with_capacity(n_samples);
for i in 0..n_samples {
for j in 0..n_features {
x_data.push((i * j) as f64 % 10.0);
}
y_data.push((i % n_classes) as u32);
}
let x = DenseMatrix::new(n_samples, n_features, x_data, true).unwrap();
let y = y_data;
// Train the model
let dist = train_distribution(&x, &y);
let nb = TestNBGaussian::fit(dist).unwrap();
// Create constant test data
let n_test_samples = 100;
let mut test_x_data = Vec::with_capacity(n_test_samples * n_features);
for i in 0..n_test_samples {
for j in 0..n_features {
test_x_data.push((i * j * 2) as f64 % 15.0);
}
}
let test_x = DenseMatrix::new(n_test_samples, n_features, test_x_data, true).unwrap();
// Make predictions
let predictions = nb
.predict(&test_x)
.map_err(|e| format!("Prediction failed: {}", e))
.unwrap();
// Check numerical stability
assert_eq!(
predictions.len(),
n_test_samples,
"Number of predictions should match number of test samples"
);
// Check that all predictions are valid class labels
for &pred in predictions.iter() {
assert!(pred < n_classes as u32, "Predicted class should be valid");
}
// Check consistency of predictions
let repeated_predictions = nb
.predict(&test_x)
.map_err(|e| format!("Repeated prediction failed: {}", e))
.unwrap();
assert_eq!(
predictions, repeated_predictions,
"Predictions should be consistent when repeated"
);
// Check extreme values
let extreme_x =
DenseMatrix::new(2, n_features, vec![f64::MAX; n_features * 2], true).unwrap();
let extreme_predictions = nb.predict(&extreme_x);
assert!(
extreme_predictions.is_err(),
"Extreme value input should result in an error"
);
assert_eq!(
extreme_predictions.unwrap_err().to_string(),
"Predict failed: Failed to predict, all probabilities were NaN",
"Incorrect error message for extreme values"
);
// Check for NaN handling
let nan_x = DenseMatrix::new(2, n_features, vec![f64::NAN; n_features * 2], true).unwrap();
let nan_predictions = nb.predict(&nan_x);
assert!(
nan_predictions.is_err(),
"NaN input should result in an error"
);
// Check for very small values
let small_x =
DenseMatrix::new(2, n_features, vec![f64::MIN_POSITIVE; n_features * 2], true).unwrap();
let small_predictions = nb
.predict(&small_x)
.map_err(|e| format!("Small value prediction failed: {}", e))
.unwrap();
for &pred in small_predictions.iter() {
assert!(
pred < n_classes as u32,
"Predictions for very small values should be valid"
);
}
// Check for values close to zero
let near_zero_x =
DenseMatrix::new(2, n_features, vec![1e-300; n_features * 2], true).unwrap();
let near_zero_predictions = nb
.predict(&near_zero_x)
.map_err(|e| format!("Near-zero value prediction failed: {}", e))
.unwrap();
for &pred in near_zero_predictions.iter() {
assert!(
pred < n_classes as u32,
"Predictions for near-zero values should be valid"
);
}
println!("All numerical stability checks passed!");
}
#[test]
fn test_gaussian_naive_bayes_numerical_stability_random_data() {
#[derive(Debug)]
struct MySimpleRng {
state: u64,
}
impl MySimpleRng {
fn new(seed: u64) -> Self {
MySimpleRng { state: seed }
}
/// Get the next u64 in the sequence.
fn next_u64(&mut self) -> u64 {
// LCG parameters; these are somewhat arbitrary but commonly used.
// Feel free to tweak the multiplier/adder etc.
self.state = self.state.wrapping_mul(6364136223846793005).wrapping_add(1);
self.state
}
/// Get an f64 in the range [min, max).
fn next_f64(&mut self, min: f64, max: f64) -> f64 {
let fraction = (self.next_u64() as f64) / (u64::MAX as f64);
min + fraction * (max - min)
}
/// Get a usize in the range [min, max). This floors the floating result.
fn gen_range_usize(&mut self, min: usize, max: usize) -> usize {
let v = self.next_f64(min as f64, max as f64);
// Truncate into the integer range. Because of floating inexactness,
// ensure we also clamp.
let int_v = v.floor() as isize;
// simple clamp to avoid any float rounding out of range
let clamped = int_v.max(min as isize).min((max - 1) as isize);
clamped as usize
}
}
use crate::naive_bayes::gaussian::GaussianNB;
// We will generate random data in a reproducible way:
let mut rng = MySimpleRng::new(42);
let n_samples = 1000;
let n_features = 5;
let n_classes = 4;
// Our feature matrix and label vector
let mut x_data = Vec::with_capacity(n_samples * n_features);
let mut y_data = Vec::with_capacity(n_samples);
// Fill x_data with random values and y_data with random class labels.
for _i in 0..n_samples {
for _j in 0..n_features {
// We'll pick random values in [-10, 10).
x_data.push(rng.next_f64(-10.0, 10.0));
}
let class = rng.gen_range_usize(0, n_classes) as u32;
y_data.push(class);
}
// Create DenseMatrix from x_data
let x = DenseMatrix::new(n_samples, n_features, x_data, true).unwrap();
// Train GaussianNB
let gnb = GaussianNB::fit(&x, &y_data, Default::default())
.expect("Fitting GaussianNB with random data failed.");
// Predict on the same training data to verify no numerical instability
let predictions = gnb.predict(&x).expect("Prediction on random data failed.");
// Basic sanity checks
assert_eq!(
predictions.len(),
n_samples,
"Prediction size must match n_samples"
);
for &pred_class in &predictions {
assert!(
(pred_class as usize) < n_classes,
"Predicted class {} is out of range [0..n_classes).",
pred_class
);
}
// If you want to compare with scikit-learn, you can do something like:
// println!("X = {:?}", &x);
// println!("Y = {:?}", &y_data);
// println!("predictions = {:?}", &predictions);
// and then in Python:
// import numpy as np
// from sklearn.naive_bayes import GaussianNB
// X = np.reshape(np.array(x), (1000, 5), order='F')
// Y = np.array(y)
// gnb = GaussianNB().fit(X, Y)
// preds = gnb.predict(X)
// expected = np.array(predictions)
// assert expected == preds
// They should match closely (or exactly) depending on floating rounding.
}
}
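The hand-rolled LCG in the test above trades statistical quality for zero dependencies; the two properties the test actually relies on are determinism (same seed, same sequence) and range containment. A quick sketch using the same multiplier and increment:

```rust
struct Lcg {
    state: u64,
}

impl Lcg {
    fn new(seed: u64) -> Self {
        Lcg { state: seed }
    }
    // Same LCG step as MySimpleRng in the test above.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_mul(6364136223846793005).wrapping_add(1);
        self.state
    }
    // Map the raw u64 into [min, max) (rounding can, very rarely, touch max).
    fn next_f64(&mut self, min: f64, max: f64) -> f64 {
        let fraction = (self.next_u64() as f64) / (u64::MAX as f64);
        min + fraction * (max - min)
    }
}

fn main() {
    // Determinism: two generators with the same seed produce identical streams.
    let (mut a, mut b) = (Lcg::new(42), Lcg::new(42));
    for _ in 0..1000 {
        assert_eq!(a.next_u64(), b.next_u64());
    }
    // Range: samples stay inside the requested interval.
    let mut r = Lcg::new(7);
    for _ in 0..1000 {
        let v = r.next_f64(-10.0, 10.0);
        assert!(v >= -10.0 && v <= 10.0);
    }
}
```

Determinism is what makes the "compare with scikit-learn" workflow in the comments reproducible: the printed `X` and `Y` are identical on every run.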
+4 -4
View File
@@ -207,8 +207,7 @@ impl<TY: Number + Ord + Unsigned> MultinomialNBDistribution<TY> {
/// Fits the distribution to a NxM matrix where N is number of samples and M is number of features.
/// * `x` - training data.
/// * `y` - vector with target values (classes) of length N.
/// * `priors` - Optional vector with prior probabilities of the classes. If not defined, priors are adjusted according to the data.
/// * `alpha` - Additive (Laplace/Lidstone) smoothing parameter.
pub fn fit<TX: Number + Unsigned, X: Array2<TX>, Y: Array1<TY>>(
x: &X,
@@ -345,10 +344,10 @@ impl<TX: Number + Unsigned, TY: Number + Ord + Unsigned, X: Array2<TX>, Y: Array
{
/// Fits MultinomialNB with given data
/// * `x` - training data of size NxM where N is the number of samples and M is the number of
/// features.
/// * `y` - vector with target values (classes) of length N.
/// * `parameters` - additional parameters like class priors and alpha for smoothing.
pub fn fit(x: &X, y: &Y, parameters: MultinomialNBParameters) -> Result<Self, Failed> {
let distribution =
MultinomialNBDistribution::fit(x, y, parameters.alpha, parameters.priors)?;
@@ -358,6 +357,7 @@ impl<TX: Number + Unsigned, TY: Number + Ord + Unsigned, X: Array2<TX>, Y: Array
/// Estimates the class labels for the provided data.
/// * `x` - data of shape NxM where N is number of data points to estimate and M is number of features.
///
/// Returns a vector of size N with class estimates.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
self.inner.as_ref().unwrap().predict(x)
+1
View File
@@ -261,6 +261,7 @@ impl<TX: Number, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>, D: Distance<Vec
/// Estimates the class labels for the provided data.
/// * `x` - data of shape NxM where N is number of data points to estimate and M is number of features.
///
/// Returns a vector of size N with class estimates.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
let mut result = Y::zeros(x.shape().0);
+192 -11
View File
@@ -1,6 +1,7 @@
//! # K Nearest Neighbors Regressor with Feature Sparsing
//!
//! Regressor that predicts estimated values as a function of the k nearest neighbours.
//! Now supports feature sparsing: the ability to restrict prediction to a subset of features.
//!
//! `KNNRegressor` relies on 2 backend algorithms to speedup KNN queries:
//! * [`LinearSearch`](../../algorithm/neighbour/linear_search/index.html)
@@ -29,6 +30,10 @@
//!
//! let knn = KNNRegressor::fit(&x, &y, Default::default()).unwrap();
//! let y_hat = knn.predict(&x).unwrap();
//!
//! // Predict using only the feature at index 0
//! let feature_indices = vec![0];
//! let y_hat_sparse = knn.predict_sparse(&x, &feature_indices).unwrap();
//! ```
//!
//! variable `y_hat` will hold predicted value
@@ -77,36 +82,41 @@ pub struct KNNRegressorParameters<T: Number, D: Distance<Vec<T>>> {
pub struct KNNRegressor<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>, D: Distance<Vec<TX>>>
{
y: Option<Y>,
x: Option<X>, // Store training data for sparse feature prediction
knn_algorithm: Option<KNNAlgorithm<TX, D>>,
distance: Option<D>, // Store distance function for sparse prediction
weight: Option<KNNWeightFunction>,
k: Option<usize>,
_phantom_tx: PhantomData<TX>,
_phantom_ty: PhantomData<TY>,
_phantom_x: PhantomData<X>,
}
impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>, D: Distance<Vec<TX>>>
KNNRegressor<TX, TY, X, Y, D>
{
/// Returns the stored training targets.
fn y(&self) -> &Y {
self.y.as_ref().unwrap()
}
/// Returns the stored training data (kept for sparse prediction).
fn x(&self) -> &X {
self.x.as_ref().unwrap()
}
fn knn_algorithm(&self) -> &KNNAlgorithm<TX, D> {
self.knn_algorithm
.as_ref()
.expect("Missing parameter: KNNAlgorithm")
}
/// Returns the stored distance function.
fn distance(&self) -> &D {
self.distance.as_ref().expect("Missing parameter: distance")
}
fn weight(&self) -> &KNNWeightFunction {
self.weight.as_ref().expect("Missing parameter: weight")
}
#[allow(dead_code)]
/// Returns the number of neighbours, k.
fn k(&self) -> usize {
self.k.unwrap()
}
@@ -180,12 +190,13 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>, D: Distance<Vec<TX>>>
fn new() -> Self {
Self {
y: Option::None,
x: Option::None,
knn_algorithm: Option::None,
distance: Option::None,
weight: Option::None,
k: Option::None,
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
_phantom_x: PhantomData,
}
}
@@ -235,21 +246,23 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>, D: Distance<Vec<TX>>>
)));
}
let knn_algo = parameters.algorithm.fit(data, parameters.distance.clone())?;
Ok(KNNRegressor {
y: Some(y.clone()),
x: Some(x.clone()),
k: Some(parameters.k),
knn_algorithm: Some(knn_algo),
distance: Some(parameters.distance),
weight: Some(parameters.weight),
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
_phantom_x: PhantomData,
})
}
/// Predict the target for the provided data.
/// * `x` - data of shape NxM where N is number of data points to estimate and M is number of features.
///
/// Returns a vector of size N with estimates.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
let mut result = Y::zeros(x.shape().0);
@@ -265,6 +278,45 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>, D: Distance<Vec<TX>>>
Ok(result)
}
/// Predict the target for the provided data using only specified features.
/// * `x` - data of shape NxM where N is number of data points to estimate and M is number of features.
/// * `feature_indices` - indices of features to consider (e.g., [0, 2, 4] to use only features at positions 0, 2, and 4)
///
/// Returns a vector of size N with estimates.
pub fn predict_sparse(&self, x: &X, feature_indices: &[usize]) -> Result<Y, Failed> {
let (n_samples, n_features) = x.shape();
// Validate feature indices
if feature_indices.is_empty() {
return Err(Failed::predict("feature_indices cannot be empty"));
}
for &idx in feature_indices {
if idx >= n_features {
return Err(Failed::predict(&format!(
"Feature index {} out of bounds (max: {})",
idx,
n_features - 1
)));
}
}
let mut result = Y::zeros(n_samples);
let mut row_vec = vec![TX::zero(); feature_indices.len()];
for (i, row) in x.row_iter().enumerate() {
// Extract only the specified features
for (j, &feat_idx) in feature_indices.iter().enumerate() {
row_vec[j] = *row.get(feat_idx);
}
result.set(i, self.predict_for_row_sparse(&row_vec, feature_indices)?);
}
Ok(result)
}
fn predict_for_row(&self, row: &Vec<TX>) -> Result<TY, Failed> {
let search_result = self.knn_algorithm().find(row, self.k.unwrap())?;
let mut result = TY::zero();
@@ -280,6 +332,50 @@ impl<TX: Number, TY: Number, X: Array2<TX>, Y: Array1<TY>, D: Distance<Vec<TX>>>
Ok(result)
}
fn predict_for_row_sparse(
&self,
row: &Vec<TX>,
feature_indices: &[usize],
) -> Result<TY, Failed> {
let training_data = self.x();
let (n_training_samples, _) = training_data.shape();
let k = self.k.unwrap();
// Manually compute distances using only specified features
let mut distances: Vec<(usize, f64)> = Vec::with_capacity(n_training_samples);
for i in 0..n_training_samples {
let train_row = training_data.get_row(i);
// Extract sparse features from training data
let mut train_sparse = Vec::with_capacity(feature_indices.len());
for &feat_idx in feature_indices {
train_sparse.push(*train_row.get(feat_idx));
}
// Compute distance using only selected features
let dist = self.distance().distance(row, &train_sparse);
distances.push((i, dist));
}
// Sort by distance and take k nearest
distances.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
let k_nearest: Vec<(usize, f64)> = distances.into_iter().take(k).collect();
// Compute weighted prediction
let mut result = TY::zero();
let weights = self
.weight()
.calc_weights(k_nearest.iter().map(|v| v.1).collect());
let w_sum: f64 = weights.iter().copied().sum();
for (neighbor, w) in k_nearest.iter().zip(weights.iter()) {
result += *self.y().get(neighbor.0) * TY::from_f64(*w / w_sum).unwrap();
}
Ok(result)
}
}
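`predict_for_row_sparse` recomputes distances over a column subset rather than reusing the fitted KNN index, because that index was built on all features. The core of the idea, as a standalone sketch with Euclidean distance (hypothetical helper, not the crate's `Distance` trait):

```rust
// Euclidean distance restricted to the given feature indices.
fn sparse_euclidean(a: &[f64], b: &[f64], feature_indices: &[usize]) -> f64 {
    feature_indices
        .iter()
        .map(|&i| {
            let d = a[i] - b[i];
            d * d
        })
        .sum::<f64>()
        .sqrt()
}

fn main() {
    let a = [1.0, 2.0, 999.0];
    let b = [4.0, 6.0, 0.0];
    // The full distance is dominated by the mismatched third feature,
    // but restricted to features {0, 1} it is the familiar 3-4-5 triangle.
    assert!((sparse_euclidean(&a, &b, &[0, 1]) - 5.0).abs() < 1e-12);
}
```

This is also why the sparse path is O(n) per query instead of using the tree/linear-search backend: restricting columns changes the metric, so precomputed neighbour structures no longer apply.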
#[cfg(test)]
@@ -312,7 +408,7 @@ mod tests {
let y_hat = knn.predict(&x).unwrap();
assert_eq!(5, Vec::len(&y_hat));
for i in 0..y_hat.len() {
assert!((y_hat[i] - y_exp[i]).abs() < f64::EPSILON);
}
}
@@ -335,6 +431,91 @@ mod tests {
}
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn knn_predict_sparse() {
// Training data with 3 features
let x = DenseMatrix::from_2d_array(&[
&[1., 2., 10.],
&[3., 4., 20.],
&[5., 6., 30.],
&[7., 8., 40.],
&[9., 10., 50.],
])
.unwrap();
let y: Vec<f64> = vec![1., 2., 3., 4., 5.];
let knn = KNNRegressor::fit(&x, &y, Default::default()).unwrap();
// Test data
let x_test = DenseMatrix::from_2d_array(&[
&[1., 2., 999.], // Third feature is very different
&[5., 6., 999.],
])
.unwrap();
// Predict using only first two features (ignore the third)
let feature_indices = vec![0, 1];
let y_hat_sparse = knn.predict_sparse(&x_test, &feature_indices).unwrap();
// Should get good predictions since we're ignoring the mismatched third feature
assert_eq!(2, Vec::len(&y_hat_sparse));
assert!((y_hat_sparse[0] - 2.0).abs() < 1.0); // Should be close to 1-2
assert!((y_hat_sparse[1] - 3.0).abs() < 1.0); // Should be close to 3
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn knn_predict_sparse_single_feature() {
let x = DenseMatrix::from_2d_array(&[
&[1., 100., 1000.],
&[2., 200., 2000.],
&[3., 300., 3000.],
&[4., 400., 4000.],
&[5., 500., 5000.],
])
.unwrap();
let y: Vec<f64> = vec![1., 2., 3., 4., 5.];
let knn = KNNRegressor::fit(&x, &y, Default::default()).unwrap();
let x_test = DenseMatrix::from_2d_array(&[&[1.5, 999., 9999.]]).unwrap();
// Use only first feature
let y_hat = knn.predict_sparse(&x_test, &[0]).unwrap();
// Should predict based on first feature only
assert_eq!(1, Vec::len(&y_hat));
assert!((y_hat[0] - 1.5).abs() < 1.0);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn knn_predict_sparse_invalid_indices() {
let x = DenseMatrix::from_2d_array(&[&[1., 2.], &[3., 4.]]).unwrap();
let y: Vec<f64> = vec![1., 2.];
let knn = KNNRegressor::fit(&x, &y, Default::default()).unwrap();
let x_test = DenseMatrix::from_2d_array(&[&[1., 2.]]).unwrap();
// Index out of bounds
let result = knn.predict_sparse(&x_test, &[5]);
assert!(result.is_err());
// Empty indices
let result = knn.predict_sparse(&x_test, &[]);
assert!(result.is_err());
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
@@ -353,4 +534,4 @@ mod tests {
assert_eq!(knn, deserialized_knn);
}
}
}
+1 -1
View File
@@ -64,7 +64,7 @@ impl KNNWeightFunction {
KNNWeightFunction::Distance => {
// if any training point has zero distance from the query point,
// those training points are weighted as 1.0 and all other points as 0.0
if distances.contains(&0f64) {
distances
.iter()
.map(|e| if *e == 0f64 { 1f64 } else { 0f64 })
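The zero-distance branch above avoids division by zero in inverse-distance weighting: exact matches take all the weight. A compact sketch of the full rule (hypothetical free function, assuming the weights are later normalized by their sum, as the crate's prediction code does):

```rust
// Inverse-distance weights; if any neighbour is at distance zero, those
// neighbours get weight 1.0 and everything else gets 0.0.
fn distance_weights(distances: &[f64]) -> Vec<f64> {
    if distances.contains(&0.0) {
        distances
            .iter()
            .map(|&d| if d == 0.0 { 1.0 } else { 0.0 })
            .collect()
    } else {
        distances.iter().map(|&d| 1.0 / d).collect()
    }
}

fn main() {
    assert_eq!(distance_weights(&[0.0, 2.0, 4.0]), vec![1.0, 0.0, 0.0]);
    assert_eq!(distance_weights(&[1.0, 2.0, 4.0]), vec![1.0, 0.5, 0.25]);
}
```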
@@ -1,5 +1,3 @@
use std::default::Default;
use crate::linalg::basic::arrays::Array1;
@@ -8,30 +6,27 @@ use crate::optimization::first_order::{FirstOrderOptimizer, OptimizerResult};
use crate::optimization::line_search::LineSearchMethod;
use crate::optimization::{DF, F};
/// Gradient Descent optimization algorithm
pub struct GradientDescent {
/// Maximum number of iterations
pub max_iter: usize,
/// Relative tolerance for the gradient norm
pub g_rtol: f64,
/// Absolute tolerance for the gradient norm
pub g_atol: f64,
}
impl Default for GradientDescent {
fn default() -> Self {
GradientDescent {
max_iter: 10000,
g_rtol: f64::EPSILON.sqrt(),
g_atol: f64::EPSILON,
}
}
}
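The defaults above (g_rtol = √ε, g_atol = ε) feed a gradient-norm stopping rule of roughly the form ‖∇f(x)‖ ≤ max(g_atol, g_rtol · ‖∇f(x₀)‖). A toy sketch of that criterion on f(x) = x² with a fixed step (an assumption for illustration; the crate's optimizer uses a line search and its exact rule may differ):

```rust
// Fixed-step gradient descent on f(x) = x^2, stopping when the gradient
// magnitude drops below an absolute/relative tolerance.
fn minimize_quadratic(x0: f64) -> f64 {
    let grad = |x: f64| 2.0 * x;
    let (g_atol, g_rtol) = (f64::EPSILON, f64::EPSILON.sqrt());
    let tol = g_atol.max(g_rtol * grad(x0).abs());
    let mut x = x0;
    let mut iters = 0;
    while grad(x).abs() > tol && iters < 10_000 {
        x -= 0.25 * grad(x); // for this objective, x halves each step
        iters += 1;
    }
    x
}

fn main() {
    let x = minimize_quadratic(5.0);
    assert!(x.abs() < 1e-6);
}
```

The relative term scales the stopping threshold to the initial gradient, so the same defaults behave sensibly whether the objective's gradients are of order 1 or 10⁶; the absolute term prevents an unreachable threshold when the initial gradient is already tiny.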
impl<T: FloatNumber> FirstOrderOptimizer<T> for GradientDescent {
fn optimize<'a, X: Array1<T>, LS: LineSearchMethod<T>>(
&self,
f: &'a F<'_, T, X>,
+14 -25
View File
@@ -11,31 +11,29 @@ use crate::optimization::first_order::{FirstOrderOptimizer, OptimizerResult};
use crate::optimization::line_search::LineSearchMethod;
use crate::optimization::{DF, F};
/// Limited-memory BFGS optimization algorithm
pub struct LBFGS {
/// Maximum number of iterations
pub max_iter: usize,
/// Relative tolerance for the gradient norm
pub g_rtol: f64,
/// Absolute tolerance for the gradient norm
pub g_atol: f64,
/// Absolute tolerance for the change in `x` between iterations
pub x_atol: f64,
/// Relative tolerance for the change in `x` between iterations
pub x_rtol: f64,
/// Absolute tolerance for the change in the objective value
pub f_abstol: f64,
/// Relative tolerance for the change in the objective value
pub f_reltol: f64,
/// Number of successive iterations allowed within the objective tolerance before stopping
pub successive_f_tol: usize,
/// Number of history entries kept for the limited-memory Hessian approximation
pub m: usize,
}
///
impl Default for LBFGS {
///
fn default() -> Self {
LBFGS {
max_iter: 1000,
@@ -51,9 +49,7 @@ impl Default for LBFGS {
}
}
///
impl LBFGS {
///
fn two_loops<T: FloatNumber + RealNumber, X: Array1<T>>(&self, state: &mut LBFGSState<T, X>) {
let lower = state.iteration.max(self.m) - self.m;
let upper = state.iteration;
@@ -95,7 +91,6 @@ impl LBFGS {
state.s.mul_scalar_mut(-T::one());
}
///
fn init_state<T: FloatNumber + RealNumber, X: Array1<T>>(&self, x: &X) -> LBFGSState<T, X> {
LBFGSState {
x: x.clone(),
@@ -119,7 +114,6 @@ impl LBFGS {
}
}
///
fn update_state<'a, T: FloatNumber + RealNumber, X: Array1<T>, LS: LineSearchMethod<T>>(
&self,
f: &'a F<'_, T, X>,
@@ -161,7 +155,6 @@ impl LBFGS {
df(&mut state.x_df, &state.x);
}
///
fn assess_convergence<T: FloatNumber, X: Array1<T>>(
&self,
state: &mut LBFGSState<T, X>,
@@ -173,7 +166,7 @@ impl LBFGS {
}
if state.x.max_diff(&state.x_prev)
<= T::from_f64(self.x_rtol * state.x.norm(std::f64::INFINITY)).unwrap()
<= T::from_f64(self.x_rtol * state.x.norm(f64::INFINITY)).unwrap()
{
x_converged = true;
}
@@ -188,14 +181,13 @@ impl LBFGS {
state.counter_f_tol += 1;
}
if state.x_df.norm(std::f64::INFINITY) <= self.g_atol {
if state.x_df.norm(f64::INFINITY) <= self.g_atol {
g_converged = true;
}
g_converged || x_converged || state.counter_f_tol > self.successive_f_tol
}
///
fn update_hessian<T: FloatNumber, X: Array1<T>>(
&self,
_: &DF<'_, X>,
@@ -212,7 +204,6 @@ impl LBFGS {
}
}
///
#[derive(Debug)]
struct LBFGSState<T: FloatNumber, X: Array1<T>> {
x: X,
@@ -234,9 +225,7 @@ struct LBFGSState<T: FloatNumber, X: Array1<T>> {
alpha: T,
}
///
impl<T: FloatNumber + RealNumber> FirstOrderOptimizer<T> for LBFGS {
///
fn optimize<'a, X: Array1<T>, LS: LineSearchMethod<T>>(
&self,
f: &F<'_, T, X>,
@@ -248,7 +237,7 @@ impl<T: FloatNumber + RealNumber> FirstOrderOptimizer<T> for LBFGS {
df(&mut state.x_df, x0);
let g_converged = state.x_df.norm(std::f64::INFINITY) < self.g_atol;
let g_converged = state.x_df.norm(f64::INFINITY) < self.g_atol;
let mut converged = g_converged;
let stopped = false;
@@ -299,7 +288,7 @@ mod tests {
let result = optimizer.optimize(&f, &df, &x0, &ls);
assert!((result.f_x - 0.0).abs() < std::f64::EPSILON);
assert!((result.f_x - 0.0).abs() < f64::EPSILON);
assert!((result.x[0] - 1.0).abs() < 1e-8);
assert!((result.x[1] - 1.0).abs() < 1e-8);
assert!(result.iterations <= 24);
+8 -8
@@ -1,6 +1,6 @@
///
/// Gradient descent optimization algorithm
pub mod gradient_descent;
///
/// Limited-memory BFGS optimization algorithm
pub mod lbfgs;
use std::clone::Clone;
@@ -11,9 +11,9 @@ use crate::numbers::floatnum::FloatNumber;
use crate::optimization::line_search::LineSearchMethod;
use crate::optimization::{DF, F};
///
/// First-order optimization is a class of algorithms that use the first derivative of a function to find optimal solutions.
pub trait FirstOrderOptimizer<T: FloatNumber> {
///
/// run first order optimization
fn optimize<'a, X: Array1<T>, LS: LineSearchMethod<T>>(
&self,
f: &F<'_, T, X>,
@@ -23,13 +23,13 @@ pub trait FirstOrderOptimizer<T: FloatNumber> {
) -> OptimizerResult<T, X>;
}
///
/// Result of optimization
#[derive(Debug, Clone)]
pub struct OptimizerResult<T: FloatNumber, X: Array1<T>> {
///
/// Solution
pub x: X,
///
/// f(x) value
pub f_x: T,
///
/// number of iterations
pub iterations: usize,
}
+16 -21
@@ -1,47 +1,44 @@
// TODO: missing documentation
use crate::optimization::FunctionOrder;
use num_traits::Float;
///
/// Line search optimization.
pub trait LineSearchMethod<T: Float> {
///
/// Find alpha that satisfies strong Wolfe conditions.
fn search(
&self,
f: &(dyn Fn(T) -> T),
df: &(dyn Fn(T) -> T),
f: &dyn Fn(T) -> T,
df: &dyn Fn(T) -> T,
alpha: T,
f0: T,
df0: T,
) -> LineSearchResult<T>;
}
///
/// Line search result
#[derive(Debug, Clone)]
pub struct LineSearchResult<T: Float> {
///
/// Alpha value
pub alpha: T,
///
/// f(alpha) value
pub f_x: T,
}
///
/// Backtracking line search method.
pub struct Backtracking<T: Float> {
///
/// Coefficient for the sufficient-decrease (Armijo) condition
pub c1: T,
///
/// Maximum number of iterations for Backtracking single run
pub max_iterations: usize,
///
/// TODO: Add documentation
pub max_infinity_iterations: usize,
///
/// TODO: Add documentation
pub phi: T,
///
/// TODO: Add documentation
pub plo: T,
///
/// function order
pub order: FunctionOrder,
}
///
impl<T: Float> Default for Backtracking<T> {
fn default() -> Self {
Backtracking {
@@ -55,13 +52,11 @@ impl<T: Float> Default for Backtracking<T> {
}
}
///
impl<T: Float> LineSearchMethod<T> for Backtracking<T> {
///
fn search(
&self,
f: &(dyn Fn(T) -> T),
_: &(dyn Fn(T) -> T),
f: &dyn Fn(T) -> T,
_: &dyn Fn(T) -> T,
alpha: T,
f0: T,
df0: T,
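The `Backtracking` method above shrinks the trial step until a sufficient-decrease (Armijo) condition holds. A minimal standalone sketch of that idea, with `c1`, the shrink factor, and the iteration budget as assumed parameters (not the crate's exact update, which interpolates between `phi` and `plo`):

```rust
/// Minimal Armijo backtracking sketch: starting from `alpha`, shrink the
/// step by `shrink` until f(alpha) <= f0 + c1 * alpha * df0 (sufficient
/// decrease along the search direction), or the iteration budget runs out.
pub fn backtrack(
    f: &dyn Fn(f64) -> f64,
    mut alpha: f64,
    f0: f64,
    df0: f64,
    c1: f64,
    shrink: f64,
    max_iterations: usize,
) -> f64 {
    for _ in 0..max_iterations {
        if f(alpha) <= f0 + c1 * alpha * df0 {
            break;
        }
        alpha *= shrink;
    }
    alpha
}

fn main() {
    // Line search for f(x) = x^2 from x0 = 1 along the descent direction d = -2:
    // phi(alpha) = (1 - 2*alpha)^2, so phi(0) = 1 and phi'(0) = -4.
    let phi = |a: f64| (1.0 - 2.0 * a).powi(2);
    let alpha = backtrack(&phi, 1.0, 1.0, -4.0, 1e-4, 0.5, 64);
    assert!(phi(alpha) < 1.0); // the accepted step decreases the objective
}
```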
+7 -9
@@ -1,21 +1,19 @@
// TODO: missing documentation
///
/// first order optimization algorithms
pub mod first_order;
///
/// line search algorithms
pub mod line_search;
///
/// Function f(x) = y
pub type F<'a, T, X> = dyn for<'b> Fn(&'b X) -> T + 'a;
///
/// Function df(x)
pub type DF<'a, X> = dyn for<'b> Fn(&'b mut X, &'b X) + 'a;
///
/// Function order
#[allow(clippy::upper_case_acronyms)]
#[derive(Debug, PartialEq, Eq)]
pub enum FunctionOrder {
///
/// Second order
SECOND,
///
/// Third order
THIRD,
}
+3 -7
@@ -24,7 +24,7 @@
//! // &[1.5, 1.0, 0.0, 1.5, 0.0, 0.0, 1.0, 0.0]
//! // &[1.5, 0.0, 1.0, 1.5, 0.0, 0.0, 0.0, 1.0]
//! ```
use std::iter;
use std::iter::repeat_n;
use crate::error::Failed;
use crate::linalg::basic::arrays::Array2;
@@ -75,11 +75,7 @@ fn find_new_idxs(num_params: usize, cat_sizes: &[usize], cat_idxs: &[usize]) ->
let offset = (0..1).chain(offset_);
let new_param_idxs: Vec<usize> = (0..num_params)
.zip(
repeats
.zip(offset)
.flat_map(|(r, o)| iter::repeat(o).take(r)),
)
.zip(repeats.zip(offset).flat_map(|(r, o)| repeat_n(o, r)))
.map(|(idx, ofst)| idx + ofst)
.collect();
new_param_idxs
@@ -124,7 +120,7 @@ impl OneHotEncoder {
let (nrows, _) = data.shape();
// col buffer to avoid allocations
let mut col_buf: Vec<T> = iter::repeat(T::zero()).take(nrows).collect();
let mut col_buf: Vec<T> = repeat_n(T::zero(), nrows).collect();
let mut res: Vec<CategoryMapper<CategoricalFloat>> = Vec::with_capacity(idxs.len());
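The change from `iter::repeat(x).take(n)` to `repeat_n(x, n)` is behavior-preserving: both yield `n` clones of the value, and `repeat_n` additionally reports its exact length up front. A quick check of the equivalence on plain `Vec<f64>` (assumes a toolchain where `std::iter::repeat_n` is stable):

```rust
use std::iter;

fn main() {
    let n = 4;
    // The old pattern: an infinite repeater truncated with take().
    let a: Vec<f64> = iter::repeat(0.0).take(n).collect();
    // The new pattern: a finite repeater that knows its length.
    let b: Vec<f64> = iter::repeat_n(0.0, n).collect();
    assert_eq!(a, b);
    assert_eq!(b.len(), n);
}
```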
+8 -12
@@ -172,18 +172,14 @@ where
T: Number + RealNumber,
M: Array2<T>,
{
if let Some(output_matrix) = columns.first().cloned() {
return Some(
columns
.iter()
.skip(1)
.fold(output_matrix, |current_matrix, new_colum| {
current_matrix.h_stack(new_colum)
}),
);
} else {
None
}
columns.first().cloned().map(|output_matrix| {
columns
.iter()
.skip(1)
.fold(output_matrix, |current_matrix, new_colum| {
current_matrix.h_stack(new_colum)
})
})
}
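The refactor above replaces the `if let`/`else` with `Option::map`: take the first column as the accumulator and fold the remaining columns in with `h_stack`. The same pattern with plain `Vec`s standing in for matrix columns (a hypothetical stand-in, not the crate's `Array2` type):

```rust
/// Concatenate columns by folding onto the first one; returns None for
/// an empty input, mirroring the Option::map + fold pattern above.
pub fn h_stack_all(columns: &[Vec<f64>]) -> Option<Vec<f64>> {
    columns.first().cloned().map(|first| {
        columns.iter().skip(1).fold(first, |mut acc, col| {
            acc.extend_from_slice(col); // stand-in for Matrix::h_stack
            acc
        })
    })
}

fn main() {
    assert_eq!(h_stack_all(&[]), None);
    assert_eq!(
        h_stack_all(&[vec![1.0], vec![2.0], vec![3.0]]),
        Some(vec![1.0, 2.0, 3.0])
    );
}
```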
#[cfg(test)]
+1 -1
@@ -30,7 +30,7 @@ pub struct CSVDefinition<'a> {
/// What separates the fields in your CSV file?
field_seperator: &'a str,
}
impl<'a> Default for CSVDefinition<'a> {
impl Default for CSVDefinition<'_> {
fn default() -> Self {
Self {
n_rows_header: 1,
+283 -179
@@ -25,14 +25,18 @@
/// search parameters
pub mod svc;
pub mod svr;
// /// search parameters space
// pub mod search;
// search parameters space
pub mod search;
use core::fmt::Debug;
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
// Only import typetag if not compiling for wasm32 and serde is enabled
#[cfg(all(feature = "serde", not(target_arch = "wasm32")))]
use typetag;
use crate::error::{Failed, FailedError};
use crate::linalg::basic::arrays::{Array1, ArrayView1};
@@ -48,197 +52,281 @@ pub trait Kernel: Debug {
fn apply(&self, x_i: &Vec<f64>, x_j: &Vec<f64>) -> Result<f64, Failed>;
}
/// Pre-defined kernel functions
/// An enumerator of all supported kernel types.
/// It makes kernel selection and parameterization ergonomic, type-safe, and ready for use in parameter structs like SVRParameters.
/// You can construct kernels using the provided variants and builder-style methods.
///
/// # Examples
///
/// ```
/// use smartcore::svm::Kernels;
///
/// let linear = Kernels::linear();
/// let rbf = Kernels::rbf().with_gamma(0.5);
/// let poly = Kernels::polynomial().with_degree(3.0).with_gamma(0.5).with_coef0(1.0);
/// let sigmoid = Kernels::sigmoid().with_gamma(0.2).with_coef0(0.0);
/// ```
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
pub struct Kernels;
#[derive(Debug, Clone, PartialEq)]
pub enum Kernels {
/// Linear kernel (default).
///
/// Computes the standard dot product between vectors.
Linear,
/// Radial Basis Function (RBF) kernel.
///
/// Formula: K(x, y) = exp(-gamma * ||x-y||²)
RBF {
/// Controls the width of the Gaussian RBF kernel.
///
/// Larger values of gamma lead to higher bias and lower variance.
/// This parameter is inversely proportional to the radius of influence
/// of samples selected by the model as support vectors.
gamma: Option<f64>,
},
/// Polynomial kernel.
///
/// Formula: K(x, y) = (gamma * <x, y> + coef0)^degree
Polynomial {
/// The degree of the polynomial kernel.
///
/// Integer values are typical (2 = quadratic, 3 = cubic), but any positive real value is valid.
/// Higher degree values create decision boundaries with higher complexity.
degree: Option<f64>,
/// Kernel coefficient for the dot product.
///
/// Scales the dot product before `coef0` is added and the result is raised to `degree`.
/// If None, a default value will be used.
gamma: Option<f64>,
/// Independent term in the polynomial kernel.
///
/// Controls the influence of higher-degree versus lower-degree terms.
/// If None, a default value of 1.0 will be used.
coef0: Option<f64>,
},
/// Sigmoid kernel.
///
/// Formula: K(x, y) = tanh(gamma * <x, y> + coef0)
Sigmoid {
/// Kernel coefficient for the dot product.
///
/// Controls the scaling of the dot product in the sigmoid function.
/// If None, a default value will be used.
gamma: Option<f64>,
/// Independent term in the sigmoid kernel.
///
/// Acts as a threshold/bias term in the sigmoid function.
/// If None, a default value of 1.0 will be used.
coef0: Option<f64>,
},
}
impl Kernels {
/// Return a default linear
pub fn linear() -> LinearKernel {
LinearKernel
/// Create a linear kernel.
///
/// The linear kernel computes the dot product between two vectors:
/// K(x, y) = <x, y>
pub fn linear() -> Self {
Kernels::Linear
}
/// Return a default RBF
pub fn rbf() -> RBFKernel {
RBFKernel::default()
/// Create an RBF kernel with unspecified gamma.
///
/// The RBF kernel is defined as:
/// K(x, y) = exp(-gamma * ||x-y||²)
///
/// You should specify gamma using `with_gamma()` before using this kernel.
pub fn rbf() -> Self {
Kernels::RBF { gamma: None }
}
/// Return a default polynomial
pub fn polynomial() -> PolynomialKernel {
PolynomialKernel::default()
/// Create a polynomial kernel with default parameters.
///
/// The polynomial kernel is defined as:
/// K(x, y) = (gamma * <x, y> + coef0)^degree
///
/// Default values:
/// - gamma: None (must be specified)
/// - degree: None (must be specified)
/// - coef0: 1.0
pub fn polynomial() -> Self {
Kernels::Polynomial {
gamma: None,
degree: None,
coef0: Some(1.0),
}
}
/// Return a default sigmoid
pub fn sigmoid() -> SigmoidKernel {
SigmoidKernel::default()
/// Create a sigmoid kernel with default parameters.
///
/// The sigmoid kernel is defined as:
/// K(x, y) = tanh(gamma * <x, y> + coef0)
///
/// Default values:
/// - gamma: None (must be specified)
/// - coef0: 1.0
///
pub fn sigmoid() -> Self {
Kernels::Sigmoid {
gamma: None,
coef0: Some(1.0),
}
}
}
/// Linear Kernel
#[allow(clippy::derive_partial_eq_without_eq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct LinearKernel;
/// Radial basis function (Gaussian) kernel
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RBFKernel {
/// kernel coefficient
pub gamma: Option<f64>,
}
#[allow(dead_code)]
impl RBFKernel {
/// assign gamma parameter to kernel (required)
/// ```rust
/// use smartcore::svm::RBFKernel;
/// let knl = RBFKernel::default().with_gamma(0.7);
/// ```
pub fn with_gamma(mut self, gamma: f64) -> Self {
self.gamma = Some(gamma);
self
/// Set the `gamma` parameter for RBF, polynomial, or sigmoid kernels.
///
/// The gamma parameter has different interpretations depending on the kernel:
/// - For RBF: Controls the width of the Gaussian. Larger values mean tighter fit.
/// - For Polynomial: Scaling factor for the dot product.
/// - For Sigmoid: Scaling factor for the dot product.
///
pub fn with_gamma(self, gamma: f64) -> Self {
match self {
Kernels::RBF { .. } => Kernels::RBF { gamma: Some(gamma) },
Kernels::Polynomial { degree, coef0, .. } => Kernels::Polynomial {
gamma: Some(gamma),
degree,
coef0,
},
Kernels::Sigmoid { coef0, .. } => Kernels::Sigmoid {
gamma: Some(gamma),
coef0,
},
other => other,
}
}
}
/// Polynomial kernel
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone, PartialEq)]
pub struct PolynomialKernel {
/// degree of the polynomial
pub degree: Option<f64>,
/// kernel coefficient
pub gamma: Option<f64>,
/// independent term in kernel function
pub coef0: Option<f64>,
}
/// Set the `degree` parameter for the polynomial kernel.
///
/// The degree parameter controls the flexibility of the decision boundary.
/// Higher degrees create more complex boundaries but may lead to overfitting.
///
pub fn with_degree(self, degree: f64) -> Self {
match self {
Kernels::Polynomial { gamma, coef0, .. } => Kernels::Polynomial {
degree: Some(degree),
gamma,
coef0,
},
other => other,
}
}
impl Default for PolynomialKernel {
fn default() -> Self {
Self {
gamma: Option::None,
degree: Option::None,
coef0: Some(1f64),
/// Set the `coef0` parameter for polynomial or sigmoid kernels.
///
/// The coef0 parameter is the independent term in the kernel function:
/// - For Polynomial: Controls the influence of higher-degree vs. lower-degree terms.
/// - For Sigmoid: Acts as a threshold/bias term.
///
pub fn with_coef0(self, coef0: f64) -> Self {
match self {
Kernels::Polynomial { degree, gamma, .. } => Kernels::Polynomial {
degree,
gamma,
coef0: Some(coef0),
},
Kernels::Sigmoid { gamma, .. } => Kernels::Sigmoid {
gamma,
coef0: Some(coef0),
},
other => other,
}
}
}
impl PolynomialKernel {
/// set parameters for kernel
/// ```rust
/// use smartcore::svm::PolynomialKernel;
/// let knl = PolynomialKernel::default().with_params(3.0, 0.7, 1.0);
/// ```
pub fn with_params(mut self, degree: f64, gamma: f64, coef0: f64) -> Self {
self.degree = Some(degree);
self.gamma = Some(gamma);
self.coef0 = Some(coef0);
self
}
/// set gamma parameter for kernel
/// ```rust
/// use smartcore::svm::PolynomialKernel;
/// let knl = PolynomialKernel::default().with_gamma(0.7);
/// ```
pub fn with_gamma(mut self, gamma: f64) -> Self {
self.gamma = Some(gamma);
self
}
/// set degree parameter for kernel
/// ```rust
/// use smartcore::svm::PolynomialKernel;
/// let knl = PolynomialKernel::default().with_degree(3.0, 100);
/// ```
pub fn with_degree(self, degree: f64, n_features: usize) -> Self {
self.with_params(degree, 1f64, 1f64 / n_features as f64)
}
}
/// Sigmoid (hyperbolic tangent) kernel
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone, PartialEq)]
pub struct SigmoidKernel {
/// kernel coefficient
pub gamma: Option<f64>,
/// independent term in kernel function
pub coef0: Option<f64>,
}
impl Default for SigmoidKernel {
fn default() -> Self {
Self {
gamma: Option::None,
coef0: Some(1f64),
}
}
}
impl SigmoidKernel {
/// set parameters for kernel
/// ```rust
/// use smartcore::svm::SigmoidKernel;
/// let knl = SigmoidKernel::default().with_params(0.7, 1.0);
/// ```
pub fn with_params(mut self, gamma: f64, coef0: f64) -> Self {
self.gamma = Some(gamma);
self.coef0 = Some(coef0);
self
}
/// set gamma parameter for kernel
/// ```rust
/// use smartcore::svm::SigmoidKernel;
/// let knl = SigmoidKernel::default().with_gamma(0.7);
/// ```
pub fn with_gamma(mut self, gamma: f64) -> Self {
self.gamma = Some(gamma);
self
}
}
/// Implementation of the [`Kernel`] trait for the [`Kernels`] enum in smartcore.
///
/// This method computes the value of the kernel function between two feature vectors `x_i` and `x_j`,
/// according to the variant and parameters of the [`Kernels`] enum. This enables flexible and type-safe
/// selection of kernel functions for SVM and SVR models in smartcore.
///
/// # Supported Kernels
///
/// - [`Kernels::Linear`]: Computes the standard dot product between `x_i` and `x_j`.
/// - [`Kernels::RBF`]: Computes the Radial Basis Function (Gaussian) kernel. Requires `gamma`.
/// - [`Kernels::Polynomial`]: Computes the polynomial kernel. Requires `degree`, `gamma`, and `coef0`.
/// - [`Kernels::Sigmoid`]: Computes the sigmoid kernel. Requires `gamma` and `coef0`.
///
/// # Parameters
///
/// - `x_i`: First input vector (feature vector).
/// - `x_j`: Second input vector (feature vector).
///
/// # Returns
///
/// - `Ok(f64)`: The computed kernel value.
/// - `Err(Failed)`: If any required kernel parameter is missing.
///
/// # Errors
///
/// Returns `Err(Failed)` if a required parameter (such as `gamma`, `degree`, or `coef0`)
/// is `None` for the selected kernel variant.
///
/// # Example
///
/// ```
/// use smartcore::svm::Kernels;
/// use smartcore::svm::Kernel;
///
/// let x = vec![1.0, 2.0, 3.0];
/// let y = vec![4.0, 5.0, 6.0];
/// let kernel = Kernels::rbf().with_gamma(0.5);
/// let value = kernel.apply(&x, &y).unwrap();
/// ```
///
/// # Notes
///
/// - This implementation follows smartcore's philosophy: pure Rust, no macros, no unsafe code,
/// and an accessible, pythonic API surface for both ML practitioners and Rust beginners.
/// - All kernel parameters must be set before calling `apply`; missing parameters will result in an error.
///
/// See the [`Kernels`] enum documentation for more details on each kernel type and its parameters.
#[cfg_attr(all(feature = "serde", not(target_arch = "wasm32")), typetag::serde)]
impl Kernel for LinearKernel {
impl Kernel for Kernels {
fn apply(&self, x_i: &Vec<f64>, x_j: &Vec<f64>) -> Result<f64, Failed> {
Ok(x_i.dot(x_j))
}
}
#[cfg_attr(all(feature = "serde", not(target_arch = "wasm32")), typetag::serde)]
impl Kernel for RBFKernel {
fn apply(&self, x_i: &Vec<f64>, x_j: &Vec<f64>) -> Result<f64, Failed> {
if self.gamma.is_none() {
return Err(Failed::because(
FailedError::ParametersError,
"gamma should be set, use {Kernel}::default().with_gamma(..)",
));
match self {
Kernels::Linear => Ok(x_i.dot(x_j)),
Kernels::RBF { gamma } => {
let gamma = gamma.ok_or_else(|| {
Failed::because(FailedError::ParametersError, "gamma not set")
})?;
let v_diff = x_i.sub(x_j);
Ok((-gamma * v_diff.mul(&v_diff).sum()).exp())
}
Kernels::Polynomial {
degree,
gamma,
coef0,
} => {
let degree = degree.ok_or_else(|| {
Failed::because(FailedError::ParametersError, "degree not set")
})?;
let gamma = gamma.ok_or_else(|| {
Failed::because(FailedError::ParametersError, "gamma not set")
})?;
let coef0 = coef0.ok_or_else(|| {
Failed::because(FailedError::ParametersError, "coef0 not set")
})?;
let dot = x_i.dot(x_j);
Ok((gamma * dot + coef0).powf(degree))
}
Kernels::Sigmoid { gamma, coef0 } => {
let gamma = gamma.ok_or_else(|| {
Failed::because(FailedError::ParametersError, "gamma not set")
})?;
let coef0 = coef0.ok_or_else(|| {
Failed::because(FailedError::ParametersError, "coef0 not set")
})?;
let dot = x_i.dot(x_j);
Ok((gamma * dot + coef0).tanh())
}
}
let v_diff = x_i.sub(x_j);
Ok((-self.gamma.unwrap() * v_diff.mul(&v_diff).sum()).exp())
}
}
#[cfg_attr(all(feature = "serde", not(target_arch = "wasm32")), typetag::serde)]
impl Kernel for PolynomialKernel {
fn apply(&self, x_i: &Vec<f64>, x_j: &Vec<f64>) -> Result<f64, Failed> {
if self.gamma.is_none() || self.coef0.is_none() || self.degree.is_none() {
return Err(Failed::because(
FailedError::ParametersError, "gamma, coef0, degree should be set,
use {Kernel}::default().with_{parameter}(..)")
);
}
let dot = x_i.dot(x_j);
Ok((self.gamma.unwrap() * dot + self.coef0.unwrap()).powf(self.degree.unwrap()))
}
}
#[cfg_attr(all(feature = "serde", not(target_arch = "wasm32")), typetag::serde)]
impl Kernel for SigmoidKernel {
fn apply(&self, x_i: &Vec<f64>, x_j: &Vec<f64>) -> Result<f64, Failed> {
if self.gamma.is_none() || self.coef0.is_none() {
return Err(Failed::because(
FailedError::ParametersError, "gamma, coef0, degree should be set,
use {Kernel}::default().with_{parameter}(..)")
);
}
let dot = x_i.dot(x_j);
Ok(self.gamma.unwrap() * dot + self.coef0.unwrap().tanh())
}
}
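The four kernel formulas documented above can be checked by hand on small vectors. A self-contained sketch (plain functions, not the crate's `Kernel` trait), matching the linear, RBF, polynomial, and sigmoid definitions:

```rust
fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// K(x, y) = <x, y>
fn linear(x: &[f64], y: &[f64]) -> f64 {
    dot(x, y)
}

/// K(x, y) = exp(-gamma * ||x - y||^2)
fn rbf(x: &[f64], y: &[f64], gamma: f64) -> f64 {
    let sq: f64 = x.iter().zip(y).map(|(a, b)| (a - b).powi(2)).sum();
    (-gamma * sq).exp()
}

/// K(x, y) = (gamma * <x, y> + coef0)^degree
fn polynomial(x: &[f64], y: &[f64], degree: f64, gamma: f64, coef0: f64) -> f64 {
    (gamma * dot(x, y) + coef0).powf(degree)
}

/// K(x, y) = tanh(gamma * <x, y> + coef0)
fn sigmoid(x: &[f64], y: &[f64], gamma: f64, coef0: f64) -> f64 {
    (gamma * dot(x, y) + coef0).tanh()
}

fn main() {
    let (x, y) = (vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]);
    assert_eq!(linear(&x, &y), 32.0); // 4 + 10 + 18
    // ||x - y||^2 = 9 + 9 + 9 = 27
    assert!((rbf(&x, &y, 0.055) - (-0.055f64 * 27.0).exp()).abs() < 1e-12);
    // (0.5 * 32 + 1)^3 = 17^3 = 4913
    assert!((polynomial(&x, &y, 3.0, 0.5, 1.0) - 4913.0).abs() < 1e-9);
    assert!(sigmoid(&x, &y, 0.01, 0.1).abs() < 1.0); // tanh is bounded
}
```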
@@ -247,6 +335,18 @@ mod tests {
use super::*;
use crate::svm::Kernels;
#[test]
fn rbf_kernel() {
let v1 = vec![1., 2., 3.];
let v2 = vec![4., 5., 6.];
let result = Kernels::rbf()
.with_gamma(0.055)
.apply(&v1, &v2)
.unwrap()
.abs();
assert!((0.2265f64 - result).abs() < 1e-4);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
@@ -264,7 +364,7 @@ mod tests {
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn rbf_kernel() {
fn test_rbf_kernel() {
let v1 = vec![1., 2., 3.];
let v2 = vec![4., 5., 6.];
@@ -287,12 +387,15 @@ mod tests {
let v2 = vec![4., 5., 6.];
let result = Kernels::polynomial()
.with_params(3.0, 0.5, 1.0)
.with_gamma(0.5)
.with_degree(3.0)
.with_coef0(1.0)
//.with_params(3.0, 0.5, 1.0)
.apply(&v1, &v2)
.unwrap()
.abs();
assert!((4913f64 - result) < std::f64::EPSILON);
assert!((4913f64 - result).abs() < f64::EPSILON);
}
#[cfg_attr(
@@ -305,7 +408,8 @@ mod tests {
let v2 = vec![4., 5., 6.];
let result = Kernels::sigmoid()
.with_params(0.01, 0.1)
.with_gamma(0.01)
.with_coef0(0.1)
.apply(&v1, &v2)
.unwrap()
.abs();
+2
@@ -1,3 +1,5 @@
//! SVC and Grid Search
/// SVC search parameters
pub mod svc_params;
/// SVC search parameters
+282 -101
@@ -1,112 +1,293 @@
// /// SVR grid search parameters
// #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
// #[derive(Debug, Clone)]
// pub struct SVRSearchParameters<T: Number + RealNumber, M: Matrix<T>, K: Kernel<T, M::RowVector>> {
// /// Epsilon in the epsilon-SVR model.
// pub eps: Vec<T>,
// /// Regularization parameter.
// pub c: Vec<T>,
// /// Tolerance for stopping eps.
// pub tol: Vec<T>,
// /// The kernel function.
// pub kernel: Vec<K>,
// /// Unused parameter.
// m: PhantomData<M>,
// }
//! # SVR Grid Search Parameters
//!
//! This module provides utilities for defining and iterating over grid search parameter spaces
//! for Support Vector Regression (SVR) models in [smartcore](https://github.com/smartcorelib/smartcore).
//!
//! The main struct, [`SVRSearchParameters`], allows users to specify multiple values for each
//! SVR hyperparameter (epsilon, regularization parameter C, tolerance, and kernel function).
//! The provided iterator yields all possible combinations (the Cartesian product) of these parameters,
//! enabling exhaustive grid search for hyperparameter tuning.
//!
//!
//! ## Example
//! ```
//! use smartcore::svm::Kernels;
//! use smartcore::svm::search::svr_params::SVRSearchParameters;
//! use smartcore::linalg::basic::matrix::DenseMatrix;
//!
//! let params = SVRSearchParameters::<f64, DenseMatrix<f64>> {
//! eps: vec![0.1, 0.2],
//! c: vec![1.0, 10.0],
//! tol: vec![1e-3],
//! kernel: vec![Kernels::linear(), Kernels::rbf().with_gamma(0.5)],
//! m: std::marker::PhantomData,
//! };
//!
//! // for param_set in params.into_iter() {
//! // Use param_set (of type svr::SVRParameters) to fit and evaluate your SVR model.
//! // }
//! ```
//!
//!
//! ## Note
//! This module is intended for use with smartcore version 0.4 or later. The API is not compatible with older versions.
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
// /// SVR grid search iterator
// pub struct SVRSearchParametersIterator<T: Number + RealNumber, M: Matrix<T>, K: Kernel<T, M::RowVector>> {
// svr_search_parameters: SVRSearchParameters<T, M, K>,
// current_eps: usize,
// current_c: usize,
// current_tol: usize,
// current_kernel: usize,
// }
use crate::linalg::basic::arrays::Array2;
use crate::numbers::basenum::Number;
use crate::numbers::floatnum::FloatNumber;
use crate::numbers::realnum::RealNumber;
use crate::svm::{svr, Kernels};
use std::marker::PhantomData;
// impl<T: Number + RealNumber, M: Matrix<T>, K: Kernel<T, M::RowVector>> IntoIterator
// for SVRSearchParameters<T, M, K>
// {
// type Item = SVRParameters<T, M, K>;
// type IntoIter = SVRSearchParametersIterator<T, M, K>;
/// ## SVR grid search parameters
/// A struct representing a grid of hyperparameters for SVR grid search in smartcore.
///
/// Each field is a vector of possible values for the corresponding SVR hyperparameter.
/// The [`IntoIterator`] implementation yields every possible combination of these parameters
/// as an `svr::SVRParameters` struct, suitable for use in model selection routines.
///
/// # Type Parameters
/// - `T`: Numeric type for parameters (e.g., `f64`)
/// - `M`: Matrix type implementing [`Array2<T>`]
///
/// # Fields
/// - `eps`: Vector of epsilon values for the epsilon-insensitive loss in SVR.
/// - `c`: Vector of regularization parameters (C) for SVR.
/// - `tol`: Vector of tolerance values for the stopping criterion.
/// - `kernel`: Vector of kernel function variants (see [`Kernels`]).
/// - `m`: Phantom data for the matrix type parameter.
///
/// # Example
/// ```
/// use smartcore::svm::Kernels;
/// use smartcore::svm::search::svr_params::SVRSearchParameters;
/// use smartcore::linalg::basic::matrix::DenseMatrix;
///
/// let params = SVRSearchParameters::<f64, DenseMatrix<f64>> {
/// eps: vec![0.1, 0.2],
/// c: vec![1.0, 10.0],
/// tol: vec![1e-3],
/// kernel: vec![Kernels::linear(), Kernels::rbf().with_gamma(0.5)],
/// m: std::marker::PhantomData,
/// };
/// ```
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
pub struct SVRSearchParameters<T: Number + RealNumber, M: Array2<T>> {
/// Epsilon in the epsilon-SVR model.
pub eps: Vec<T>,
/// Regularization parameter.
pub c: Vec<T>,
/// Tolerance for the stopping criterion.
pub tol: Vec<T>,
/// The kernel function.
pub kernel: Vec<Kernels>,
/// Unused parameter.
pub m: PhantomData<M>,
}
// fn into_iter(self) -> Self::IntoIter {
// SVRSearchParametersIterator {
// svr_search_parameters: self,
// current_eps: 0,
// current_c: 0,
// current_tol: 0,
// current_kernel: 0,
// }
// }
// }
/// SVR grid search iterator
pub struct SVRSearchParametersIterator<T: Number + RealNumber, M: Array2<T>> {
svr_search_parameters: SVRSearchParameters<T, M>,
current_eps: usize,
current_c: usize,
current_tol: usize,
current_kernel: usize,
}
// impl<T: Number + RealNumber, M: Matrix<T>, K: Kernel<T, M::RowVector>> Iterator
// for SVRSearchParametersIterator<T, M, K>
// {
// type Item = SVRParameters<T, M, K>;
impl<T: Number + FloatNumber + RealNumber, M: Array2<T>> IntoIterator
for SVRSearchParameters<T, M>
{
type Item = svr::SVRParameters<T>;
type IntoIter = SVRSearchParametersIterator<T, M>;
// fn next(&mut self) -> Option<Self::Item> {
// if self.current_eps == self.svr_search_parameters.eps.len()
// && self.current_c == self.svr_search_parameters.c.len()
// && self.current_tol == self.svr_search_parameters.tol.len()
// && self.current_kernel == self.svr_search_parameters.kernel.len()
// {
// return None;
// }
fn into_iter(self) -> Self::IntoIter {
SVRSearchParametersIterator {
svr_search_parameters: self,
current_eps: 0,
current_c: 0,
current_tol: 0,
current_kernel: 0,
}
}
}
// let next = SVRParameters::<T, M, K> {
// eps: self.svr_search_parameters.eps[self.current_eps],
// c: self.svr_search_parameters.c[self.current_c],
// tol: self.svr_search_parameters.tol[self.current_tol],
// kernel: self.svr_search_parameters.kernel[self.current_kernel].clone(),
// m: PhantomData,
// };
impl<T: Number + FloatNumber + RealNumber, M: Array2<T>> Iterator
for SVRSearchParametersIterator<T, M>
{
type Item = svr::SVRParameters<T>;
// if self.current_eps + 1 < self.svr_search_parameters.eps.len() {
// self.current_eps += 1;
// } else if self.current_c + 1 < self.svr_search_parameters.c.len() {
// self.current_eps = 0;
// self.current_c += 1;
// } else if self.current_tol + 1 < self.svr_search_parameters.tol.len() {
// self.current_eps = 0;
// self.current_c = 0;
// self.current_tol += 1;
// } else if self.current_kernel + 1 < self.svr_search_parameters.kernel.len() {
// self.current_eps = 0;
// self.current_c = 0;
// self.current_tol = 0;
// self.current_kernel += 1;
// } else {
// self.current_eps += 1;
// self.current_c += 1;
// self.current_tol += 1;
// self.current_kernel += 1;
// }
fn next(&mut self) -> Option<Self::Item> {
if self.current_eps == self.svr_search_parameters.eps.len()
&& self.current_c == self.svr_search_parameters.c.len()
&& self.current_tol == self.svr_search_parameters.tol.len()
&& self.current_kernel == self.svr_search_parameters.kernel.len()
{
return None;
}
// Some(next)
// }
// }
let next = svr::SVRParameters::<T> {
eps: self.svr_search_parameters.eps[self.current_eps],
c: self.svr_search_parameters.c[self.current_c],
tol: self.svr_search_parameters.tol[self.current_tol],
kernel: Some(self.svr_search_parameters.kernel[self.current_kernel].clone()),
};
// impl<T: Number + RealNumber, M: Matrix<T>> Default for SVRSearchParameters<T, M, LinearKernel> {
// fn default() -> Self {
// let default_params: SVRParameters<T, M, LinearKernel> = SVRParameters::default();
if self.current_eps + 1 < self.svr_search_parameters.eps.len() {
self.current_eps += 1;
} else if self.current_c + 1 < self.svr_search_parameters.c.len() {
self.current_eps = 0;
self.current_c += 1;
} else if self.current_tol + 1 < self.svr_search_parameters.tol.len() {
self.current_eps = 0;
self.current_c = 0;
self.current_tol += 1;
} else if self.current_kernel + 1 < self.svr_search_parameters.kernel.len() {
self.current_eps = 0;
self.current_c = 0;
self.current_tol = 0;
self.current_kernel += 1;
} else {
self.current_eps += 1;
self.current_c += 1;
self.current_tol += 1;
self.current_kernel += 1;
}
// SVRSearchParameters {
// eps: vec![default_params.eps],
// c: vec![default_params.c],
// tol: vec![default_params.tol],
// kernel: vec![default_params.kernel],
// m: PhantomData,
// }
// }
// }
Some(next)
}
}
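The iterator above advances its indices like an odometer: bump the `eps` index first, and on wrap-around carry into `c`, then `tol`, then `kernel`, yielding the full Cartesian product. A compact sketch of the same counting scheme over plain index vectors (a hypothetical helper, not the crate's iterator):

```rust
/// Advance a mixed-radix counter `idx` over dimensions `sizes`
/// (least-significant digit first). Returns false once exhausted.
pub fn odometer_next(idx: &mut [usize], sizes: &[usize]) -> bool {
    for (i, &size) in sizes.iter().enumerate() {
        if idx[i] + 1 < size {
            idx[i] += 1;
            // reset all lower-order digits on carry
            for j in 0..i {
                idx[j] = 0;
            }
            return true;
        }
    }
    false
}

fn main() {
    let sizes = [2, 2, 1, 2]; // eps, c, tol, kernel value counts
    let mut idx = vec![0usize; sizes.len()];
    let mut count = 1; // the all-zeros combination
    while odometer_next(&mut idx, &sizes) {
        count += 1;
    }
    assert_eq!(count, 2 * 2 * 1 * 2); // full Cartesian product visited once
}
```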
// #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
// #[derive(Debug)]
// #[cfg_attr(
// feature = "serde",
// serde(bound(
// serialize = "M::RowVector: Serialize, K: Serialize, T: Serialize",
// deserialize = "M::RowVector: Deserialize<'de>, K: Deserialize<'de>, T: Deserialize<'de>",
// ))
// )]
impl<T: Number + FloatNumber + RealNumber, M: Array2<T>> Default for SVRSearchParameters<T, M> {
fn default() -> Self {
let default_params: svr::SVRParameters<T> = svr::SVRParameters::default();
SVRSearchParameters {
eps: vec![default_params.eps],
c: vec![default_params.c],
tol: vec![default_params.tol],
kernel: vec![default_params.kernel.unwrap_or_else(Kernels::linear)],
m: PhantomData,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::linalg::basic::matrix::DenseMatrix;
use crate::svm::Kernels;
type T = f64;
type M = DenseMatrix<T>;
#[test]
fn test_default_parameters() {
let params = SVRSearchParameters::<T, M>::default();
assert_eq!(params.eps.len(), 1);
assert_eq!(params.c.len(), 1);
assert_eq!(params.tol.len(), 1);
assert_eq!(params.kernel.len(), 1);
// Check that the default kernel is linear
assert_eq!(params.kernel[0], Kernels::linear());
}
#[test]
fn test_single_grid_iteration() {
let params = SVRSearchParameters::<T, M> {
eps: vec![0.1],
c: vec![1.0],
tol: vec![1e-3],
kernel: vec![Kernels::rbf().with_gamma(0.5)],
m: PhantomData,
};
let mut iter = params.into_iter();
let param = iter.next().unwrap();
assert_eq!(param.eps, 0.1);
assert_eq!(param.c, 1.0);
assert_eq!(param.tol, 1e-3);
assert_eq!(param.kernel, Some(Kernels::rbf().with_gamma(0.5)));
assert!(iter.next().is_none());
}
#[test]
fn test_cartesian_grid_iteration() {
let params = SVRSearchParameters::<T, M> {
eps: vec![0.1, 0.2],
c: vec![1.0, 2.0],
tol: vec![1e-3],
kernel: vec![Kernels::linear(), Kernels::rbf().with_gamma(0.5)],
m: PhantomData,
};
let expected_count =
params.eps.len() * params.c.len() * params.tol.len() * params.kernel.len();
let results: Vec<_> = params.into_iter().collect();
assert_eq!(results.len(), expected_count);
// Check that all parameter combinations are present
let mut seen = vec![];
for p in &results {
seen.push((p.eps, p.c, p.tol, p.kernel.clone().unwrap()));
}
for &eps in &[0.1, 0.2] {
for &c in &[1.0, 2.0] {
for &tol in &[1e-3] {
for kernel in &[Kernels::linear(), Kernels::rbf().with_gamma(0.5)] {
assert!(seen.contains(&(eps, c, tol, kernel.clone())));
}
}
}
}
}
#[test]
fn test_empty_grid() {
let params = SVRSearchParameters::<T, M> {
eps: vec![],
c: vec![],
tol: vec![],
kernel: vec![],
m: PhantomData,
};
let mut iter = params.into_iter();
assert!(iter.next().is_none());
}
#[test]
fn test_kernel_enum_variants() {
let lin = Kernels::linear();
let rbf = Kernels::rbf().with_gamma(0.2);
let poly = Kernels::polynomial()
.with_degree(2.0)
.with_gamma(1.0)
.with_coef0(0.5);
let sig = Kernels::sigmoid().with_gamma(0.3).with_coef0(0.1);
assert_eq!(lin, Kernels::Linear);
match rbf {
Kernels::RBF { gamma } => assert_eq!(gamma, Some(0.2)),
_ => panic!("Not RBF"),
}
match poly {
Kernels::Polynomial {
degree,
gamma,
coef0,
} => {
assert_eq!(degree, Some(2.0));
assert_eq!(gamma, Some(1.0));
assert_eq!(coef0, Some(0.5));
}
_ => panic!("Not Polynomial"),
}
match sig {
Kernels::Sigmoid { gamma, coef0 } => {
assert_eq!(gamma, Some(0.3));
assert_eq!(coef0, Some(0.1));
}
_ => panic!("Not Sigmoid"),
}
}
}
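The Cartesian-product behaviour exercised by `test_cartesian_grid_iteration` above can be sketched in isolation. This is a standalone illustration of the row-major nested iteration; the function name `combos` is hypothetical and not part of the smartcore API:

```rust
// Standalone sketch of a Cartesian-product parameter grid,
// mirroring what the search-parameter iterator above produces.
// `combos` is an illustrative name, not smartcore API.
fn combos(eps: &[f64], c: &[f64], tol: &[f64]) -> Vec<(f64, f64, f64)> {
    let mut out = Vec::new();
    for &e in eps {
        for &cc in c {
            for &t in tol {
                out.push((e, cc, t));
            }
        }
    }
    out
}

fn main() {
    let grid = combos(&[0.1, 0.2], &[1.0, 2.0], &[1e-3]);
    // 2 * 2 * 1 = 4 combinations, outermost parameter varying slowest
    assert_eq!(grid.len(), 4);
    assert_eq!(grid[0], (0.1, 1.0, 1e-3));
}
```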
+378 -64
@@ -58,10 +58,11 @@
//! 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1];
//!
//! let knl = Kernels::linear();
//! let params = &SVCParameters::default().with_c(200.0).with_kernel(knl);
//! let svc = SVC::fit(&x, &y, params).unwrap();
//! let parameters = &SVCParameters::default().with_c(200.0).with_kernel(knl);
//! let svc = SVC::fit(&x, &y, parameters).unwrap();
//!
//! let y_hat = svc.predict(&x).unwrap();
//!
//! ```
//!
//! ## References:
@@ -84,12 +85,194 @@ use serde::{Deserialize, Serialize};
use crate::api::{PredictorBorrow, SupervisedEstimatorBorrow};
use crate::error::{Failed, FailedError};
use crate::linalg::basic::arrays::{Array1, Array2, MutArray};
use crate::linalg::basic::arrays::{Array, Array1, Array2, MutArray};
use crate::numbers::basenum::Number;
use crate::numbers::realnum::RealNumber;
use crate::rand_custom::get_rng_impl;
use crate::svm::Kernel;
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
/// Configuration for a multi-class Support Vector Machine (SVM) classifier.
/// This struct holds the indices of the data points relevant to a specific binary
/// classification problem within a multi-class context, and the two classes
/// being discriminated.
struct MultiClassConfig<TY: Number + Ord> {
/// The indices of the data points from the original dataset that belong to the two `classes`.
indices: Vec<usize>,
/// A tuple representing the two classes that this configuration is designed to distinguish.
classes: (TY, TY),
}
impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
SupervisedEstimatorBorrow<'a, X, Y, SVCParameters<TX, TY, X, Y>>
for MultiClassSVC<'a, TX, TY, X, Y>
{
/// Creates a new, empty `MultiClassSVC` instance.
fn new() -> Self {
Self {
classifiers: Option::None,
}
}
/// Fits the `MultiClassSVC` model to the provided data and parameters.
///
/// This method delegates the fitting process to the inherent `MultiClassSVC::fit` method.
///
/// # Arguments
/// * `x` - A reference to the input features (2D array).
/// * `y` - A reference to the target labels (1D array).
/// * `parameters` - A reference to the `SVCParameters` controlling the SVM training.
///
/// # Returns
/// A `Result` indicating success (`Self`) or failure (`Failed`).
fn fit(
x: &'a X,
y: &'a Y,
parameters: &'a SVCParameters<TX, TY, X, Y>,
) -> Result<Self, Failed> {
MultiClassSVC::fit(x, y, parameters)
}
}
impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
PredictorBorrow<'a, X, TX> for MultiClassSVC<'a, TX, TY, X, Y>
{
/// Predicts the class labels for new data points.
///
/// This method delegates the prediction process to the inherent `MultiClassSVC::predict` method.
///
/// # Arguments
/// * `x` - A reference to the input features (2D array) for which to make predictions.
///
/// # Returns
/// A `Result` containing a `Vec` of predicted class labels (`TX`) or a `Failed` error.
fn predict(&self, x: &'a X) -> Result<Vec<TX>, Failed> {
self.predict(x)
}
}
/// A multi-class Support Vector Machine (SVM) classifier.
///
/// This struct implements a multi-class SVM using the "one-vs-one" strategy,
/// where a separate binary SVC classifier is trained for every pair of classes.
///
/// # Type Parameters
/// * `'a` - Lifetime parameter for borrowed data.
/// * `TX` - The numeric type of the input features (must implement `Number` and `RealNumber`).
/// * `TY` - The numeric type of the target labels (must implement `Number` and `Ord`).
/// * `X` - The type representing the 2D array of input features (e.g., a matrix).
/// * `Y` - The type representing the 1D array of target labels (e.g., a vector).
pub struct MultiClassSVC<
'a,
TX: Number + RealNumber,
TY: Number + Ord,
X: Array2<TX>,
Y: Array1<TY>,
> {
/// An optional vector of binary `SVC` classifiers.
classifiers: Option<Vec<SVC<'a, TX, TY, X, Y>>>,
}
impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
MultiClassSVC<'a, TX, TY, X, Y>
{
/// Fits the `MultiClassSVC` model to the provided data using a one-vs-one strategy.
///
/// This method identifies all unique classes in the target labels `y` and then
/// trains a binary `SVC` for every unique pair of classes. For each pair, it
/// extracts the relevant data points and their labels, and then trains a
/// specialized `SVC` for that binary classification task.
///
/// # Arguments
/// * `x` - A reference to the input features (2D array).
/// * `y` - A reference to the target labels (1D array).
/// * `parameters` - A reference to the `SVCParameters` controlling the SVM training for each individual binary classifier.
///
/// # Returns
/// A `Result` indicating success (`MultiClassSVC`) or failure (`Failed`).
pub fn fit(
x: &'a X,
y: &'a Y,
parameters: &'a SVCParameters<TX, TY, X, Y>,
) -> Result<MultiClassSVC<'a, TX, TY, X, Y>, Failed> {
let unique_classes = y.unique();
let mut classifiers = Vec::new();
// Iterate through all unique pairs of classes (one-vs-one strategy)
for i in 0..unique_classes.len() {
for j in i..unique_classes.len() {
if i == j {
continue;
}
let class0 = unique_classes[j];
let class1 = unique_classes[i];
let mut indices = Vec::new();
// Collect indices of data points belonging to the current pair of classes
for (index, v) in y.iterator(0).enumerate() {
if *v == class0 || *v == class1 {
indices.push(index)
}
}
let classes = (class0, class1);
let multiclass_config = MultiClassConfig { classes, indices };
// Fit a binary SVC for the current pair of classes
let svc = SVC::multiclass_fit(x, y, parameters, multiclass_config).unwrap();
classifiers.push(svc);
}
}
Ok(Self {
classifiers: Some(classifiers),
})
}
/// Predicts the class labels for new data points using the trained multi-class SVM.
///
/// This method uses a "voting" scheme (majority vote) among all the binary
/// classifiers to determine the final prediction for each data point.
///
/// # Arguments
/// * `x` - A reference to the input features (2D array) for which to make predictions.
///
/// # Returns
/// A `Result` containing a `Vec` of predicted class labels (`TX`) or a `Failed` error.
///
pub fn predict(&self, x: &X) -> Result<Vec<TX>, Failed> {
// Initialize a HashMap for each data point to store votes for each class
let mut polls = vec![HashMap::new(); x.shape().0];
// Retrieve the trained binary classifiers
let classifiers = self.classifiers.as_ref().unwrap();
// Iterate through each binary classifier
for svc in classifiers.iter() {
let predictions = svc.predict(x).unwrap(); // call SVC::predict for each binary classifier
// For each prediction from the current binary classifier
for (j, prediction) in predictions.iter().enumerate() {
let prediction = prediction.to_i32().unwrap();
let poll = polls.get_mut(j).unwrap(); // Get the poll for the current data point
// Increment the vote for the predicted class
*poll.entry(prediction).or_insert(0) += 1;
}
}
// Determine the final prediction for each data point based on majority vote
Ok(polls
.iter()
.map(|v| {
// Find the class with the maximum votes for each data point
TX::from(*v.iter().max_by_key(|(_, count)| *count).unwrap().0).unwrap()
})
.collect())
}
}
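The majority-vote step in `predict` above can be sketched independently of the SVC types. Here `tally` is a hypothetical helper that only reproduces the voting logic, not anything exported by the crate:

```rust
use std::collections::HashMap;

// For each sample, count the votes cast by the pairwise binary
// classifiers and return the label with the highest count.
// `tally` is an illustrative helper, not smartcore API.
fn tally(votes_per_sample: &[Vec<i32>]) -> Vec<i32> {
    votes_per_sample
        .iter()
        .map(|votes| {
            let mut poll: HashMap<i32, usize> = HashMap::new();
            for &label in votes {
                *poll.entry(label).or_insert(0) += 1;
            }
            // Pick the label with the most votes (majority vote).
            *poll.iter().max_by_key(|(_, count)| **count).unwrap().0
        })
        .collect()
}

fn main() {
    // Three binary classifiers voted on two samples.
    let votes = vec![vec![0, 0, 2], vec![1, 2, 1]];
    assert_eq!(tally(&votes), vec![0, 1]);
}
```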
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
/// SVC Parameters
@@ -123,7 +306,7 @@ pub struct SVCParameters<TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX
)]
/// Support Vector Classifier
pub struct SVC<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>> {
classes: Option<Vec<TY>>,
classes: Option<(TY, TY)>,
instances: Option<Vec<Vec<TX>>>,
#[cfg_attr(feature = "serde", serde(skip))]
parameters: Option<&'a SVCParameters<TX, TY, X, Y>>,
@@ -152,7 +335,9 @@ struct Cache<TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1
struct Optimizer<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>> {
x: &'a X,
y: &'a Y,
indices: Option<Vec<usize>>,
parameters: &'a SVCParameters<TX, TY, X, Y>,
classes: &'a (TY, TY),
svmin: usize,
svmax: usize,
gmin: TX,
@@ -180,12 +365,12 @@ impl<TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
self.tol = tol;
self
}
/// The kernel function.
pub fn with_kernel<K: Kernel + 'static>(mut self, kernel: K) -> Self {
self.kernel = Some(Box::new(kernel));
self
}
/// Seed for the pseudo random number generator.
pub fn with_seed(mut self, seed: Option<u64>) -> Self {
self.seed = seed;
@@ -241,17 +426,98 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX> + 'a, Y: Array1<TY> + 'a>
SVC<'a, TX, TY, X, Y>
{
/// Fits SVC to your data.
/// * `x` - _NxM_ matrix with _N_ observations and _M_ features in each observation.
/// * `y` - class labels
/// * `parameters` - optional parameters, use `Default::default()` to set parameters to default values.
/// Fits a binary Support Vector Classifier (SVC) to the provided data.
///
/// This is the primary `fit` method for a standalone binary SVC. It expects
/// the target labels `y` to contain exactly two unique classes. If more or
/// fewer than two classes are found, it returns an error. It then extracts
/// these two classes and proceeds to optimize and fit the SVC model.
///
/// # Arguments
/// * `x` - A reference to the input features (2D array) of the training data.
/// * `y` - A reference to the target labels (1D array) of the training data. `y` must contain exactly two unique class labels.
/// * `parameters` - A reference to the `SVCParameters` controlling the training process.
///
/// # Returns
/// A `Result` which is:
/// - `Ok(SVC<'a, TX, TY, X, Y>)`: A new, fitted binary SVC instance.
/// - `Err(Failed)`: If the number of unique classes in `y` is not exactly two, or if the underlying optimization fails.
pub fn fit(
x: &'a X,
y: &'a Y,
parameters: &'a SVCParameters<TX, TY, X, Y>,
) -> Result<SVC<'a, TX, TY, X, Y>, Failed> {
let (n, _) = x.shape();
let classes = y.unique();
// Validate that there are exactly two unique classes in the target labels.
if classes.len() != 2 {
return Err(Failed::fit(&format!(
"Incorrect number of classes: {}. A binary SVC requires exactly two classes.",
classes.len()
)));
}
let classes = (classes[0], classes[1]);
Self::optimize_and_fit(x, y, parameters, classes, None)
}
/// Fits a binary Support Vector Classifier (SVC) specifically for multi-class scenarios.
///
/// This function is intended to be called by a multi-class strategy (e.g., one-vs-one)
/// to train individual binary SVCs. It takes a `MultiClassConfig` which specifies
/// the two classes this SVC should discriminate and the subset of data indices
/// relevant to these classes. It then delegates the actual optimization and fitting
/// to `optimize_and_fit`.
///
/// # Arguments
/// * `x` - A reference to the input features (2D array) of the training data.
/// * `y` - A reference to the target labels (1D array) of the training data.
/// * `parameters` - A reference to the `SVCParameters` controlling the training process (e.g., kernel, C-value, tolerance).
/// * `multiclass_config` - A `MultiClassConfig` struct containing:
/// - `classes`: A tuple `(class0, class1)` specifying the two classes this SVC should distinguish.
/// - `indices`: A `Vec<usize>` containing the indices of the data points in `x` and `y` that belong to either `class0` or `class1`.
///
/// # Returns
/// A `Result` which is:
/// - `Ok(SVC<'a, TX, TY, X, Y>)`: A new, fitted binary SVC instance.
/// - `Err(Failed)`: If the fitting process encounters an error (e.g., invalid parameters).
fn multiclass_fit(
x: &'a X,
y: &'a Y,
parameters: &'a SVCParameters<TX, TY, X, Y>,
multiclass_config: MultiClassConfig<TY>,
) -> Result<SVC<'a, TX, TY, X, Y>, Failed> {
let classes = multiclass_config.classes;
let indices = multiclass_config.indices;
Self::optimize_and_fit(x, y, parameters, classes, Some(indices))
}
/// Internal function to optimize and fit the Support Vector Classifier.
///
/// This is the core logic for training a binary SVC. It performs several checks
/// (e.g., kernel presence, data shape consistency) and then initializes an
/// `Optimizer` to find the support vectors, weights (`w`), and bias (`b`).
///
/// # Arguments
/// * `x` - A reference to the input features (2D array) of the training data.
/// * `y` - A reference to the target labels (1D array) of the training data.
/// * `parameters` - A reference to the `SVCParameters` defining the SVM model's configuration.
/// * `classes` - A tuple `(class0, class1)` representing the two distinct class labels that the SVC will learn to separate.
/// * `indices` - An `Option<Vec<usize>>`. If `Some`, it contains the specific indices of data points from `x` and `y` that should be used for training this binary classifier. If `None`, all data points in `x` and `y` are considered.
/// # Returns
/// A `Result` which is:
/// - `Ok(SVC<'a, TX, TY, X, Y>)`: A new `SVC` instance populated with the learned model components (support vectors, weights, bias).
/// - `Err(Failed)`: If any of the validation checks fail (e.g., missing kernel, mismatched data shapes), or if the optimization process fails.
fn optimize_and_fit(
x: &'a X,
y: &'a Y,
parameters: &'a SVCParameters<TX, TY, X, Y>,
classes: (TY, TY),
indices: Option<Vec<usize>>,
) -> Result<SVC<'a, TX, TY, X, Y>, Failed> {
let (n_samples, _) = x.shape();
// Validate that a kernel has been defined in the parameters.
if parameters.kernel.is_none() {
return Err(Failed::because(
FailedError::ParametersError,
@@ -259,55 +525,39 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX> + 'a, Y: Array
));
}
if n != y.shape() {
// Validate that the number of samples in X matches the number of labels in Y.
if n_samples != y.shape() {
return Err(Failed::fit(
"Number of rows of X doesn\'t match number of rows of Y",
"Number of rows of X doesn't match number of rows of Y",
));
}
let classes = y.unique();
if classes.len() != 2 {
return Err(Failed::fit(&format!(
"Incorrect number of classes: {}",
classes.len()
)));
}
// Make sure class labels are either 1 or -1
for e in y.iterator(0) {
let y_v = e.to_i32().unwrap();
if y_v != -1 && y_v != 1 {
return Err(Failed::because(
FailedError::ParametersError,
"Class labels must be 1 or -1",
));
}
}
let optimizer: Optimizer<'_, TX, TY, X, Y> = Optimizer::new(x, y, parameters);
let optimizer: Optimizer<'_, TX, TY, X, Y> =
Optimizer::new(x, y, indices, parameters, &classes);
// Perform the optimization to find the support vectors, weight vector, and bias.
// This is where the core SVM algorithm (e.g., SMO) would run.
let (support_vectors, weight, b) = optimizer.optimize();
// Construct and return the fitted SVC model.
Ok(SVC::<'a> {
classes: Some(classes),
instances: Some(support_vectors),
parameters: Some(parameters),
w: Some(weight),
b: Some(b),
phantomdata: PhantomData,
classes: Some(classes), // Store the two classes the SVC was trained on.
instances: Some(support_vectors), // Store the data points that are support vectors.
parameters: Some(parameters), // Reference to the parameters used for fitting.
w: Some(weight), // The learned weight vector (for linear kernels).
b: Some(b), // The learned bias term.
phantomdata: PhantomData, // Placeholder for type parameters not directly stored.
})
}
/// Predicts estimated class labels from `x`
/// * `x` - _KxM_ data where _K_ is number of observations and _M_ is number of features.
pub fn predict(&self, x: &'a X) -> Result<Vec<TX>, Failed> {
let mut y_hat: Vec<TX> = self.decision_function(x)?;
for i in 0..y_hat.len() {
let cls_idx = match *y_hat.get(i).unwrap() > TX::zero() {
false => TX::from(self.classes.as_ref().unwrap()[0]).unwrap(),
true => TX::from(self.classes.as_ref().unwrap()[1]).unwrap(),
let cls_idx = match *y_hat.get(i) > TX::zero() {
false => TX::from(self.classes.as_ref().unwrap().0).unwrap(),
true => TX::from(self.classes.as_ref().unwrap().1).unwrap(),
};
y_hat.set(i, cls_idx);
@@ -360,8 +610,8 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX> + 'a, Y: Array
}
}
impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>> PartialEq
for SVC<'a, TX, TY, X, Y>
impl<TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>> PartialEq
for SVC<'_, TX, TY, X, Y>
{
fn eq(&self, other: &Self) -> bool {
if (self.b.unwrap().sub(other.b.unwrap())).abs() > TX::epsilon() * TX::two()
@@ -445,14 +695,18 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
fn new(
x: &'a X,
y: &'a Y,
indices: Option<Vec<usize>>,
parameters: &'a SVCParameters<TX, TY, X, Y>,
classes: &'a (TY, TY),
) -> Optimizer<'a, TX, TY, X, Y> {
let (n, _) = x.shape();
Optimizer {
x,
y,
indices,
parameters,
classes,
svmin: 0,
svmax: 0,
gmin: <TX as Bounded>::max_value(),
@@ -478,7 +732,12 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
for i in self.permutate(n) {
x.clear();
x.extend(self.x.get_row(i).iterator(0).take(n).copied());
self.process(i, &x, *self.y.get(i), &mut cache);
let y: f64 = if *self.y.get(i) == self.classes.1 { 1.0 } else { -1.0 };
self.process(i, &x, y, &mut cache);
loop {
self.reprocess(tol, &mut cache);
self.find_min_max_gradient();
@@ -514,14 +773,16 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
for i in self.permutate(n) {
x.clear();
x.extend(self.x.get_row(i).iterator(0).take(n).copied());
if *self.y.get(i) == TY::one() && cp < few {
if self.process(i, &x, *self.y.get(i), cache) {
let y: f64 = if *self.y.get(i) == self.classes.1 { 1.0 } else { -1.0 };
if y == 1.0 && cp < few {
if self.process(i, &x, y, cache) {
cp += 1;
}
} else if *self.y.get(i) == TY::from(-1).unwrap()
&& cn < few
&& self.process(i, &x, *self.y.get(i), cache)
{
} else if y == -1.0 && cn < few && self.process(i, &x, y, cache) {
cn += 1;
}
@@ -531,14 +792,14 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
}
}
fn process(&mut self, i: usize, x: &[TX], y: TY, cache: &mut Cache<TX, TY, X, Y>) -> bool {
fn process(&mut self, i: usize, x: &[TX], y: f64, cache: &mut Cache<TX, TY, X, Y>) -> bool {
for j in 0..self.sv.len() {
if self.sv[j].index == i {
return true;
}
}
let mut g: f64 = y.to_f64().unwrap();
let mut g = y;
let mut cache_values: Vec<((usize, usize), TX)> = Vec::new();
@@ -559,8 +820,8 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
self.find_min_max_gradient();
if self.gmin < self.gmax
&& ((y > TY::zero() && g < self.gmin.to_f64().unwrap())
|| (y < TY::zero() && g > self.gmax.to_f64().unwrap()))
&& ((y > 0.0 && g < self.gmin.to_f64().unwrap())
|| (y < 0.0 && g > self.gmax.to_f64().unwrap()))
{
return false;
}
@@ -590,7 +851,7 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
),
);
if y > TY::zero() {
if y > 0.0 {
self.smo(None, Some(0), TX::zero(), cache);
} else {
self.smo(Some(0), None, TX::zero(), cache);
@@ -647,7 +908,6 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
let gmin = self.gmin;
let mut idxs_to_drop: HashSet<usize> = HashSet::new();
self.sv.retain(|v| {
if v.alpha == 0f64
&& ((TX::from(v.grad).unwrap() >= gmax && TX::zero() >= TX::from(v.cmax).unwrap())
@@ -666,7 +926,11 @@ impl<'a, TX: Number + RealNumber, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>
fn permutate(&self, n: usize) -> Vec<usize> {
let mut rng = get_rng_impl(self.parameters.seed);
let mut range: Vec<usize> = (0..n).collect();
let mut range: Vec<usize> = self
.indices
.clone()
.unwrap_or_else(|| (0..n).collect());
range.shuffle(&mut rng);
range
}
@@ -965,12 +1229,12 @@ mod tests {
];
let knl = Kernels::linear();
let params = SVCParameters::default()
let parameters = SVCParameters::default()
.with_c(200.0)
.with_kernel(knl)
.with_seed(Some(100));
let y_hat = SVC::fit(&x, &y, &params)
let y_hat = SVC::fit(&x, &y, &parameters)
.and_then(|lr| lr.predict(&x))
.unwrap();
let acc = accuracy(&y, &(y_hat.iter().map(|e| e.to_i32().unwrap()).collect()));
@@ -1070,6 +1334,56 @@ mod tests {
assert!(acc >= 0.9, "accuracy ({acc}) is not larger or equal to 0.9");
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn svc_multiclass_fit_predict() {
let x = DenseMatrix::from_2d_array(&[
&[5.1, 3.5, 1.4, 0.2],
&[4.9, 3.0, 1.4, 0.2],
&[4.7, 3.2, 1.3, 0.2],
&[4.6, 3.1, 1.5, 0.2],
&[5.0, 3.6, 1.4, 0.2],
&[5.4, 3.9, 1.7, 0.4],
&[4.6, 3.4, 1.4, 0.3],
&[5.0, 3.4, 1.5, 0.2],
&[4.4, 2.9, 1.4, 0.2],
&[4.9, 3.1, 1.5, 0.1],
&[7.0, 3.2, 4.7, 1.4],
&[6.4, 3.2, 4.5, 1.5],
&[6.9, 3.1, 4.9, 1.5],
&[5.5, 2.3, 4.0, 1.3],
&[6.5, 2.8, 4.6, 1.5],
&[5.7, 2.8, 4.5, 1.3],
&[6.3, 3.3, 4.7, 1.6],
&[4.9, 2.4, 3.3, 1.0],
&[6.6, 2.9, 4.6, 1.3],
&[5.2, 2.7, 3.9, 1.4],
])
.unwrap();
let y: Vec<i32> = vec![0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2];
let knl = Kernels::linear();
let parameters = SVCParameters::default()
.with_c(200.0)
.with_kernel(knl)
.with_seed(Some(100));
let y_hat = MultiClassSVC::fit(&x, &y, &parameters)
.and_then(|lr| lr.predict(&x))
.unwrap();
let acc = accuracy(&y, &(y_hat.iter().map(|e| e.to_i32().unwrap()).collect()));
assert!(
acc >= 0.9,
"Multiclass accuracy ({acc}) is not larger or equal to 0.9"
);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
@@ -1106,11 +1420,11 @@ mod tests {
];
let knl = Kernels::linear();
let params = SVCParameters::default().with_kernel(knl);
let svc = SVC::fit(&x, &y, &params).unwrap();
let parameters = SVCParameters::default().with_kernel(knl);
let svc = SVC::fit(&x, &y, &parameters).unwrap();
// serialization
let deserialized_svc: SVC<f64, i32, _, _> =
let deserialized_svc: SVC<'_, f64, i32, _, _> =
serde_json::from_str(&serde_json::to_string(&svc).unwrap()).unwrap();
assert_eq!(svc, deserialized_svc);
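The optimizer hunks above replace the old "labels must be 1 or -1" requirement by mapping the pair of training classes onto the internal ±1 encoding. That mapping can be shown on its own; `encode_labels` is a hypothetical standalone function, not part of the crate:

```rust
// Map an arbitrary pair of class labels onto the internal -1/+1
// encoding used by the SVM optimizer: the second class of the pair
// becomes +1.0, the other class -1.0. Illustrative helper only.
fn encode_labels(y: &[i32], classes: (i32, i32)) -> Vec<f64> {
    y.iter()
        .map(|&v| if v == classes.1 { 1.0 } else { -1.0 })
        .collect()
}

fn main() {
    let y = [3, 7, 7, 3];
    assert_eq!(encode_labels(&y, (3, 7)), vec![-1.0, 1.0, 1.0, -1.0]);
}
```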
+29 -28
@@ -51,9 +51,9 @@
//!
//! let knl = Kernels::linear();
//! let params = &SVRParameters::default().with_eps(2.0).with_c(10.0).with_kernel(knl);
//! // let svr = SVR::fit(&x, &y, params).unwrap();
//! let svr = SVR::fit(&x, &y, params).unwrap();
//!
//! // let y_hat = svr.predict(&x).unwrap();
//! let y_hat = svr.predict(&x).unwrap();
//! ```
//!
//! ## References:
@@ -80,11 +80,12 @@ use crate::error::{Failed, FailedError};
use crate::linalg::basic::arrays::{Array1, Array2, MutArray};
use crate::numbers::basenum::Number;
use crate::numbers::floatnum::FloatNumber;
use crate::svm::Kernel;
use crate::svm::{Kernel, Kernels};
/// SVR Parameters
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
/// SVR Parameters
pub struct SVRParameters<T: Number + FloatNumber + PartialOrd> {
/// Epsilon in the epsilon-SVR model.
pub eps: T,
@@ -97,7 +98,7 @@ pub struct SVRParameters<T: Number + FloatNumber + PartialOrd> {
all(feature = "serde", target_arch = "wasm32"),
serde(skip_serializing, skip_deserializing)
)]
pub kernel: Option<Box<dyn Kernel>>,
pub kernel: Option<Kernels>,
}
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
@@ -160,8 +161,8 @@ impl<T: Number + FloatNumber + PartialOrd> SVRParameters<T> {
self
}
/// The kernel function.
pub fn with_kernel<K: Kernel + 'static>(mut self, kernel: K) -> Self {
self.kernel = Some(Box::new(kernel));
pub fn with_kernel(mut self, kernel: Kernels) -> Self {
self.kernel = Some(kernel);
self
}
}
@@ -281,8 +282,8 @@ impl<'a, T: Number + FloatNumber + PartialOrd, X: Array2<T>, Y: Array1<T>> SVR<'
}
}
impl<'a, T: Number + FloatNumber + PartialOrd, X: Array2<T>, Y: Array1<T>> PartialEq
for SVR<'a, T, X, Y>
impl<T: Number + FloatNumber + PartialOrd, X: Array2<T>, Y: Array1<T>> PartialEq
for SVR<'_, T, X, Y>
{
fn eq(&self, other: &Self) -> bool {
if (self.b - other.b).abs() > T::epsilon() * T::two()
@@ -597,25 +598,25 @@ mod tests {
use super::*;
use crate::linalg::basic::matrix::DenseMatrix;
use crate::metrics::mean_squared_error;
use crate::svm::search::svr_params::SVRSearchParameters;
use crate::svm::Kernels;
// #[test]
// fn search_parameters() {
// let parameters: SVRSearchParameters<f64, DenseMatrix<f64>, LinearKernel> =
// SVRSearchParameters {
// eps: vec![0., 1.],
// kernel: vec![LinearKernel {}],
// ..Default::default()
// };
// let mut iter = parameters.into_iter();
// let next = iter.next().unwrap();
// assert_eq!(next.eps, 0.);
// assert_eq!(next.kernel, LinearKernel {});
// let next = iter.next().unwrap();
// assert_eq!(next.eps, 1.);
// assert_eq!(next.kernel, LinearKernel {});
// assert!(iter.next().is_none());
// }
#[test]
fn search_parameters() {
let parameters: SVRSearchParameters<f64, DenseMatrix<f64>> = SVRSearchParameters {
eps: vec![0., 1.],
kernel: vec![Kernels::linear()],
..Default::default()
};
let mut iter = parameters.into_iter();
let next = iter.next().unwrap();
assert_eq!(next.eps, 0.);
// assert_eq!(next.kernel, LinearKernel {});
// let next = iter.next().unwrap();
// assert_eq!(next.eps, 1.);
// assert_eq!(next.kernel, LinearKernel {});
// assert!(iter.next().is_none());
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
@@ -648,7 +649,7 @@ mod tests {
114.2, 115.7, 116.9,
];
let knl = Kernels::linear();
let knl: Kernels = Kernels::linear();
let y_hat = SVR::fit(
&x,
&y,
@@ -702,7 +703,7 @@ mod tests {
let svr = SVR::fit(&x, &y, &params).unwrap();
let deserialized_svr: SVR<f64, DenseMatrix<f64>, _> =
let deserialized_svr: SVR<'_, f64, DenseMatrix<f64>, _> =
serde_json::from_str(&serde_json::to_string(&svr).unwrap()).unwrap();
assert_eq!(svr, deserialized_svr);
+551
@@ -0,0 +1,551 @@
use std::collections::LinkedList;
use std::default::Default;
use std::fmt::Debug;
use std::marker::PhantomData;
use rand::seq::SliceRandom;
use rand::Rng;
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
use crate::error::Failed;
use crate::linalg::basic::arrays::{Array1, Array2, MutArrayView1};
use crate::numbers::basenum::Number;
use crate::rand_custom::get_rng_impl;
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone, Default)]
pub enum Splitter {
Random,
#[default]
Best,
}
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
/// Parameters of Regression base_tree
pub struct BaseTreeRegressorParameters {
#[cfg_attr(feature = "serde", serde(default))]
/// The maximum depth of the base_tree.
pub max_depth: Option<u16>,
#[cfg_attr(feature = "serde", serde(default))]
/// The minimum number of samples required to be at a leaf node.
pub min_samples_leaf: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// The minimum number of samples required to split an internal node.
pub min_samples_split: usize,
#[cfg_attr(feature = "serde", serde(default))]
/// Controls the randomness of the estimator
pub seed: Option<u64>,
#[cfg_attr(feature = "serde", serde(default))]
/// Determines the strategy used to choose the split at each node.
pub splitter: Splitter,
}
/// Regression base_tree
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
pub struct BaseTreeRegressor<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> {
nodes: Vec<Node>,
parameters: Option<BaseTreeRegressorParameters>,
depth: u16,
_phantom_tx: PhantomData<TX>,
_phantom_ty: PhantomData<TY>,
_phantom_x: PhantomData<X>,
_phantom_y: PhantomData<Y>,
}
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
BaseTreeRegressor<TX, TY, X, Y>
{
/// Get nodes, return a shared reference
fn nodes(&self) -> &Vec<Node> {
self.nodes.as_ref()
}
/// Get parameters, return a shared reference
fn parameters(&self) -> &BaseTreeRegressorParameters {
self.parameters.as_ref().unwrap()
}
/// Get estimate of intercept, return value
fn depth(&self) -> u16 {
self.depth
}
}
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
struct Node {
output: f64,
split_feature: usize,
split_value: Option<f64>,
split_score: Option<f64>,
true_child: Option<usize>,
false_child: Option<usize>,
}
impl Node {
fn new(output: f64) -> Self {
Node {
output,
split_feature: 0,
split_value: Option::None,
split_score: Option::None,
true_child: Option::None,
false_child: Option::None,
}
}
}
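The `Node` layout above stores children as indices into the tree's flat `nodes` vector, and `predict_for_row` walks those indices until it reaches a leaf. The traversal can be sketched with the same shape using standalone types (`TinyNode` and `predict_row` are illustrative, not the crate's):

```rust
// Standalone mirror of the flat-vector tree layout above:
// children are indices into `nodes`; a leaf has no children.
// Types and names are illustrative, not smartcore API.
struct TinyNode {
    output: f64,
    split_feature: usize,
    split_value: f64,
    true_child: Option<usize>,
    false_child: Option<usize>,
}

fn predict_row(nodes: &[TinyNode], row: &[f64]) -> f64 {
    let mut id = 0; // start at the root
    loop {
        let node = &nodes[id];
        match (node.true_child, node.false_child) {
            (None, None) => return node.output, // leaf: emit its output
            _ => {
                // descend left when the feature value is <= the split value
                id = if row[node.split_feature] <= node.split_value {
                    node.true_child.unwrap()
                } else {
                    node.false_child.unwrap()
                };
            }
        }
    }
}

fn main() {
    // A stump: root splits feature 0 at 2.5; leaves sit at indices 1 and 2.
    let nodes = vec![
        TinyNode { output: 0.0, split_feature: 0, split_value: 2.5, true_child: Some(1), false_child: Some(2) },
        TinyNode { output: 10.0, split_feature: 0, split_value: 0.0, true_child: None, false_child: None },
        TinyNode { output: 20.0, split_feature: 0, split_value: 0.0, true_child: None, false_child: None },
    ];
    assert_eq!(predict_row(&nodes, &[1.0]), 10.0);
    assert_eq!(predict_row(&nodes, &[3.0]), 20.0);
}
```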
impl PartialEq for Node {
fn eq(&self, other: &Self) -> bool {
(self.output - other.output).abs() < f64::EPSILON
&& self.split_feature == other.split_feature
&& match (self.split_value, other.split_value) {
(Some(a), Some(b)) => (a - b).abs() < f64::EPSILON,
(None, None) => true,
_ => false,
}
&& match (self.split_score, other.split_score) {
(Some(a), Some(b)) => (a - b).abs() < f64::EPSILON,
(None, None) => true,
_ => false,
}
}
}
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> PartialEq
for BaseTreeRegressor<TX, TY, X, Y>
{
fn eq(&self, other: &Self) -> bool {
if self.depth != other.depth || self.nodes().len() != other.nodes().len() {
false
} else {
self.nodes()
.iter()
.zip(other.nodes().iter())
.all(|(a, b)| a == b)
}
}
}
struct NodeVisitor<'a, TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> {
x: &'a X,
y: &'a Y,
node: usize,
samples: Vec<usize>,
order: &'a [Vec<usize>],
true_child_output: f64,
false_child_output: f64,
level: u16,
_phantom_tx: PhantomData<TX>,
_phantom_ty: PhantomData<TY>,
}
impl<'a, TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
NodeVisitor<'a, TX, TY, X, Y>
{
fn new(
node_id: usize,
samples: Vec<usize>,
order: &'a [Vec<usize>],
x: &'a X,
y: &'a Y,
level: u16,
) -> Self {
NodeVisitor {
x,
y,
node: node_id,
samples,
order,
true_child_output: 0f64,
false_child_output: 0f64,
level,
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
}
}
}
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
BaseTreeRegressor<TX, TY, X, Y>
{
/// Build a decision base_tree regressor from the training data.
/// * `x` - _NxM_ matrix with _N_ observations and _M_ features in each observation.
/// * `y` - the target values
pub fn fit(
x: &X,
y: &Y,
parameters: BaseTreeRegressorParameters,
) -> Result<BaseTreeRegressor<TX, TY, X, Y>, Failed> {
let (x_nrows, num_attributes) = x.shape();
if x_nrows != y.shape() {
return Err(Failed::fit("Size of x should equal size of y"));
}
let samples = vec![1; x_nrows];
BaseTreeRegressor::fit_weak_learner(x, y, samples, num_attributes, parameters)
}
pub(crate) fn fit_weak_learner(
x: &X,
y: &Y,
samples: Vec<usize>,
mtry: usize,
parameters: BaseTreeRegressorParameters,
) -> Result<BaseTreeRegressor<TX, TY, X, Y>, Failed> {
let y_m = y.clone();
let y_ncols = y_m.shape();
let (_, num_attributes) = x.shape();
let mut nodes: Vec<Node> = Vec::new();
let mut rng = get_rng_impl(parameters.seed);
let mut n = 0;
let mut sum = 0f64;
for (i, sample_i) in samples.iter().enumerate().take(y_ncols) {
n += *sample_i;
sum += *sample_i as f64 * y_m.get(i).to_f64().unwrap();
}
let root = Node::new(sum / (n as f64));
nodes.push(root);
let mut order: Vec<Vec<usize>> = Vec::new();
for i in 0..num_attributes {
let mut col_i: Vec<TX> = x.get_col(i).iterator(0).copied().collect();
order.push(col_i.argsort_mut());
}
let mut base_tree = BaseTreeRegressor {
nodes,
parameters: Some(parameters),
depth: 0u16,
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
_phantom_x: PhantomData,
_phantom_y: PhantomData,
};
let mut visitor = NodeVisitor::<TX, TY, X, Y>::new(0, samples, &order, x, &y_m, 1);
let mut visitor_queue: LinkedList<NodeVisitor<'_, TX, TY, X, Y>> = LinkedList::new();
if base_tree.find_best_cutoff(&mut visitor, mtry, &mut rng) {
visitor_queue.push_back(visitor);
}
while base_tree.depth() < base_tree.parameters().max_depth.unwrap_or(u16::MAX) {
match visitor_queue.pop_front() {
Some(node) => base_tree.split(node, mtry, &mut visitor_queue, &mut rng),
None => break,
};
}
Ok(base_tree)
}
/// Predict regression value for `x`.
/// * `x` - _KxM_ data where _K_ is number of observations and _M_ is number of features.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
let mut result = Y::zeros(x.shape().0);
let (n, _) = x.shape();
for i in 0..n {
result.set(i, self.predict_for_row(x, i));
}
Ok(result)
}
pub(crate) fn predict_for_row(&self, x: &X, row: usize) -> TY {
let mut result = 0f64;
let mut queue: LinkedList<usize> = LinkedList::new();
queue.push_back(0);
while !queue.is_empty() {
match queue.pop_front() {
Some(node_id) => {
let node = &self.nodes()[node_id];
if node.true_child.is_none() && node.false_child.is_none() {
result = node.output;
} else if x.get((row, node.split_feature)).to_f64().unwrap()
<= node.split_value.unwrap_or(f64::NAN)
{
queue.push_back(node.true_child.unwrap());
} else {
queue.push_back(node.false_child.unwrap());
}
}
None => break,
};
}
TY::from_f64(result).unwrap()
}
fn find_best_cutoff(
&mut self,
visitor: &mut NodeVisitor<'_, TX, TY, X, Y>,
mtry: usize,
rng: &mut impl Rng,
) -> bool {
let (_, n_attr) = visitor.x.shape();
let n: usize = visitor.samples.iter().sum();
if n < self.parameters().min_samples_split {
return false;
}
let sum = self.nodes()[visitor.node].output * n as f64;
let mut variables = (0..n_attr).collect::<Vec<_>>();
if mtry < n_attr {
variables.shuffle(rng);
}
let parent_gain =
n as f64 * self.nodes()[visitor.node].output * self.nodes()[visitor.node].output;
let splitter = self.parameters().splitter.clone();
for variable in variables.iter().take(mtry) {
match splitter {
Splitter::Random => {
self.find_random_split(visitor, n, sum, parent_gain, *variable, rng);
}
Splitter::Best => {
self.find_best_split(visitor, n, sum, parent_gain, *variable);
}
}
}
self.nodes()[visitor.node].split_score.is_some()
}
fn find_random_split(
&mut self,
visitor: &mut NodeVisitor<'_, TX, TY, X, Y>,
n: usize,
sum: f64,
parent_gain: f64,
j: usize,
rng: &mut impl Rng,
) {
let (min_val, max_val) = {
let mut min_opt = None;
let mut max_opt = None;
for &i in &visitor.order[j] {
if visitor.samples[i] > 0 {
min_opt = Some(*visitor.x.get((i, j)));
break;
}
}
for &i in visitor.order[j].iter().rev() {
if visitor.samples[i] > 0 {
max_opt = Some(*visitor.x.get((i, j)));
break;
}
}
if min_opt.is_none() {
return;
}
(min_opt.unwrap(), max_opt.unwrap())
};
if min_val >= max_val {
return;
}
let split_value = rng.gen_range(min_val.to_f64().unwrap()..max_val.to_f64().unwrap());
let mut true_sum = 0f64;
let mut true_count = 0;
for &i in &visitor.order[j] {
if visitor.samples[i] > 0 {
if visitor.x.get((i, j)).to_f64().unwrap() <= split_value {
true_sum += visitor.samples[i] as f64 * visitor.y.get(i).to_f64().unwrap();
true_count += visitor.samples[i];
} else {
break;
}
}
}
let false_count = n - true_count;
if true_count < self.parameters().min_samples_leaf
|| false_count < self.parameters().min_samples_leaf
{
return;
}
let true_mean = if true_count > 0 {
true_sum / true_count as f64
} else {
0.0
};
let false_mean = if false_count > 0 {
(sum - true_sum) / false_count as f64
} else {
0.0
};
let gain = (true_count as f64 * true_mean * true_mean
+ false_count as f64 * false_mean * false_mean)
- parent_gain;
if self.nodes[visitor.node].split_score.is_none()
|| gain > self.nodes[visitor.node].split_score.unwrap()
{
self.nodes[visitor.node].split_feature = j;
self.nodes[visitor.node].split_value = Some(split_value);
self.nodes[visitor.node].split_score = Some(gain);
visitor.true_child_output = true_mean;
visitor.false_child_output = false_mean;
}
}
fn find_best_split(
&mut self,
visitor: &mut NodeVisitor<'_, TX, TY, X, Y>,
n: usize,
sum: f64,
parent_gain: f64,
j: usize,
) {
let mut true_sum = 0f64;
let mut true_count = 0;
let mut prevx = Option::None;
for i in visitor.order[j].iter() {
if visitor.samples[*i] > 0 {
let x_ij = *visitor.x.get((*i, j));
if prevx.is_none() || x_ij == prevx.unwrap() {
prevx = Some(x_ij);
true_count += visitor.samples[*i];
true_sum += visitor.samples[*i] as f64 * visitor.y.get(*i).to_f64().unwrap();
continue;
}
let false_count = n - true_count;
if true_count < self.parameters().min_samples_leaf
|| false_count < self.parameters().min_samples_leaf
{
prevx = Some(x_ij);
true_count += visitor.samples[*i];
true_sum += visitor.samples[*i] as f64 * visitor.y.get(*i).to_f64().unwrap();
continue;
}
let true_mean = true_sum / true_count as f64;
let false_mean = (sum - true_sum) / false_count as f64;
let gain = (true_count as f64 * true_mean * true_mean
+ false_count as f64 * false_mean * false_mean)
- parent_gain;
if self.nodes()[visitor.node].split_score.is_none()
|| gain > self.nodes()[visitor.node].split_score.unwrap()
{
self.nodes[visitor.node].split_feature = j;
self.nodes[visitor.node].split_value =
Option::Some((x_ij + prevx.unwrap()).to_f64().unwrap() / 2f64);
self.nodes[visitor.node].split_score = Option::Some(gain);
visitor.true_child_output = true_mean;
visitor.false_child_output = false_mean;
}
prevx = Some(x_ij);
true_sum += visitor.samples[*i] as f64 * visitor.y.get(*i).to_f64().unwrap();
true_count += visitor.samples[*i];
}
}
}
fn split<'a>(
&mut self,
mut visitor: NodeVisitor<'a, TX, TY, X, Y>,
mtry: usize,
visitor_queue: &mut LinkedList<NodeVisitor<'a, TX, TY, X, Y>>,
rng: &mut impl Rng,
) -> bool {
let (n, _) = visitor.x.shape();
let mut tc = 0;
let mut fc = 0;
let mut true_samples: Vec<usize> = vec![0; n];
for (i, true_sample) in true_samples.iter_mut().enumerate().take(n) {
if visitor.samples[i] > 0 {
if visitor
.x
.get((i, self.nodes()[visitor.node].split_feature))
.to_f64()
.unwrap()
<= self.nodes()[visitor.node].split_value.unwrap_or(f64::NAN)
{
*true_sample = visitor.samples[i];
tc += *true_sample;
visitor.samples[i] = 0;
} else {
fc += visitor.samples[i];
}
}
}
if tc < self.parameters().min_samples_leaf || fc < self.parameters().min_samples_leaf {
self.nodes[visitor.node].split_feature = 0;
self.nodes[visitor.node].split_value = Option::None;
self.nodes[visitor.node].split_score = Option::None;
return false;
}
let true_child_idx = self.nodes().len();
self.nodes.push(Node::new(visitor.true_child_output));
let false_child_idx = self.nodes().len();
self.nodes.push(Node::new(visitor.false_child_output));
self.nodes[visitor.node].true_child = Some(true_child_idx);
self.nodes[visitor.node].false_child = Some(false_child_idx);
self.depth = u16::max(self.depth, visitor.level + 1);
let mut true_visitor = NodeVisitor::<TX, TY, X, Y>::new(
true_child_idx,
true_samples,
visitor.order,
visitor.x,
visitor.y,
visitor.level + 1,
);
if self.find_best_cutoff(&mut true_visitor, mtry, rng) {
visitor_queue.push_back(true_visitor);
}
let mut false_visitor = NodeVisitor::<TX, TY, X, Y>::new(
false_child_idx,
visitor.samples,
visitor.order,
visitor.x,
visitor.y,
visitor.level + 1,
);
if self.find_best_cutoff(&mut false_visitor, mtry, rng) {
visitor_queue.push_back(false_visitor);
}
true
}
}
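The split search in `find_best_split` above ranks candidate cutoffs by the increase in the weighted sum of squared child means over the parent (`gain = n_t·mean_t² + n_f·mean_f² − n·mean²`), which is equivalent to minimizing within-child squared error. A minimal standalone sketch of that criterion (`split_gain` is an illustrative helper, not part of the crate):

```rust
// Sketch of the variance-reduction gain used by `find_best_split`:
// gain = n_t * mean_t^2 + n_f * mean_f^2 - n * mean^2.
// Maximizing this is the same as minimizing the sum of squared
// errors inside the two children.
fn split_gain(true_vals: &[f64], false_vals: &[f64]) -> f64 {
    let n_t = true_vals.len() as f64;
    let n_f = false_vals.len() as f64;
    let sum_t: f64 = true_vals.iter().sum();
    let sum_f: f64 = false_vals.iter().sum();
    let mean_t = sum_t / n_t;
    let mean_f = sum_f / n_f;
    let parent_mean = (sum_t + sum_f) / (n_t + n_f);
    (n_t * mean_t * mean_t + n_f * mean_f * mean_f)
        - (n_t + n_f) * parent_mean * parent_mean
}

fn main() {
    // Perfectly separable targets yield a large positive gain...
    let g1 = split_gain(&[1.0, 1.0], &[5.0, 5.0]);
    // ...while a split that mixes the same values yields zero gain.
    let g2 = split_gain(&[1.0, 5.0], &[1.0, 5.0]);
    assert!(g1 > 0.0);
    assert!(g2.abs() < 1e-12);
    println!("gain separable = {g1}, gain mixed = {g2}");
}
```

Note how `parent_gain` in the source drops the shared `sum²/n` constant; only relative gains matter when comparing splits.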
@@ -77,7 +77,9 @@ use serde::{Deserialize, Serialize};
use crate::api::{Predictor, SupervisedEstimator};
use crate::error::Failed;
use crate::linalg::basic::arrays::MutArray;
use crate::linalg::basic::arrays::{Array1, Array2, MutArrayView1};
use crate::linalg::basic::matrix::DenseMatrix;
use crate::numbers::basenum::Number;
use crate::rand_custom::get_rng_impl;
@@ -197,12 +199,12 @@ impl PartialEq for Node {
self.output == other.output
&& self.split_feature == other.split_feature
&& match (self.split_value, other.split_value) {
- (Some(a), Some(b)) => (a - b).abs() < std::f64::EPSILON,
+ (Some(a), Some(b)) => (a - b).abs() < f64::EPSILON,
(None, None) => true,
_ => false,
}
&& match (self.split_score, other.split_score) {
- (Some(a), Some(b)) => (a - b).abs() < std::f64::EPSILON,
+ (Some(a), Some(b)) => (a - b).abs() < f64::EPSILON,
(None, None) => true,
_ => false,
}
@@ -613,7 +615,7 @@ impl<TX: Number + PartialOrd, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
visitor_queue.push_back(visitor);
}
- while tree.depth() < tree.parameters().max_depth.unwrap_or(std::u16::MAX) {
+ while tree.depth() < tree.parameters().max_depth.unwrap_or(u16::MAX) {
match visitor_queue.pop_front() {
Some(node) => tree.split(node, mtry, &mut visitor_queue, &mut rng),
None => break,
@@ -650,7 +652,7 @@ impl<TX: Number + PartialOrd, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
if node.true_child.is_none() && node.false_child.is_none() {
result = node.output;
} else if x.get((row, node.split_feature)).to_f64().unwrap()
- <= node.split_value.unwrap_or(std::f64::NAN)
+ <= node.split_value.unwrap_or(f64::NAN)
{
queue.push_back(node.true_child.unwrap());
} else {
@@ -672,15 +674,20 @@ impl<TX: Number + PartialOrd, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
) -> bool {
let (n_rows, n_attr) = visitor.x.shape();
- let mut label = Option::None;
+ let mut label = None;
let mut is_pure = true;
for i in 0..n_rows {
if visitor.samples[i] > 0 {
if label.is_none() {
label = Option::Some(visitor.y[i]);
} else if visitor.y[i] != label.unwrap() {
is_pure = false;
break;
match label {
None => {
label = Some(visitor.y[i]);
}
Some(current_label) => {
if visitor.y[i] != current_label {
is_pure = false;
break;
}
}
}
}
}
@@ -803,9 +810,7 @@ impl<TX: Number + PartialOrd, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
.get((i, self.nodes()[visitor.node].split_feature))
.to_f64()
.unwrap()
- <= self.nodes()[visitor.node]
- .split_value
- .unwrap_or(std::f64::NAN)
+ <= self.nodes()[visitor.node].split_value.unwrap_or(f64::NAN)
{
*true_sample = visitor.samples[i];
tc += *true_sample;
@@ -889,11 +894,77 @@ impl<TX: Number + PartialOrd, TY: Number + Ord, X: Array2<TX>, Y: Array1<TY>>
}
importances
}
/// Predict class probabilities for the input samples.
///
/// # Arguments
///
/// * `x` - The input samples as a matrix where each row is a sample and each column is a feature.
///
/// # Returns
///
/// A `Result` containing a `DenseMatrix<f64>` where each row corresponds to a sample and each column
/// corresponds to a class. The values represent the probability of the sample belonging to each class.
///
/// # Errors
///
/// Returns an error if at least one row prediction process fails.
pub fn predict_proba(&self, x: &X) -> Result<DenseMatrix<f64>, Failed> {
let (n_samples, _) = x.shape();
let n_classes = self.classes().len();
let mut result = DenseMatrix::<f64>::zeros(n_samples, n_classes);
for i in 0..n_samples {
let probs = self.predict_proba_for_row(x, i)?;
for (j, &prob) in probs.iter().enumerate() {
result.set((i, j), prob);
}
}
Ok(result)
}
/// Predict class probabilities for a single input sample.
///
/// # Arguments
///
/// * `x` - The input matrix containing all samples.
/// * `row` - The index of the row in `x` for which to predict probabilities.
///
/// # Returns
///
/// A vector of probabilities, one for each class, representing the probability
/// of the input sample belonging to each class.
fn predict_proba_for_row(&self, x: &X, row: usize) -> Result<Vec<f64>, Failed> {
let mut node = 0;
while let Some(current_node) = self.nodes().get(node) {
if current_node.true_child.is_none() && current_node.false_child.is_none() {
// Leaf node reached
let mut probs = vec![0.0; self.classes().len()];
probs[current_node.output] = 1.0;
return Ok(probs);
}
let split_feature = current_node.split_feature;
let split_value = current_node.split_value.unwrap_or(f64::NAN);
if x.get((row, split_feature)).to_f64().unwrap() <= split_value {
node = current_node.true_child.unwrap();
} else {
node = current_node.false_child.unwrap();
}
}
// This should never happen if the tree is properly constructed
Err(Failed::predict("Nodes iteration did not reach leaf"))
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::linalg::basic::arrays::Array;
use crate::linalg::basic::matrix::DenseMatrix;
#[test]
@@ -925,17 +996,62 @@ mod tests {
)]
#[test]
fn gini_impurity() {
- assert!((impurity(&SplitCriterion::Gini, &[7, 3], 10) - 0.42).abs() < std::f64::EPSILON);
+ assert!((impurity(&SplitCriterion::Gini, &[7, 3], 10) - 0.42).abs() < f64::EPSILON);
assert!(
(impurity(&SplitCriterion::Entropy, &[7, 3], 10) - 0.8812908992306927).abs()
- < std::f64::EPSILON
+ < f64::EPSILON
);
assert!(
(impurity(&SplitCriterion::ClassificationError, &[7, 3], 10) - 0.3).abs()
- < std::f64::EPSILON
+ < f64::EPSILON
);
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
)]
#[test]
fn test_predict_proba() {
let x: DenseMatrix<f64> = DenseMatrix::from_2d_array(&[
&[5.1, 3.5, 1.4, 0.2],
&[4.9, 3.0, 1.4, 0.2],
&[4.7, 3.2, 1.3, 0.2],
&[4.6, 3.1, 1.5, 0.2],
&[5.0, 3.6, 1.4, 0.2],
&[7.0, 3.2, 4.7, 1.4],
&[6.4, 3.2, 4.5, 1.5],
&[6.9, 3.1, 4.9, 1.5],
&[5.5, 2.3, 4.0, 1.3],
&[6.5, 2.8, 4.6, 1.5],
])
.unwrap();
let y: Vec<usize> = vec![0, 0, 0, 0, 0, 1, 1, 1, 1, 1];
let tree = DecisionTreeClassifier::fit(&x, &y, Default::default()).unwrap();
let probabilities = tree.predict_proba(&x).unwrap();
assert_eq!(probabilities.shape(), (10, 2));
for row in 0..10 {
let row_sum: f64 = probabilities.get_row(row).sum();
assert!(
(row_sum - 1.0).abs() < 1e-6,
"Row probabilities should sum to 1"
);
}
// Check if the first 5 samples have higher probability for class 0
for i in 0..5 {
assert!(probabilities.get((i, 0)) > probabilities.get((i, 1)));
}
// Check if the last 5 samples have higher probability for class 1
for i in 5..10 {
assert!(probabilities.get((i, 1)) > probabilities.get((i, 0)));
}
}
#[cfg_attr(
all(target_arch = "wasm32", not(target_os = "wasi")),
wasm_bindgen_test::wasm_bindgen_test
@@ -58,22 +58,17 @@
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
use std::collections::LinkedList;
use std::default::Default;
use std::fmt::Debug;
use std::marker::PhantomData;
use rand::seq::SliceRandom;
use rand::Rng;
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
use super::base_tree_regressor::{BaseTreeRegressor, BaseTreeRegressorParameters, Splitter};
use crate::api::{Predictor, SupervisedEstimator};
use crate::error::Failed;
use crate::linalg::basic::arrays::{Array1, Array2, MutArrayView1};
use crate::linalg::basic::arrays::{Array1, Array2};
use crate::numbers::basenum::Number;
use crate::rand_custom::get_rng_impl;
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
@@ -98,41 +93,7 @@ pub struct DecisionTreeRegressorParameters {
#[derive(Debug)]
pub struct DecisionTreeRegressor<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
{
nodes: Vec<Node>,
parameters: Option<DecisionTreeRegressorParameters>,
depth: u16,
_phantom_tx: PhantomData<TX>,
_phantom_ty: PhantomData<TY>,
_phantom_x: PhantomData<X>,
_phantom_y: PhantomData<Y>,
}
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
DecisionTreeRegressor<TX, TY, X, Y>
{
/// Get nodes, return a shared reference
fn nodes(&self) -> &Vec<Node> {
self.nodes.as_ref()
}
/// Get parameters, return a shared reference
fn parameters(&self) -> &DecisionTreeRegressorParameters {
self.parameters.as_ref().unwrap()
}
/// Get estimate of intercept, return value
fn depth(&self) -> u16 {
self.depth
}
}
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
struct Node {
output: f64,
split_feature: usize,
split_value: Option<f64>,
split_score: Option<f64>,
true_child: Option<usize>,
false_child: Option<usize>,
tree_regressor: Option<BaseTreeRegressor<TX, TY, X, Y>>,
}
impl DecisionTreeRegressorParameters {
@@ -296,87 +257,11 @@ impl Default for DecisionTreeRegressorSearchParameters {
}
}
impl Node {
fn new(output: f64) -> Self {
Node {
output,
split_feature: 0,
split_value: Option::None,
split_score: Option::None,
true_child: Option::None,
false_child: Option::None,
}
}
}
impl PartialEq for Node {
fn eq(&self, other: &Self) -> bool {
(self.output - other.output).abs() < std::f64::EPSILON
&& self.split_feature == other.split_feature
&& match (self.split_value, other.split_value) {
(Some(a), Some(b)) => (a - b).abs() < std::f64::EPSILON,
(None, None) => true,
_ => false,
}
&& match (self.split_score, other.split_score) {
(Some(a), Some(b)) => (a - b).abs() < std::f64::EPSILON,
(None, None) => true,
_ => false,
}
}
}
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> PartialEq
for DecisionTreeRegressor<TX, TY, X, Y>
{
fn eq(&self, other: &Self) -> bool {
if self.depth != other.depth || self.nodes().len() != other.nodes().len() {
false
} else {
self.nodes()
.iter()
.zip(other.nodes().iter())
.all(|(a, b)| a == b)
}
}
}
struct NodeVisitor<'a, TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> {
x: &'a X,
y: &'a Y,
node: usize,
samples: Vec<usize>,
order: &'a [Vec<usize>],
true_child_output: f64,
false_child_output: f64,
level: u16,
_phantom_tx: PhantomData<TX>,
_phantom_ty: PhantomData<TY>,
}
impl<'a, TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
NodeVisitor<'a, TX, TY, X, Y>
{
fn new(
node_id: usize,
samples: Vec<usize>,
order: &'a [Vec<usize>],
x: &'a X,
y: &'a Y,
level: u16,
) -> Self {
NodeVisitor {
x,
y,
node: node_id,
samples,
order,
true_child_output: 0f64,
false_child_output: 0f64,
level,
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
}
self.tree_regressor == other.tree_regressor
}
}
@@ -386,13 +271,7 @@ impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
{
fn new() -> Self {
Self {
nodes: vec![],
parameters: Option::None,
depth: 0u16,
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
_phantom_x: PhantomData,
_phantom_y: PhantomData,
tree_regressor: None,
}
}
@@ -420,285 +299,23 @@ impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
y: &Y,
parameters: DecisionTreeRegressorParameters,
) -> Result<DecisionTreeRegressor<TX, TY, X, Y>, Failed> {
let (x_nrows, num_attributes) = x.shape();
if x_nrows != y.shape() {
return Err(Failed::fit("Size of x should equal size of y"));
}
let samples = vec![1; x_nrows];
DecisionTreeRegressor::fit_weak_learner(x, y, samples, num_attributes, parameters)
}
pub(crate) fn fit_weak_learner(
x: &X,
y: &Y,
samples: Vec<usize>,
mtry: usize,
parameters: DecisionTreeRegressorParameters,
) -> Result<DecisionTreeRegressor<TX, TY, X, Y>, Failed> {
let y_m = y.clone();
let y_ncols = y_m.shape();
let (_, num_attributes) = x.shape();
let mut nodes: Vec<Node> = Vec::new();
let mut rng = get_rng_impl(parameters.seed);
let mut n = 0;
let mut sum = 0f64;
for (i, sample_i) in samples.iter().enumerate().take(y_ncols) {
n += *sample_i;
sum += *sample_i as f64 * y_m.get(i).to_f64().unwrap();
}
let root = Node::new(sum / (n as f64));
nodes.push(root);
let mut order: Vec<Vec<usize>> = Vec::new();
for i in 0..num_attributes {
let mut col_i: Vec<TX> = x.get_col(i).iterator(0).copied().collect();
order.push(col_i.argsort_mut());
}
let mut tree = DecisionTreeRegressor {
nodes,
parameters: Some(parameters),
depth: 0u16,
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
_phantom_x: PhantomData,
_phantom_y: PhantomData,
let tree_parameters = BaseTreeRegressorParameters {
max_depth: parameters.max_depth,
min_samples_leaf: parameters.min_samples_leaf,
min_samples_split: parameters.min_samples_split,
seed: parameters.seed,
splitter: Splitter::Best,
};
let mut visitor = NodeVisitor::<TX, TY, X, Y>::new(0, samples, &order, x, &y_m, 1);
let mut visitor_queue: LinkedList<NodeVisitor<'_, TX, TY, X, Y>> = LinkedList::new();
if tree.find_best_cutoff(&mut visitor, mtry, &mut rng) {
visitor_queue.push_back(visitor);
}
while tree.depth() < tree.parameters().max_depth.unwrap_or(std::u16::MAX) {
match visitor_queue.pop_front() {
Some(node) => tree.split(node, mtry, &mut visitor_queue, &mut rng),
None => break,
};
}
Ok(tree)
let tree = BaseTreeRegressor::fit(x, y, tree_parameters)?;
Ok(Self {
tree_regressor: Some(tree),
})
}
/// Predict regression value for `x`.
/// * `x` - _KxM_ data where _K_ is number of observations and _M_ is number of features.
pub fn predict(&self, x: &X) -> Result<Y, Failed> {
let mut result = Y::zeros(x.shape().0);
let (n, _) = x.shape();
for i in 0..n {
result.set(i, self.predict_for_row(x, i));
}
Ok(result)
}
pub(crate) fn predict_for_row(&self, x: &X, row: usize) -> TY {
let mut result = 0f64;
let mut queue: LinkedList<usize> = LinkedList::new();
queue.push_back(0);
while !queue.is_empty() {
match queue.pop_front() {
Some(node_id) => {
let node = &self.nodes()[node_id];
if node.true_child.is_none() && node.false_child.is_none() {
result = node.output;
} else if x.get((row, node.split_feature)).to_f64().unwrap()
<= node.split_value.unwrap_or(std::f64::NAN)
{
queue.push_back(node.true_child.unwrap());
} else {
queue.push_back(node.false_child.unwrap());
}
}
None => break,
};
}
TY::from_f64(result).unwrap()
}
fn find_best_cutoff(
&mut self,
visitor: &mut NodeVisitor<'_, TX, TY, X, Y>,
mtry: usize,
rng: &mut impl Rng,
) -> bool {
let (_, n_attr) = visitor.x.shape();
let n: usize = visitor.samples.iter().sum();
if n < self.parameters().min_samples_split {
return false;
}
let sum = self.nodes()[visitor.node].output * n as f64;
let mut variables = (0..n_attr).collect::<Vec<_>>();
if mtry < n_attr {
variables.shuffle(rng);
}
let parent_gain =
n as f64 * self.nodes()[visitor.node].output * self.nodes()[visitor.node].output;
for variable in variables.iter().take(mtry) {
self.find_best_split(visitor, n, sum, parent_gain, *variable);
}
self.nodes()[visitor.node].split_score.is_some()
}
fn find_best_split(
&mut self,
visitor: &mut NodeVisitor<'_, TX, TY, X, Y>,
n: usize,
sum: f64,
parent_gain: f64,
j: usize,
) {
let mut true_sum = 0f64;
let mut true_count = 0;
let mut prevx = Option::None;
for i in visitor.order[j].iter() {
if visitor.samples[*i] > 0 {
let x_ij = *visitor.x.get((*i, j));
if prevx.is_none() || x_ij == prevx.unwrap() {
prevx = Some(x_ij);
true_count += visitor.samples[*i];
true_sum += visitor.samples[*i] as f64 * visitor.y.get(*i).to_f64().unwrap();
continue;
}
let false_count = n - true_count;
if true_count < self.parameters().min_samples_leaf
|| false_count < self.parameters().min_samples_leaf
{
prevx = Some(x_ij);
true_count += visitor.samples[*i];
true_sum += visitor.samples[*i] as f64 * visitor.y.get(*i).to_f64().unwrap();
continue;
}
let true_mean = true_sum / true_count as f64;
let false_mean = (sum - true_sum) / false_count as f64;
let gain = (true_count as f64 * true_mean * true_mean
+ false_count as f64 * false_mean * false_mean)
- parent_gain;
if self.nodes()[visitor.node].split_score.is_none()
|| gain > self.nodes()[visitor.node].split_score.unwrap()
{
self.nodes[visitor.node].split_feature = j;
self.nodes[visitor.node].split_value =
Option::Some((x_ij + prevx.unwrap()).to_f64().unwrap() / 2f64);
self.nodes[visitor.node].split_score = Option::Some(gain);
visitor.true_child_output = true_mean;
visitor.false_child_output = false_mean;
}
prevx = Some(x_ij);
true_sum += visitor.samples[*i] as f64 * visitor.y.get(*i).to_f64().unwrap();
true_count += visitor.samples[*i];
}
}
}
fn split<'a>(
&mut self,
mut visitor: NodeVisitor<'a, TX, TY, X, Y>,
mtry: usize,
visitor_queue: &mut LinkedList<NodeVisitor<'a, TX, TY, X, Y>>,
rng: &mut impl Rng,
) -> bool {
let (n, _) = visitor.x.shape();
let mut tc = 0;
let mut fc = 0;
let mut true_samples: Vec<usize> = vec![0; n];
for (i, true_sample) in true_samples.iter_mut().enumerate().take(n) {
if visitor.samples[i] > 0 {
if visitor
.x
.get((i, self.nodes()[visitor.node].split_feature))
.to_f64()
.unwrap()
<= self.nodes()[visitor.node]
.split_value
.unwrap_or(std::f64::NAN)
{
*true_sample = visitor.samples[i];
tc += *true_sample;
visitor.samples[i] = 0;
} else {
fc += visitor.samples[i];
}
}
}
if tc < self.parameters().min_samples_leaf || fc < self.parameters().min_samples_leaf {
self.nodes[visitor.node].split_feature = 0;
self.nodes[visitor.node].split_value = Option::None;
self.nodes[visitor.node].split_score = Option::None;
return false;
}
let true_child_idx = self.nodes().len();
self.nodes.push(Node::new(visitor.true_child_output));
let false_child_idx = self.nodes().len();
self.nodes.push(Node::new(visitor.false_child_output));
self.nodes[visitor.node].true_child = Some(true_child_idx);
self.nodes[visitor.node].false_child = Some(false_child_idx);
self.depth = u16::max(self.depth, visitor.level + 1);
let mut true_visitor = NodeVisitor::<TX, TY, X, Y>::new(
true_child_idx,
true_samples,
visitor.order,
visitor.x,
visitor.y,
visitor.level + 1,
);
if self.find_best_cutoff(&mut true_visitor, mtry, rng) {
visitor_queue.push_back(true_visitor);
}
let mut false_visitor = NodeVisitor::<TX, TY, X, Y>::new(
false_child_idx,
visitor.samples,
visitor.order,
visitor.x,
visitor.y,
visitor.level + 1,
);
if self.find_best_cutoff(&mut false_visitor, mtry, rng) {
visitor_queue.push_back(false_visitor);
}
true
self.tree_regressor.as_ref().unwrap().predict(x)
}
}
@@ -19,6 +19,7 @@
//! <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
//! <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
pub(crate) mod base_tree_regressor;
/// Classification tree for dependent variables that take a finite number of unordered values.
pub mod decision_tree_classifier;
/// Regression tree for dependent variables that take continuous or ordered discrete values.
@@ -0,0 +1,16 @@
//! # XGBoost
//!
//! XGBoost, which stands for Extreme Gradient Boosting, is a powerful and efficient implementation of the gradient boosting framework. Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
//!
//! The core idea of boosting is to build the model in a stage-wise fashion. It learns from its mistakes by sequentially adding new models that correct the errors of the previous ones. Unlike bagging, which trains models in parallel, boosting is a sequential process. Each new tree is fit on a modified version of the original data set, specifically focusing on the instances where the previous models performed poorly.
//!
//! XGBoost enhances this process through several key innovations. It employs a more regularized model formalization to control over-fitting, which gives it better performance. It also has a highly optimized and parallelized tree construction process, making it significantly faster and more scalable than traditional gradient boosting implementations.
//!
//! ## References:
//!
//! * "Elements of Statistical Learning", Hastie T., Tibshirani R., Friedman J., 10. Boosting and Additive Trees
//! * XGBoost: A Scalable Tree Boosting System, Chen T., Guestrin C.
// xgboost implementation
pub mod xgb_regressor;
pub use xgb_regressor::{XGRegressor, XGRegressorParameters};
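The stage-wise process the module doc describes can be sketched independently of the crate's API (all names below are illustrative, not part of smartcore): each stage fits a weak learner to the current residuals and adds a damped copy of its prediction to the ensemble.

```rust
// Toy stage-wise boosting on constant predictions: each "weak
// learner" here is just the mean of the current residuals (a
// depth-0 tree), added with a learning rate. Illustrative only;
// the real implementation fits a full regression tree per stage.
fn boost(y: &[f64], n_stages: usize, lr: f64) -> f64 {
    let n = y.len() as f64;
    let mut pred = 0.0;
    for _ in 0..n_stages {
        // Residuals with respect to the current ensemble prediction.
        let residual_mean: f64 = y.iter().map(|v| v - pred).sum::<f64>() / n;
        // Correct the previous stages' error, damped by the learning rate.
        pred += lr * residual_mean;
    }
    pred
}

fn main() {
    let y = [2.0, 4.0, 6.0];
    let fitted = boost(&y, 200, 0.1);
    // With enough stages the ensemble converges to the target mean (4.0).
    assert!((fitted - 4.0).abs() < 1e-6);
    println!("fitted = {fitted}");
}
```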
@@ -0,0 +1,771 @@
//! # Extreme Gradient Boosting (XGBoost)
//!
//! XGBoost is a highly efficient and effective implementation of the gradient boosting framework.
//! Like other boosting models, it builds an ensemble of sequential decision trees, where each new tree
//! is trained to correct the errors of the previous ones.
//!
//! What makes XGBoost powerful is its use of both the first and second derivatives (gradient and hessian)
//! of the loss function, which allows for more accurate approximations and faster convergence. It also
//! includes built-in regularization techniques (L1/`alpha` and L2/`lambda`) to prevent overfitting.
//!
//! This implementation was ported to Rust from the concepts and algorithm explained in the blog post
//! ["XGBoost from Scratch"](https://randomrealizations.com/posts/xgboost-from-scratch/). It is designed
//! to be a general-purpose regressor that can be used with any objective function that provides a gradient
//! and a hessian.
//!
//! Example:
//!
//! ```
//! use smartcore::linalg::basic::matrix::DenseMatrix;
//! use smartcore::xgboost::{XGRegressor, XGRegressorParameters};
//!
//! // Simple dataset: predict y = 2*x
//! let x = DenseMatrix::from_2d_array(&[
//! &[1.0], &[2.0], &[3.0], &[4.0], &[5.0]
//! ]).unwrap();
//! let y = vec![2.0, 4.0, 6.0, 8.0, 10.0];
//!
//! // Use default parameters, but set a few for demonstration
//! let parameters = XGRegressorParameters::default()
//! .with_n_estimators(50)
//! .with_max_depth(3)
//! .with_learning_rate(0.1);
//!
//! // Train the model
//! let model = XGRegressor::fit(&x, &y, parameters).unwrap();
//!
//! // Make predictions
//! let x_test = DenseMatrix::from_2d_array(&[&[6.0], &[7.0]]).unwrap();
//! let y_hat = model.predict(&x_test).unwrap();
//!
//! // y_hat should be close to [12.0, 14.0]
//! ```
//!
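The gradient/hessian formulation described above yields a closed-form leaf value: under the second-order approximation from the XGBoost paper, the optimal weight of a leaf is `-Σg / (Σh + λ)`, where `λ` is the L2 term. A hedged sketch of that formula (the helper name is illustrative, not this crate's API):

```rust
// Optimal leaf weight under XGBoost's second-order objective:
// w* = -sum(g) / (sum(h) + lambda). The L2 term `lambda` shrinks
// leaf outputs toward zero.
fn leaf_weight(g: &[f64], h: &[f64], lambda: f64) -> f64 {
    let g_sum: f64 = g.iter().sum();
    let h_sum: f64 = h.iter().sum();
    -g_sum / (h_sum + lambda)
}

fn main() {
    // MSE with all predictions at 0: g_i = pred - y_i = -y_i, h_i = 1.
    let g = [-2.0, -4.0, -6.0];
    let h = [1.0, 1.0, 1.0];
    // Without regularization the leaf predicts the residual mean (4.0).
    assert!((leaf_weight(&g, &h, 0.0) - 4.0).abs() < 1e-12);
    // With lambda > 0 the output is shrunk toward zero.
    assert!(leaf_weight(&g, &h, 1.0) < 4.0);
    println!("{}", leaf_weight(&g, &h, 1.0));
}
```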
use rand::{seq::SliceRandom, Rng};
use std::{iter::zip, marker::PhantomData};
use crate::{
api::{PredictorBorrow, SupervisedEstimatorBorrow},
error::{Failed, FailedError},
linalg::basic::arrays::{Array1, Array2},
numbers::basenum::Number,
rand_custom::get_rng_impl,
};
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
/// Defines the objective function to be optimized.
/// The objective function provides the loss, gradient (first derivative), and
/// hessian (second derivative) required for the XGBoost algorithm.
#[derive(Clone, Debug)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum Objective {
/// The objective for regression tasks using Mean Squared Error.
/// Loss: 0.5 * (y_true - y_pred)^2
MeanSquaredError,
}
impl Objective {
/// Calculates the loss for each sample given the true and predicted values.
///
/// # Arguments
/// * `y_true` - A vector of the true target values.
/// * `y_pred` - A vector of the predicted values.
///
/// # Returns
/// The mean of the calculated loss values.
pub fn loss_function<TY: Number, Y: Array1<TY>>(&self, y_true: &Y, y_pred: &Vec<f64>) -> f64 {
match self {
Objective::MeanSquaredError => {
zip(y_true.iterator(0), y_pred)
.map(|(true_val, pred_val)| {
0.5 * (true_val.to_f64().unwrap() - pred_val).powi(2)
})
.sum::<f64>()
/ y_true.shape() as f64
}
}
}
/// Calculates the gradient (first derivative) of the loss function.
///
/// # Arguments
/// * `y_true` - A vector of the true target values.
/// * `y_pred` - A vector of the predicted values.
///
/// # Returns
/// A vector of gradients for each sample.
pub fn gradient<TY: Number, Y: Array1<TY>>(&self, y_true: &Y, y_pred: &Vec<f64>) -> Vec<f64> {
match self {
Objective::MeanSquaredError => zip(y_true.iterator(0), y_pred)
.map(|(true_val, pred_val)| *pred_val - true_val.to_f64().unwrap())
.collect(),
}
}
/// Calculates the hessian (second derivative) of the loss function.
///
/// # Arguments
/// * `y_true` - A vector of the true target values.
/// * `y_pred` - A vector of the predicted values.
///
/// # Returns
/// A vector of hessians for each sample.
#[allow(unused_variables)]
pub fn hessian<TY: Number, Y: Array1<TY>>(&self, y_true: &Y, y_pred: &[f64]) -> Vec<f64> {
match self {
Objective::MeanSquaredError => vec![1.0; y_true.shape()],
}
}
}
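The `MeanSquaredError` derivatives above reduce to simple closed forms: the gradient is `y_pred - y_true` and the hessian is the constant `1.0`. As a standalone, std-only sketch (these helper names are illustrative, not part of the crate):

```rust
// Standalone sketch of the MeanSquaredError objective used above.
// loss_i = 0.5 * (t - p)^2, grad_i = p - t, hess_i = 1.0.
pub fn mse_loss_mean(y_true: &[f64], y_pred: &[f64]) -> f64 {
    y_true
        .iter()
        .zip(y_pred)
        .map(|(t, p)| 0.5 * (t - p).powi(2))
        .sum::<f64>()
        / y_true.len() as f64
}

pub fn mse_gradient(y_true: &[f64], y_pred: &[f64]) -> Vec<f64> {
    // First derivative of the loss with respect to the prediction.
    y_true.iter().zip(y_pred).map(|(t, p)| p - t).collect()
}

pub fn mse_hessian(n_samples: usize) -> Vec<f64> {
    // Second derivative is constant for squared error.
    vec![1.0; n_samples]
}
```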
/// Represents a single decision tree in the XGBoost ensemble.
///
/// This is a recursive data structure where each `TreeRegressor` is a node
/// that can have a left and a right child, also of type `TreeRegressor`.
#[allow(dead_code)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
struct TreeRegressor<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> {
left: Option<Box<TreeRegressor<TX, TY, X, Y>>>,
right: Option<Box<TreeRegressor<TX, TY, X, Y>>>,
/// The output value of this node. If it's a leaf, this is the final prediction.
value: f64,
/// The feature value threshold used to split this node.
threshold: f64,
/// The index of the feature used for splitting.
split_feature_idx: usize,
/// The gain in score achieved by this split.
split_score: f64,
_phantom_tx: PhantomData<TX>,
_phantom_ty: PhantomData<TY>,
_phantom_x: PhantomData<X>,
_phantom_y: PhantomData<Y>,
}
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
TreeRegressor<TX, TY, X, Y>
{
/// Recursively builds a decision tree (a `TreeRegressor` node).
///
/// This function determines the optimal split for the given set of samples (`idxs`)
/// and then recursively calls itself to build the left and right child nodes.
///
/// # Arguments
/// * `data` - The full training dataset.
/// * `g` - Gradients for all samples.
/// * `h` - Hessians for all samples.
/// * `idxs` - The indices of the samples belonging to the current node.
/// * `max_depth` - The maximum remaining depth for this branch.
/// * `min_child_weight` - The minimum sum of hessians required in a child node.
/// * `lambda` - L2 regularization term on weights.
/// * `gamma` - Minimum loss reduction required to make a further partition.
pub fn fit(
data: &X,
g: &Vec<f64>,
h: &Vec<f64>,
idxs: &[usize],
max_depth: u16,
min_child_weight: f64,
lambda: f64,
gamma: f64,
) -> Self {
let g_sum = idxs.iter().map(|&i| g[i]).sum::<f64>();
let h_sum = idxs.iter().map(|&i| h[i]).sum::<f64>();
let value = -g_sum / (h_sum + lambda);
let mut best_feature_idx = usize::MAX;
let mut best_split_score = 0.0;
let mut best_threshold = 0.0;
let mut left = Option::None;
let mut right = Option::None;
if max_depth > 0 {
Self::insert_child_nodes(
data,
g,
h,
idxs,
&mut best_feature_idx,
&mut best_split_score,
&mut best_threshold,
&mut left,
&mut right,
max_depth,
min_child_weight,
lambda,
gamma,
);
}
Self {
left,
right,
value,
threshold: best_threshold,
split_feature_idx: best_feature_idx,
split_score: best_split_score,
_phantom_tx: PhantomData,
_phantom_ty: PhantomData,
_phantom_x: PhantomData,
_phantom_y: PhantomData,
}
}
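The node value computed at the top of `fit` is the standard second-order leaf weight, `-G / (H + lambda)`. A minimal standalone helper showing the same arithmetic (the function name is hypothetical):

```rust
// Leaf weight for a node, given the summed gradients (g_sum) and
// hessians (h_sum) of the samples routed to it:
// value = -G / (H + lambda)
pub fn leaf_weight(g_sum: f64, h_sum: f64, lambda: f64) -> f64 {
    -g_sum / (h_sum + lambda)
}
```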
/// Finds the best split and creates child nodes if a valid split is found.
fn insert_child_nodes(
data: &X,
g: &Vec<f64>,
h: &Vec<f64>,
idxs: &[usize],
best_feature_idx: &mut usize,
best_split_score: &mut f64,
best_threshold: &mut f64,
left: &mut Option<Box<Self>>,
right: &mut Option<Box<Self>>,
max_depth: u16,
min_child_weight: f64,
lambda: f64,
gamma: f64,
) {
let (_, n_features) = data.shape();
for i in 0..n_features {
Self::find_best_split(
data,
g,
h,
idxs,
i,
best_feature_idx,
best_split_score,
best_threshold,
min_child_weight,
lambda,
gamma,
);
}
// A split is only valid if it results in a positive gain.
if *best_split_score > 0.0 {
let mut left_idxs = Vec::new();
let mut right_idxs = Vec::new();
for idx in idxs.iter() {
if data.get((*idx, *best_feature_idx)).to_f64().unwrap() <= *best_threshold {
left_idxs.push(*idx);
} else {
right_idxs.push(*idx);
}
}
*left = Some(Box::new(TreeRegressor::fit(
data,
g,
h,
&left_idxs,
max_depth - 1,
min_child_weight,
lambda,
gamma,
)));
*right = Some(Box::new(TreeRegressor::fit(
data,
g,
h,
&right_idxs,
max_depth - 1,
min_child_weight,
lambda,
gamma,
)));
}
}
/// Iterates through a single feature to find the best possible split point.
fn find_best_split(
data: &X,
g: &[f64],
h: &[f64],
idxs: &[usize],
feature_idx: usize,
best_feature_idx: &mut usize,
best_split_score: &mut f64,
best_threshold: &mut f64,
min_child_weight: f64,
lambda: f64,
gamma: f64,
) {
// A split needs at least two samples; this also guards the `len() - 1`
// loop bound below against underflow on an empty index set.
if idxs.len() < 2 {
return;
}
let mut sorted_idxs = idxs.to_owned();
sorted_idxs.sort_by(|a, b| {
data.get((*a, feature_idx))
.partial_cmp(data.get((*b, feature_idx)))
.unwrap()
});
let sum_g = sorted_idxs.iter().map(|&i| g[i]).sum::<f64>();
let sum_h = sorted_idxs.iter().map(|&i| h[i]).sum::<f64>();
let mut sum_g_right = sum_g;
let mut sum_h_right = sum_h;
let mut sum_g_left = 0.0;
let mut sum_h_left = 0.0;
for i in 0..sorted_idxs.len() - 1 {
let idx = sorted_idxs[i];
let next_idx = sorted_idxs[i + 1];
let g_i = g[idx];
let h_i = h[idx];
let x_i = data.get((idx, feature_idx)).to_f64().unwrap();
let x_i_next = data.get((next_idx, feature_idx)).to_f64().unwrap();
sum_g_left += g_i;
sum_h_left += h_i;
sum_g_right -= g_i;
sum_h_right -= h_i;
if sum_h_left < min_child_weight || x_i == x_i_next {
continue;
}
if sum_h_right < min_child_weight {
break;
}
let gain = 0.5
* ((sum_g_left * sum_g_left / (sum_h_left + lambda))
+ (sum_g_right * sum_g_right / (sum_h_right + lambda))
- (sum_g * sum_g / (sum_h + lambda)))
- gamma;
if gain > *best_split_score {
*best_split_score = gain;
*best_threshold = (x_i + x_i_next) / 2.0;
*best_feature_idx = feature_idx;
}
}
}
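The gain expression inside the loop above is the standard XGBoost split criterion. A standalone sketch of the same formula (the helper name is illustrative):

```rust
// Split gain, mirroring the computation in `find_best_split`:
// gain = 0.5 * (G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - G^2/(H+lambda)) - gamma
pub fn split_gain(
    g_left: f64,
    h_left: f64,
    g_right: f64,
    h_right: f64,
    lambda: f64,
    gamma: f64,
) -> f64 {
    let g_total = g_left + g_right;
    let h_total = h_left + h_right;
    0.5 * (g_left * g_left / (h_left + lambda)
        + g_right * g_right / (h_right + lambda)
        - g_total * g_total / (h_total + lambda))
        - gamma
}
```

With the numbers from the unit tests below (G_L = -1.5, H_L = 2, G_R = 2.5, H_R = 2, lambda = 1, gamma = 0) this evaluates to about 1.3167.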
/// Predicts the output values for a dataset.
pub fn predict(&self, data: &X) -> Vec<f64> {
let (n_samples, n_features) = data.shape();
(0..n_samples)
.map(|i| {
self.predict_for_row(&Vec::from_iterator(
data.get_row(i).iterator(0).copied(),
n_features,
))
})
.collect()
}
/// Predicts the output value for a single row of data by traversing the tree.
pub fn predict_for_row(&self, row: &Vec<TX>) -> f64 {
// A leaf node is identified by having no children.
if self.left.is_none() {
return self.value;
}
// Recurse down the appropriate branch.
let child = if row[self.split_feature_idx].to_f64().unwrap() <= self.threshold {
self.left.as_ref().unwrap()
} else {
self.right.as_ref().unwrap()
};
child.predict_for_row(row)
}
}
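The traversal in `predict_for_row` is ordinary binary-tree descent: go left when the split feature's value is at most the threshold, otherwise go right, until a childless (leaf) node is reached. A self-contained sketch with a simplified node type (all names here are illustrative, not the crate's):

```rust
// Simplified node mirroring the fields `predict_for_row` relies on.
pub struct Node {
    pub left: Option<Box<Node>>,
    pub right: Option<Box<Node>>,
    pub value: f64,
    pub threshold: f64,
    pub split_feature_idx: usize,
}

// Descend to a leaf and return its value.
pub fn predict_row(node: &Node, row: &[f64]) -> f64 {
    match (&node.left, &node.right) {
        (Some(l), Some(r)) => {
            if row[node.split_feature_idx] <= node.threshold {
                predict_row(l, row)
            } else {
                predict_row(r, row)
            }
        }
        // No children: this is a leaf, so its value is the prediction.
        _ => node.value,
    }
}
```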
/// Parameters for the `XGRegressor` model.
///
/// This struct holds all the hyperparameters that control the training process.
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Clone, Debug)]
pub struct XGRegressorParameters {
/// The number of boosting rounds or trees to build.
pub n_estimators: usize,
/// The maximum depth of each tree.
pub max_depth: u16,
/// Step size shrinkage used to prevent overfitting.
pub learning_rate: f64,
/// Minimum sum of instance weight (hessian) needed in a child.
pub min_child_weight: usize,
/// L2 regularization term on weights.
pub lambda: f64,
/// Minimum loss reduction required to make a further partition on a leaf node.
pub gamma: f64,
/// The initial prediction score for all instances.
pub base_score: f64,
/// The fraction of samples to be used for fitting the individual base learners.
pub subsample: f64,
/// The seed for the random number generator for reproducibility.
pub seed: u64,
/// The objective function to be optimized.
pub objective: Objective,
}
impl Default for XGRegressorParameters {
/// Creates a new set of `XGRegressorParameters` with default values.
fn default() -> Self {
Self {
n_estimators: 100,
learning_rate: 0.3,
max_depth: 6,
min_child_weight: 1,
lambda: 1.0,
gamma: 0.0,
base_score: 0.5,
subsample: 1.0,
seed: 0,
objective: Objective::MeanSquaredError,
}
}
}
// Builder pattern for XGRegressorParameters
impl XGRegressorParameters {
/// Sets the number of boosting rounds or trees to build.
pub fn with_n_estimators(mut self, n_estimators: usize) -> Self {
self.n_estimators = n_estimators;
self
}
/// Sets the step size shrinkage used to prevent overfitting.
///
/// Also known as `eta`. A smaller value makes the model more robust by preventing
/// too much weight being given to any single tree.
pub fn with_learning_rate(mut self, learning_rate: f64) -> Self {
self.learning_rate = learning_rate;
self
}
/// Sets the maximum depth of each individual tree.
///
/// A lower value helps prevent overfitting.
pub fn with_max_depth(mut self, max_depth: u16) -> Self {
self.max_depth = max_depth;
self
}
/// Sets the minimum sum of instance weight (hessian) needed in a child node.
///
/// If the tree partition step results in a leaf node with the sum of
/// instance weight less than `min_child_weight`, then the building process
/// will give up further partitioning.
pub fn with_min_child_weight(mut self, min_child_weight: usize) -> Self {
self.min_child_weight = min_child_weight;
self
}
/// Sets the L2 regularization term on weights (`lambda`).
///
/// Increasing this value will make the model more conservative.
pub fn with_lambda(mut self, lambda: f64) -> Self {
self.lambda = lambda;
self
}
/// Sets the minimum loss reduction required to make a further partition on a leaf node.
///
/// The larger `gamma` is, the more conservative the algorithm will be.
pub fn with_gamma(mut self, gamma: f64) -> Self {
self.gamma = gamma;
self
}
/// Sets the initial prediction score for all instances.
pub fn with_base_score(mut self, base_score: f64) -> Self {
self.base_score = base_score;
self
}
/// Sets the fraction of samples to be used for fitting individual base learners.
///
/// A value of less than 1.0 introduces randomness and helps prevent overfitting.
pub fn with_subsample(mut self, subsample: f64) -> Self {
self.subsample = subsample;
self
}
/// Sets the seed for the random number generator for reproducibility.
pub fn with_seed(mut self, seed: u64) -> Self {
self.seed = seed;
self
}
/// Sets the objective function to be optimized during training.
pub fn with_objective(mut self, objective: Objective) -> Self {
self.objective = objective;
self
}
}
/// An Extreme Gradient Boosting (XGBoost) model for regression tasks.
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
pub struct XGRegressor<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> {
regressors: Option<Vec<TreeRegressor<TX, TY, X, Y>>>,
parameters: Option<XGRegressorParameters>,
_phantom_ty: PhantomData<TY>,
_phantom_tx: PhantomData<TX>,
_phantom_y: PhantomData<Y>,
_phantom_x: PhantomData<X>,
}
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> XGRegressor<TX, TY, X, Y> {
/// Fits the XGBoost model to the training data.
pub fn fit(data: &X, y: &Y, parameters: XGRegressorParameters) -> Result<Self, Failed> {
if parameters.subsample > 1.0 || parameters.subsample <= 0.0 {
return Err(Failed::because(
FailedError::ParametersError,
"Subsample ratio must be in (0, 1].",
));
}
let (n_samples, _) = data.shape();
let learning_rate = parameters.learning_rate;
let mut predictions = vec![parameters.base_score; n_samples];
let mut regressors = Vec::new();
let mut rng = get_rng_impl(Some(parameters.seed));
for _ in 0..parameters.n_estimators {
let gradients = parameters.objective.gradient(y, &predictions);
let hessians = parameters.objective.hessian(y, &predictions);
let sample_idxs = if parameters.subsample < 1.0 {
Self::sample_without_replacement(n_samples, parameters.subsample, &mut rng)
} else {
(0..n_samples).collect::<Vec<usize>>()
};
let regressor = TreeRegressor::fit(
data,
&gradients,
&hessians,
&sample_idxs,
parameters.max_depth,
parameters.min_child_weight as f64,
parameters.lambda,
parameters.gamma,
);
let corrections = regressor.predict(data);
predictions = zip(predictions, corrections)
.map(|(pred, correction)| pred + (learning_rate * correction))
.collect();
regressors.push(regressor);
}
Ok(Self {
regressors: Some(regressors),
parameters: Some(parameters),
_phantom_ty: PhantomData,
_phantom_y: PhantomData,
_phantom_tx: PhantomData,
_phantom_x: PhantomData,
})
}
/// Predicts target values for the given input data.
pub fn predict(&self, data: &X) -> Result<Vec<TX>, Failed> {
let (n_samples, _) = data.shape();
let parameters = self.parameters.as_ref().unwrap();
let mut predictions = vec![parameters.base_score; n_samples];
let regressors = self.regressors.as_ref().unwrap();
for regressor in regressors.iter() {
let corrections = regressor.predict(data);
predictions = zip(predictions, corrections)
.map(|(pred, correction)| pred + (parameters.learning_rate * correction))
.collect();
}
Ok(predictions
.into_iter()
.map(|p| TX::from_f64(p).unwrap())
.collect())
}
/// Creates a random sample of indices without replacement.
fn sample_without_replacement(
population_size: usize,
subsample_ratio: f64,
rng: &mut impl Rng,
) -> Vec<usize> {
let mut indices: Vec<usize> = (0..population_size).collect();
indices.shuffle(rng);
indices.truncate((population_size as f64 * subsample_ratio) as usize);
indices
}
}
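Both `fit` and `predict` above accumulate tree outputs additively with learning-rate shrinkage: predictions start at `base_score`, and each round adds the current tree's output scaled by `learning_rate`. A minimal standalone sketch of that update (the function name is hypothetical):

```rust
// One boosting round: each tree's correction is added to the running
// predictions, scaled by the learning rate (eta).
pub fn boost_update(predictions: &[f64], corrections: &[f64], learning_rate: f64) -> Vec<f64> {
    predictions
        .iter()
        .zip(corrections)
        .map(|(p, c)| p + learning_rate * c)
        .collect()
}
```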
// Boilerplate implementation for the smartcore traits
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>>
SupervisedEstimatorBorrow<'_, X, Y, XGRegressorParameters> for XGRegressor<TX, TY, X, Y>
{
fn new() -> Self {
Self {
regressors: None,
parameters: None,
_phantom_ty: PhantomData,
_phantom_y: PhantomData,
_phantom_tx: PhantomData,
_phantom_x: PhantomData,
}
}
fn fit(x: &X, y: &Y, parameters: &XGRegressorParameters) -> Result<Self, Failed> {
XGRegressor::fit(x, y, parameters.clone())
}
}
impl<TX: Number + PartialOrd, TY: Number, X: Array2<TX>, Y: Array1<TY>> PredictorBorrow<'_, X, TX>
for XGRegressor<TX, TY, X, Y>
{
fn predict(&self, x: &X) -> Result<Vec<TX>, Failed> {
self.predict(x)
}
}
// ------------------- TESTS -------------------
#[cfg(test)]
mod tests {
use super::*;
use crate::linalg::basic::{arrays::Array, matrix::DenseMatrix};
/// Tests the gradient and hessian calculations for MeanSquaredError.
#[test]
fn test_mse_objective() {
let objective = Objective::MeanSquaredError;
let y_true = vec![1.0, 2.0, 3.0];
let y_pred = vec![1.5, 2.5, 2.5];
let gradients = objective.gradient(&y_true, &y_pred);
let hessians = objective.hessian(&y_true, &y_pred);
// Gradients should be (pred - true)
assert_eq!(gradients, vec![0.5, 0.5, -0.5]);
// Hessians should be all 1.0 for MSE
assert_eq!(hessians, vec![1.0, 1.0, 1.0]);
}
#[test]
fn test_find_best_split_multidimensional() {
// Data has two features. The second feature is a better predictor.
let data = vec![
vec![1.0, 10.0], // g = -0.5
vec![1.0, 20.0], // g = -1.0
vec![1.0, 30.0], // g = 1.0
vec![1.0, 40.0], // g = 1.5
];
let data = DenseMatrix::from_2d_vec(&data).unwrap();
let g = vec![-0.5, -1.0, 1.0, 1.5];
let h = vec![1.0, 1.0, 1.0, 1.0];
let idxs = (0..4).collect::<Vec<usize>>();
let mut best_feature_idx = usize::MAX;
let mut best_split_score = 0.0;
let mut best_threshold = 0.0;
// Manually calculated expected gain for the best split (on feature 1, with lambda=1.0).
// G_left = -1.5, H_left = 2.0
// G_right = 2.5, H_right = 2.0
// G_total = 1.0, H_total = 4.0
// Gain = 0.5 * (G_l^2/(H_l+λ) + G_r^2/(H_r+λ) - G_t^2/(H_t+λ))
// Gain = 0.5 * ((-1.5)^2/(2+1) + (2.5)^2/(2+1) - (1.0)^2/(4+1))
// Gain = 0.5 * (2.25/3 + 6.25/3 - 1.0/5) = 0.5 * (0.75 + 2.0833 - 0.2) = 1.3166...
let expected_gain = 1.3166666666666667;
// Search both features. The algorithm must find the best split on feature 1.
let (_, n_features) = data.shape();
for i in 0..n_features {
TreeRegressor::<f64, f64, DenseMatrix<f64>, Vec<f64>>::find_best_split(
&data,
&g,
&h,
&idxs,
i,
&mut best_feature_idx,
&mut best_split_score,
&mut best_threshold,
1.0,
1.0,
0.0,
);
}
assert_eq!(best_feature_idx, 1); // Should choose the second feature
assert!((best_split_score - expected_gain).abs() < 1e-9);
assert_eq!(best_threshold, 25.0); // (20 + 30) / 2
}
/// Tests that the TreeRegressor can build a simple one-level tree on multidimensional data.
#[test]
fn test_tree_regressor_fit_multidimensional() {
let data = vec![
vec![1.0, 10.0],
vec![1.0, 20.0],
vec![1.0, 30.0],
vec![1.0, 40.0],
];
let data = DenseMatrix::from_2d_vec(&data).unwrap();
let g = vec![-0.5, -1.0, 1.0, 1.5];
let h = vec![1.0, 1.0, 1.0, 1.0];
let idxs = (0..4).collect::<Vec<usize>>();
let tree = TreeRegressor::<f64, f64, DenseMatrix<f64>, Vec<f64>>::fit(
&data, &g, &h, &idxs, 2, 1.0, 1.0, 0.0,
);
// Check that the root node was split on the correct feature
assert!(tree.left.is_some());
assert!(tree.right.is_some());
assert_eq!(tree.split_feature_idx, 1); // Should split on the second feature
assert_eq!(tree.threshold, 25.0);
// Check leaf values: value = -G / (H + lambda)
// Left leaf: G = -1.5, H = 2.0 => value = -(-1.5)/(2+1) = 0.5
// Right leaf: G = 2.5, H = 2.0 => value = -(2.5)/(2+1) = -0.8333
assert!((tree.left.unwrap().value - 0.5).abs() < 1e-9);
assert!((tree.right.unwrap().value - (-0.833333333)).abs() < 1e-9);
}
/// A "smoke test" to ensure the main XGRegressor can fit and predict on multidimensional data.
#[test]
fn test_xgregressor_fit_predict_multidimensional() {
// Simple 2D data where y is roughly 2*x1 + 3*x2
let x_vec = vec![
vec![1.0, 1.0],
vec![2.0, 1.0],
vec![1.0, 2.0],
vec![2.0, 2.0],
];
let x = DenseMatrix::from_2d_vec(&x_vec).unwrap();
let y = vec![5.0, 7.0, 8.0, 10.0];
let params = XGRegressorParameters::default()
.with_n_estimators(10)
.with_max_depth(2);
let fit_result = XGRegressor::fit(&x, &y, params);
assert!(
fit_result.is_ok(),
"Fit failed with error: {:?}",
fit_result.err()
);
let model = fit_result.unwrap();
let predict_result = model.predict(&x);
assert!(
predict_result.is_ok(),
"Predict failed with error: {:?}",
predict_result.err()
);
let predictions = predict_result.unwrap();
assert_eq!(predictions.len(), 4);
}
}