kmeansfit

Syntax

parameters = kmeansfit(X)

parameters = kmeansfit(X,options)

Inputs

X

Training data.

Type: double

Dimension: vector | matrix

options

Type: struct

n_clusters: Number of clusters to find (default: 8).; Type: integer; Dimension: scalar
init: Method used to initialize the centroids. 'k-means++' (default): selects the initial cluster centers in a way that speeds up convergence. 'random': chooses k observations (rows) at random from the data as the initial centroids.; Type: char; Dimension: string
n_init: Number of times the k-means algorithm is run with different centroid seeds. The final result is the best of the n_init consecutive runs in terms of inertia (default: 10).; Type: integer; Dimension: scalar
max_iter: Maximum number of iterations of the k-means algorithm for a single run (default: 300).; Type: integer; Dimension: scalar
tol: Relative tolerance with regard to inertia to declare convergence (default: 1e-4).; Type: double; Dimension: scalar
random_state: Determines random number generation for centroid initialization. Set this parameter to an integer to make the results reproducible.; Type: integer; Dimension: scalar
algorithm: K-means algorithm to use. 'full': the classical EM-style algorithm. 'elkan': a more efficient variant of the classical algorithm that uses the triangle inequality; it currently does not support sparse data. 'auto' (default): chooses 'elkan' for dense data and 'full' for sparse data.; Type: char; Dimension: string
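
The difference between the two init choices can be illustrated with a minimal Python sketch. It is not the kmeansfit implementation; the helper names (sq_dist, init_random, init_kmeans_pp) are hypothetical, and the seeded generator only plays the role that random_state plays above. It assumes the data contains at least k distinct rows.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_random(X, k, rng):
    """'random': pick k distinct observations (rows) as initial centroids."""
    return [list(row) for row in rng.sample(X, k)]

def init_kmeans_pp(X, k, rng):
    """'k-means++': the first centroid is drawn uniformly; each further one
    is drawn with probability proportional to its squared distance from the
    nearest centroid chosen so far, which spreads the seeds apart."""
    centers = [list(rng.choice(X))]
    while len(centers) < k:
        weights = [min(sq_dist(x, c) for c in centers) for x in X]
        centers.append(list(rng.choices(X, weights=weights, k=1)[0]))
    return centers

rng = random.Random(0)  # fixed seed, analogous to setting random_state
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(init_random(X, 2, rng))
print(init_kmeans_pp(X, 2, rng))
```

With 'random', both seeds can land in the same tight group of points; the distance weighting in 'k-means++' makes that unlikely, which is why it tends to need fewer iterations.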

Outputs

parameters

Contains all the values passed to kmeansfit as options, plus the key-value pairs listed below.

Type: struct

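cluster_centers: Coordinates of the cluster centers, one row per cluster.; Type: double; Dimension: matrix
labels: Label of each point.; Type: double; Dimension: vector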
inertia: Sum of squared distances of samples to their closest cluster center.; Type: double; Dimension: scalar
n_iter: Number of iterations run.; Type: integer; Dimension: scalar
n_samples: Number of rows in the training data.; Type: integer; Dimension: scalar
n_features: Number of columns in the training data.; Type: integer; Dimension: scalar
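
As a rough illustration of how labels, inertia, n_samples and n_features relate to the data and the cluster centers, here is a small Python sketch. The function assign is a hypothetical helper, not part of the library, and the sketch uses 0-based labels, which may differ from the numbering kmeansfit returns.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(X, centers):
    """Return (labels, inertia) for data X and the given cluster centers."""
    labels, inertia = [], 0.0
    for x in X:
        d = [sq_dist(x, c) for c in centers]
        j = d.index(min(d))   # index of the closest center
        labels.append(j)
        inertia += d[j]       # squared distance to that center
    return labels, inertia

X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centers = [[0.0, 1.0], [10.0, 1.0]]
labels, inertia = assign(X, centers)
n_samples, n_features = len(X), len(X[0])  # rows and columns of the data
print(labels, inertia, n_samples, n_features)
```

Each of the four points lies at distance 1 from its nearest center, so the inertia in this toy case is 4.0.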

Example

Usage of kmeansfit with options

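% Seed the random number generator so the data is reproducible
rand('seed', 2);
XTrain = rand(14, 5);   % 14 samples, 5 features
XTest = rand(2, 5);     % test data, not used by kmeansfit

% Request 2 clusters; all other options keep their defaults
options = struct;
options.n_clusters = 2;
parameters = kmeansfit(XTrain, options);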

> parameters
parameters = struct [
  algorithm: auto
  cluster_centers: [Matrix] 2 x 5
  0.25669  0.37129  0.78008  0.28967  0.55561
  0.57313  0.57262  0.30554  0.31799  0.40330
  init: k-means++
  interia: 2.4113899
  ...

Comments

If the algorithm stops before fully converging (because of tol or max_iter), labels and cluster_centers will not be consistent; that is, cluster_centers will not be the means of the points assigned to each cluster. In addition, labels are reassigned after the last iteration so that they are consistent with what prediction on the training set would return. Pass the output 'parameters' as input to the kmeanspredict function.
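
The inconsistency described above can be reproduced with a minimal Python sketch of the classical (Lloyd) iteration. This is an assumption-laden illustration, not the library's code: the helper names are hypothetical, only max_iter is modeled (tol is ignored), and the starting centers are fixed by hand instead of being drawn at random.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(x, centers):
    """Index of the center closest to x."""
    d = [sq_dist(x, c) for c in centers]
    return d.index(min(d))

def cluster_means(X, labels, centers):
    """Mean of the points assigned to each cluster (old center if empty)."""
    means = []
    for j, old in enumerate(centers):
        members = [x for x, l in zip(X, labels) if l == j]
        if members:
            means.append([sum(col) / len(members) for col in zip(*members)])
        else:
            means.append(list(old))
    return means

def lloyd(X, centers, max_iter):
    """Run at most max_iter iterations, then reassign labels once more."""
    for _ in range(max_iter):
        labels = [nearest(x, centers) for x in X]
        new_centers = cluster_means(X, labels, centers)
        if new_centers == centers:  # converged: nothing moved
            break
        centers = new_centers
    # Final reassignment, so labels match a prediction on the training set
    labels = [nearest(x, centers) for x in X]
    return centers, labels

X = [[0.0], [1.0], [2.5], [3.0], [10.0]]
start = [[0.0], [1.0]]

c1, l1 = lloyd(X, start, max_iter=1)    # stopped early
print(c1, l1, cluster_means(X, l1, c1))
c2, l2 = lloyd(X, start, max_iter=300)  # runs to convergence
print(c2, l2, cluster_means(X, l2, c2))
```

After the early stop, the returned centers differ from the means of the points carrying each final label; after convergence the two agree.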