K-Means clustering, an unsupervised learning algorithm that groups the training data into clusters.
Syntax
parameters = kmeansfit(X)
parameters = kmeansfit(X,options)
Inputs
- X
- Training data.
- Type: double
- Dimension: vector | matrix
- options
- Type: struct
-
- n_clusters
- Number of clusters to find (default: 8).
- Type: integer
- Dimension: scalar
- init
- Method for initializing the centroids. 'k-means++' (default): selects initial cluster centers that tend to be spread apart, which speeds up convergence. 'random': chooses k observations (rows) at random from the data as the initial centroids.
- Type: char
- Dimension: string
- n_init
- Number of times the k-means algorithm is run with different centroid seeds. The final result is the run with the lowest inertia among the n_init consecutive runs (default: 10).
- Type: integer
- Dimension: scalar
- max_iter
- Maximum number of iterations of the k-means algorithm for a single run (default: 300).
- Type: integer
- Dimension: scalar
- tol
- Relative tolerance with regard to inertia to declare convergence (default: 1e-4).
- Type: double
- Dimension: scalar
- random_state
- Seed for the random number generation used in centroid initialization. Set this parameter to make the results reproducible.
- Type: integer
- Dimension: scalar
- algorithm
- K-means algorithm to use.
- 'full': classical EM-style algorithm
- 'elkan': a more efficient variant of the classical algorithm that uses the triangle inequality; it currently does not support sparse data.
- 'auto' (default): chooses 'elkan' for dense data and 'full' for sparse data.
- Type: char
- Dimension: string
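The options above correspond to the usual steps of a k-means run. The following is a minimal sketch in plain Python of how init ('k-means++'), n_init, max_iter, tol and random_state could interact. It is not the kmeansfit implementation: all function names (kmeanspp_seeds, lloyd, kmeans) are hypothetical, and the tol check assumes that "relative tolerance with regard to inertia" means the relative drop in inertia between two iterations.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seeds(X, k, rng):
    """k-means++ style seeding: first center uniformly at random, each later
    center with probability proportional to its squared distance from the
    nearest center chosen so far."""
    centers = [list(rng.choice(X))]
    while len(centers) < k:
        d2 = [min(sq_dist(x, c) for c in centers) for x in X]
        if sum(d2) == 0:  # every point coincides with a center already
            centers.append(list(rng.choice(X)))
        else:
            centers.append(list(rng.choices(X, weights=d2, k=1)[0]))
    return centers

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            for x in X]

def inertia(X, centers, labels):
    """Sum of squared distances of points to their assigned center."""
    return sum(sq_dist(x, centers[l]) for x, l in zip(X, labels))

def lloyd(X, centers, max_iter=300, tol=1e-4):
    """One run of the classical assign/update loop ('full')."""
    prev = None
    for n_iter in range(1, max_iter + 1):
        labels = assign(X, centers)
        for j in range(len(centers)):
            members = [x for x, l in zip(X, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        cur = inertia(X, centers, labels)
        # Assumed convergence rule: stop when the relative drop is within tol.
        if prev is not None and prev - cur <= tol * prev:
            break
        prev = cur
    return centers, assign(X, centers), n_iter

def kmeans(X, k=8, n_init=10, max_iter=300, tol=1e-4, seed=None):
    """Repeat the run n_init times and keep the result with lowest inertia."""
    rng = random.Random(seed)  # a fixed seed makes the runs reproducible
    best = None
    for _ in range(n_init):
        centers, labels, n_iter = lloyd(X, kmeanspp_seeds(X, k, rng),
                                        max_iter, tol)
        score = inertia(X, centers, labels)
        if best is None or score < best[0]:
            best = (score, centers, labels, n_iter)
    return best

# Two well-separated groups of three points each.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
     [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
score, centers, labels, n_iter = kmeans(X, k=2, n_init=5, seed=0)
print(labels, round(score, 6), n_iter)
```

The sketch uses 0-based labels because it is written in Python; it makes no claim about the label values returned by kmeansfit.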
Outputs
- parameters
- Contains all the values passed to kmeansfit as options. In addition, it contains the key-value pairs below.
- Type: struct
-
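- cluster_centers
- Coordinates of the cluster centers, one row per cluster.
- Type: double
- Dimension: matrix
- labels
- Cluster label assigned to each training point.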
- Type: double
- Dimension: vector
- inertia
- Sum of squared distances of samples to their closest cluster center.
- Type: double
- Dimension: scalar
- n_iter
- Number of iterations run.
- Type: integer
- Dimension: scalar
- n_samples
- Number of rows in the training data.
- Type: integer
- Dimension: scalar
- n_features
- Number of columns in the training data.
- Type: integer
- Dimension: scalar
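To make the output definitions concrete, the following plain-Python sketch computes labels, inertia, n_samples and n_features for a tiny data set and two given centers. The function name summarize and the sample values are made up for illustration; the code is not taken from kmeansfit, and its labels are 0-based only because it is written in Python.

```python
def summarize(X, centers):
    """Assign each row to its nearest center and report the summary values."""
    labels = []
    inertia = 0.0
    for row in X:
        # Squared Euclidean distance from this row to every center.
        d2 = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centers]
        best = min(range(len(centers)), key=d2.__getitem__)
        labels.append(best)
        inertia += d2[best]  # inertia accumulates the closest distance
    return {
        "labels": labels,
        "inertia": inertia,
        "n_samples": len(X),      # number of rows
        "n_features": len(X[0]),  # number of columns
    }

X = [[1.0, 2.0], [1.0, 4.0], [9.0, 2.0], [9.0, 4.0]]
centers = [[1.0, 3.0], [9.0, 3.0]]
out = summarize(X, centers)
print(out)
```

Each of the four points lies at squared distance 1 from its nearest center, so the inertia here is 4.0.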
Example
Usage of kmeansfit with options
rand('seed', 2);
XTrain = rand(14, 5);
XTest = rand(2, 5);
options = struct;
options.n_clusters = 2;
parameters = kmeansfit(XTrain, options);
> parameters
parameters = struct [
algorithm: auto
cluster_centers: [Matrix] 2 x 5
0.25669 0.37129 0.78008 0.28967 0.55561
0.57313 0.57262 0.30554 0.31799 0.40330
init: k-means++
interia: 2.4113899
...
Comments
If the algorithm stops before fully converging (because of tol or max_iter), labels and cluster_centers will not be consistent: the cluster_centers will not be the means of the points in each cluster. This happens because the labels are reassigned after the last iteration so that they are consistent with prediction on the training set.
The output 'parameters' should be passed as input to the kmeanspredict function.
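The effect of stopping early can be reproduced by hand. The sketch below, in plain Python with one-dimensional data, runs a single assign/update iteration (as if max_iter were 1), then reassigns the labels, and compares the stored centers with the means of the final clusters. The data, the hand-picked initial centers, and the helper names are illustrative assumptions, not part of kmeansfit.

```python
def nearest(x, centers):
    """Index of the center closest to the scalar point x."""
    return min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)

def cluster_means(X, labels, k):
    """Mean of the points carrying each label (assumes no empty cluster)."""
    means = []
    for j in range(k):
        members = [x for x, l in zip(X, labels) if l == j]
        means.append(sum(members) / len(members))
    return means

X = [0.0, 1.0, 2.0, 10.0]
centers = [0.0, 1.0]  # initial centers, chosen by hand

# One iteration only: assign points, then move the centers to the means.
labels = [nearest(x, centers) for x in X]
centers = cluster_means(X, labels, 2)

# Reassign labels after the last iteration, without updating the centers.
final_labels = [nearest(x, centers) for x in X]
means_of_final = cluster_means(X, final_labels, 2)

print(centers)         # centers kept from the last update
print(final_labels)    # labels after reassignment
print(means_of_final)  # differ from the centers: the run had not converged
```

In this example the stored centers are 0.0 and about 4.33, while the means of the final clusters are 1.0 and 10.0, which is the inconsistency described above.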