Principal Component Analysis.
Syntax
parameters = pcafit(X)
parameters = pcafit(X,options)
Inputs
- X
  - Training data.
  - Type: double
  - Dimension: vector | matrix
- options
  - Type: struct
  - Fields:
    - n_components
      - Number of components to keep. If n_components is not set, all components are kept, that is, n_components = min(n_samples, n_features).
      - If n_components is between 0 and 1 (exclusive) and svd_solver = 'full', the number of components is chosen so that the explained variance exceeds the fraction specified by n_components.
      - If svd_solver = 'arpack', the number of components must be strictly less than min(n_samples, n_features), so an unset n_components becomes min(n_samples, n_features) - 1.
      - Type: integer
      - Dimension: scalar
    - svd_solver
      - 'auto' (default): the solver is selected by a default policy based on the dimensions of X and on n_components. If the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, the more efficient 'randomized' method is used. Otherwise the exact full SVD is computed and optionally truncated afterwards.
      - 'full': runs an exact full SVD by calling the standard LAPACK solver and selects the components by postprocessing.
      - 'arpack': runs an SVD truncated to n_components by calling the ARPACK solver. It strictly requires 0 < n_components < min(n_samples, n_features).
      - 'randomized': runs a randomized SVD by the method of Halko et al.
      - Type: char
      - Dimension: string
    - tol
      - Tolerance for the singular values computed when svd_solver = 'arpack' (default: 0).
      - Type: double
      - Dimension: scalar
    - iterated_power
      - Number of iterations of the power method used when svd_solver = 'randomized'. Must be in the range [0, infinity). If not set, it is computed automatically.
      - Type: integer
      - Dimension: scalar
    - random_state
      - Controls random number generation when svd_solver = 'arpack' or 'randomized'.
      - Type: integer
      - Dimension: scalar
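The 'auto' policy described for svd_solver can be read as a simple conditional. The sketch below only illustrates that documented rule and is not the library's implementation. The variable names (n_samples, n_features, is_large, solver) and the sizes are hypothetical, and reading "larger than 500x500" as "both dimensions exceed 500" is an assumption.

```octave
% Illustrative sketch of the documented 'auto' rule (not the library source).
n_samples = 1000;      % rows of X (hypothetical size)
n_features = 800;      % columns of X (hypothetical size)
n_components = 50;     % requested number of components

smallest = min(n_samples, n_features);
% Assumption: "larger than 500x500" means both dimensions exceed 500.
is_large = (n_samples > 500) && (n_features > 500);

if is_large && (n_components < 0.8 * smallest)
    solver = 'randomized';   % approximate SVD, cheaper for large inputs
else
    solver = 'full';         % exact SVD, truncated afterwards
end
printf('selected solver: %s\n', solver);
```

With these sizes the data is large and 50 is below 80% of 800, so the sketch selects 'randomized'. Setting svd_solver explicitly in options bypasses any such policy.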
Outputs
- parameters
  - Contains all the values passed to pcafit as options, plus the following key-value pairs.
  - Type: struct
  - Fields:
    - components
      - Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by decreasing explained_variance.
      - Type: double
      - Dimension: matrix
    - explained_variance
      - Amount of variance explained by each of the selected components. These values equal the n_components largest eigenvalues of the covariance matrix of X.
      - Type: double
      - Dimension: vector
    - explained_variance_ratio
      - Fraction of the total variance explained by each of the selected components. If n_components is not set, all components are stored and the ratios sum to 1.0.
      - Type: double
      - Dimension: vector
    - singular_values
      - The singular values corresponding to each of the selected components. They equal the 2-norms of the n_components variables in the lower-dimensional space.
      - Type: double
      - Dimension: vector
    - mean
      - Per-feature empirical mean, estimated from the training set.
      - Type: double
      - Dimension: vector
    - n_components
      - Estimated number of components when n_components is set between 0 and 1 and the fit uses svd_solver = 'full'.
      - Type: integer
      - Dimension: scalar
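The relations stated for explained_variance, explained_variance_ratio, and singular_values can be cross-checked with built-in matrix functions alone. The sketch below does not call pcafit. It uses the data from the Example section, and the variable names (Xc, C, ev, ratio, sv, sv_direct) are illustrative. It assumes the covariance matrix is the sample covariance with an n_samples - 1 divisor, and that the singular values are those of the mean-centered data.

```octave
% Cross-check of the documented relations using only built-in functions.
X = [-1, -1; -2, -1; -3, -2; 1, 1; 2, 1; 3, 2];
n_samples = size(X, 1);

% Center each feature (column) on its empirical mean.
Xc = X - ones(n_samples, 1) * mean(X);

% Sample covariance matrix (assumed divisor: n_samples - 1).
C = (Xc' * Xc) / (n_samples - 1);

ev = sort(eig(C), 'descend');        % eigenvalues, largest first
ratio = ev / sum(ev);                % fraction of total variance per component
sv = sqrt(ev * (n_samples - 1));     % singular values implied by the eigenvalues
sv_direct = svd(Xc);                 % singular values of the centered data

printf('explained_variance:       %.5f %.5f\n', ev);
printf('explained_variance_ratio: %.5f %.5f\n', ratio);
printf('singular values:          %.5f %.5f\n', sv);
```

For this data the first two lines print 7.93954 0.06046 and 0.99244 0.00756, which agree with the values shown in the Example section. The vectors sv and sv_direct agree to rounding error.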
Example
Usage of pcafit with options
X = [-1, -1; -2, -1; -3, -2; 1, 1; 2, 1; 3, 2];
options = struct;
options.n_components = 2;
parameters = pcafit(X, options);
> parameters
parameters = struct [
components: [Matrix] 2 x 2
-0.83849 -0.54491
0.54491 -0.83849
explained_variance: [Matrix] 1 x 2
7.93954 0.06046
explained_variance_ratio: [Matrix] 1 x 2
0.99244 0.00756
]
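A second usage sketch, this time with n_components given as a fraction, follows. It uses only the option and output fields documented above. The 0.95 threshold is an arbitrary illustrative choice, and no output is shown because the result was not run for this page.

```octave
% Sketch: keep enough components to explain more than 95% of the variance.
X = [-1, -1; -2, -1; -3, -2; 1, 1; 2, 1; 3, 2];
options = struct;
options.n_components = 0.95;    % fraction between 0 and 1 (exclusive)
options.svd_solver = 'full';    % fractional n_components is documented for 'full'
parameters = pcafit(X, options);

% Inspect the estimated number of components and the variance ratios.
parameters.n_components
parameters.explained_variance_ratio
```

In the example above, the first component alone accounts for about 99.2% of the variance, so one would expect a single component to be kept here. That expectation is inferred from the documented behavior rather than from a verified run.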
Comments
The output 'parameters' should be passed as input to the pcatransform function.