dtcfit

Syntax

parameters = dtcfit(X,y)

parameters = dtcfit(X,y,options)

Inputs

X

Training data.

Type: double

Dimension: vector | matrix

y

Target values.

Type: double

Dimension: vector | matrix

options

Type: struct

criterion: Function used to measure the quality of a split: 'gini' for Gini impurity (default) or 'entropy' for information gain. Type: char; Dimension: string
splitter: Strategy used to choose the split at each node: 'best' to choose the best split (default) or 'random' to choose a random split. Type: char; Dimension: string
max_depth: The maximum depth of the tree. If not assigned, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. Type: integer; Dimension: scalar
min_samples_split: The minimum number of samples required to split an internal node (default: 2). If an integer, it is used directly as the minimum number; if a double, (min_samples_split * n_samples) is taken as the minimum number of samples for each split. Type: double | integer; Dimension: scalar
min_samples_leaf: The minimum number of samples required at a leaf node (default: 1). If a node has fewer than min_samples_leaf samples, the tree is not grown further under that node. If an integer, it is used directly as the minimum number; if a double, (min_samples_leaf * n_samples) is taken as the minimum number of samples for each node. Type: double | integer; Dimension: scalar
min_weight_fraction_leaf: The minimum weighted fraction of the total sum of weights (of all the input samples) required at a leaf node (default: 0). Type: double; Dimension: scalar
max_features: The number of features to consider when looking for the best split (default: number of features in the training data). If an integer, max_features features are considered at each split; if a double, floor(max_features * n_features) features are considered at each split. Type: double | integer; Dimension: scalar
random_state: Controls the randomness of the model. At each split, features are randomly permuted; random_state is the seed used by the random number generator. Type: integer; Dimension: scalar
max_leaf_nodes: Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined by their reduction in impurity. If not assigned, the number of leaf nodes is unlimited. Type: integer; Dimension: scalar
min_impurity_decrease: A node is split if the split reduces the impurity by at least this value (default: 0). Type: double; Dimension: scalar
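
The two criterion values can be illustrated with a small sketch. It assumes the standard textbook definitions of Gini impurity and entropy; how dtcfit computes them internally is not described here, and the label vector below is made up for illustration.

```oml
% Hypothetical labels at one node: two samples of class 0, two of class 1, four of class 2
labels = [0 0 1 1 2 2 2 2];

% Fraction of samples belonging to each class
classes = unique(labels);
p = zeros(1, numel(classes));
for k = 1:numel(classes)
    p(k) = sum(labels == classes(k)) / numel(labels);
end

% Gini impurity: 1 - sum(p^2); here p = [0.25 0.25 0.5], so gini_impurity = 0.625
gini_impurity = 1 - sum(p .^ 2);

% Entropy in bits: -sum(p * log2(p)); here node_entropy = 1.5
node_entropy = -sum(p .* log2(p));
```

Both measures are 0 for a pure node (all samples in one class) and grow as the classes become more evenly mixed.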

Outputs

parameters

Contains all the values passed to dtcfit as options. Additionally, it contains the key-value pairs below.

Type: struct

scorer: Function handle pointing to the 'accuracy' function. Type: function handle
classes: The class labels (single-output problem), or a matrix of class labels (multi-output problem). Type: double; Dimension: vector | matrix
max_features: The inferred value of max_features. Type: integer; Dimension: scalar
feature_importances: Feature importances. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as Gini importance. Type: double; Dimension: vector of length n_features
n_samples: Number of rows in the training data. Type: integer; Dimension: scalar
n_features: Number of columns in the training data. Type: integer; Dimension: scalar

Example

Usage of dtcfit without options

data = dlmread('iris.csv', ',', 1);
X = data(:,1:end-1);
y = data(:,end);

parameters = dtcfit(X, y);

> parameters
parameters = struct [
  classes: [Matrix] 1 x 3
  0  1  2
  criterion: gini
  feature_importances: [Matrix] 1 x 4
  0.00000  0.01333  0.06406  0.92261
  max_features: 4
  min_impurity_decrease: 0
  min_samples_leaf: 1
  min_samples_split: 2
  min_weight_fraction_leaf: 0
  n_features: 4
  n_samples: 150
  splitter: best
]
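
Usage of dtcfit with options

A minimal sketch of the three-argument form. It assumes the same iris.csv file as above and that options is an ordinary struct whose fields use the names listed under Inputs. The values are illustrative only, not recommendations.

```oml
data = dlmread('iris.csv', ',', 1);
X = data(:,1:end-1);
y = data(:,end);

% Illustrative option values (assumptions, not recommended settings)
options = struct;
options.criterion = 'entropy';   % information gain instead of the default 'gini'
options.max_depth = 3;           % stop growing the tree below depth 3
options.max_features = 0.5;      % double: floor(0.5 * 4) = 2 features considered per split
options.random_state = 0;        % fixed seed for the random number generator

parameters = dtcfit(X, y, options);
```

Options that are left unset keep the defaults described under Inputs.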

Comments

dtcfit builds a decision tree classifier from the training data. Once the tree is built, it can be used for prediction with the dtcpredict function.