A decision tree regressor.
Syntax
parameters = dtrfit(X,y)
parameters = dtrfit(X,y,options)
Inputs
- X
- Training data.
- Type: double
- Dimension: vector | matrix
- y
- Target values.
- Type: double
- Dimension: vector | matrix
- options
- Optional settings, given as the fields of a struct described below.
- Type: struct
- criterion
- Function used to measure the quality of a split. 'mse' (default): mean squared error; 'friedman_mse': mean squared error with Friedman's improvement score for potential splits; 'mae': mean absolute error.
- Type: char
- Dimension: string
- splitter
- Strategy used to choose the split at each node. 'best' (default): chooses the best split; 'random': chooses a random split.
- Type: char
- Dimension: string
- max_depth
- The maximum depth of the tree. If not assigned, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
- Type: integer
- Dimension: scalar
- min_samples_split
- The minimum number of samples required to split an internal node (default: 2). If an integer, it is used directly as the minimum number. If a float, min_samples_split * n_samples is taken as the minimum number of samples for each split.
- Type: double | integer
- Dimension: scalar
- min_samples_leaf
- The minimum number of samples required to be at a leaf node (default: 1). If the number of samples at a node is less than min_samples_leaf, the tree is not grown further below that node. If an integer, it is used directly as the minimum number. If a float, min_samples_leaf * n_samples is taken as the minimum number of samples for each node.
- Type: double | integer
- Dimension: scalar
- min_weight_fraction_leaf
- The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node (default: 0).
- Type: double
- Dimension: scalar
- max_features
- The number of features to consider when looking for the best split (default: number of features in the training data). If an integer, max_features features are considered at each split. If a float, floor(max_features * n_features) features are considered at each split.
- Type: double | integer
- Dimension: scalar
- random_state
- Controls the randomness of the model. At each split, features are randomly permuted. random_state is the seed used by the random number generator.
- Type: integer
- Dimension: scalar
- max_leaf_nodes
- Grows a tree with at most max_leaf_nodes leaves in best-first fashion, where the best nodes are those with the largest reduction in impurity. If not assigned, the number of leaf nodes is unlimited.
- Type: integer
- Dimension: scalar
- min_impurity_decrease
- A node is split only if the split decreases the impurity by at least this value (default: 0).
- Type: double
- Dimension: scalar
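The option names above match those of scikit-learn's DecisionTreeRegressor, so the following Python sketch uses that convention to illustrate how a few of them behave. It is not dtrfit's code, and every helper name in it is hypothetical. It shows how the 'mse' and 'mae' criteria score a candidate split as a decrease in impurity (the quantity min_impurity_decrease is compared against) and how fractional values of min_samples_split, min_samples_leaf and max_features resolve to counts. Assumptions: 'mae' is measured around the node median, the decrease is not weighted by the node's share of the whole training set, the integer/float distinction is made by Python type, and fractional sample counts are rounded up. 'friedman_mse' is not covered.

```python
import math
from statistics import mean, median


def mse_impurity(values):
    """'mse': mean squared deviation from the node mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def mae_impurity(values):
    """'mae': mean absolute deviation from the node median (assumed convention)."""
    med = median(values)
    return sum(abs(v - med) for v in values) / len(values)


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the sample-weighted impurity of the two children.
    Simplification: not scaled by the node's share of all training samples."""
    n = len(parent)
    children = (len(left) * impurity(left) + len(right) * impurity(right)) / n
    return impurity(parent) - children


def resolve_min_samples(value, n_samples):
    """min_samples_split / min_samples_leaf: an int is a count, a float a
    fraction of n_samples. Rounding up is an assumption (scikit-learn style)."""
    if isinstance(value, int):
        return value
    return math.ceil(value * n_samples)


def resolve_max_features(value, n_features):
    """max_features: None means all features, an int is a count, and a float
    gives floor(max_features * n_features), as described above."""
    if value is None:
        return n_features
    if isinstance(value, int):
        return value
    return math.floor(value * n_features)


# Target values from the Example section, split into two candidate children.
y = [1, 2, 3, 4, 5, 6, 7]
left, right = y[:3], y[3:]
print(impurity_decrease(y, left, right, mse_impurity))  # 3.0
print(impurity_decrease(y, left, right, mae_impurity))  # about 0.857 (6/7)
print(resolve_min_samples(0.25, len(y)))                # 2
print(resolve_max_features(0.5, 3))                     # 1
```

Under these assumptions, the same split looks stronger under 'mse' than under 'mae' because squaring weights the outlying targets more heavily, so a given min_impurity_decrease threshold is not comparable across criteria.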
Outputs
- parameters
- Contains the options used by dtrfit, including the default values of options that were not passed, plus the following fields.
- Type: struct
- scorer
- Function handle to the r2 function, which computes the R2 coefficient of determination.
- Type: function handle
- n_samples
- Number of rows in the training data.
- Type: integer
- Dimension: scalar
- n_features
- Number of columns in the training data.
- Type: integer
- Dimension: scalar
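For reference, the textbook definition of the R2 coefficient of determination is sketched below in Python. The r2 helper here is hypothetical and only mirrors the standard formula 1 - SS_res / SS_tot; how the library's own r2 function handles edge cases such as constant targets is not described on this page, so this sketch simply raises an error in that case.

```python
def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # Constant targets leave R^2 undefined; this sketch refuses to guess.
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


y = [1, 2, 3, 4, 5, 6, 7]
print(r2(y, y))                  # 1.0 (perfect predictions)
print(r2(y, [4] * 7))            # 0.0 (always predicting the mean)
print(r2([1, 2, 3], [1, 2, 4]))  # 0.5
```

A score of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values are possible for models worse than that baseline.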
Example
Usage of dtrfit with options
X = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15; 16 17 18; 19 20 21];
y = [1, 2, 3, 4, 5, 6, 7];
options = struct;
options.random_state = 3; options.criterion = 'mae';
parameters = dtrfit(X, y, options);
parameters = struct [
criterion: mae
min_impurity_decrease: 0
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0
model_name: model_1635312674663063
n_features: 3
n_samples: 7
random_state: 3
scorer: @r2
splitter: best
]
Comments
dtrfit performs regression by constructing a decision tree. Once the tree has been built, predictions can be made by passing the output parameters to the dtrpredict function.
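To make "constructing a decision tree" concrete, here is a minimal Python sketch of growing and querying a regression tree with the 'mse' criterion, the 'best' splitter, max_depth, min_samples_split and min_samples_leaf. It is an illustration under simplifying assumptions, not dtrfit's or dtrpredict's implementation: fit_tree, best_split and predict are hypothetical names, thresholds are midpoints between consecutive distinct feature values, and the remaining options (random splitting, max_features, max_leaf_nodes, sample weights) are not modeled.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean (n times the 'mse' impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y, min_samples_leaf):
    """'best' splitter: try every feature and every midpoint between consecutive
    distinct values; return the (feature, threshold) with the lowest child SSE,
    or None if no split lowers the SSE while respecting min_samples_leaf."""
    best, best_cost = None, sse(y)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (j, threshold), cost
    return best


def fit_tree(X, y, max_depth=None, min_samples_split=2, min_samples_leaf=1, depth=0):
    """Grow a tree recursively. A leaf is {"value": mean target}; an internal
    node stores the split feature, the threshold and two subtrees."""
    if len(y) < min_samples_split or (max_depth is not None and depth >= max_depth):
        return {"value": mean(y)}
    split = best_split(X, y, min_samples_leaf)
    if split is None:  # pure node, or no admissible split
        return {"value": mean(y)}
    j, threshold = split
    goes_left = [row[j] <= threshold for row in X]
    subtrees = {}
    for side, keep in (("left", True), ("right", False)):
        Xs = [row for row, g in zip(X, goes_left) if g == keep]
        ys = [t for t, g in zip(y, goes_left) if g == keep]
        subtrees[side] = fit_tree(Xs, ys, max_depth, min_samples_split,
                                  min_samples_leaf, depth + 1)
    return {"feature": j, "threshold": threshold, **subtrees}


def predict(tree, row):
    """Walk from the root to a leaf and return that leaf's mean target."""
    while "value" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]


# Training data from the Example section.
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
     [13, 14, 15], [16, 17, 18], [19, 20, 21]]
y = [1, 2, 3, 4, 5, 6, 7]

stump = fit_tree(X, y, max_depth=1)
print(stump["feature"], stump["threshold"])                     # 0 8.5
print(predict(stump, [1, 2, 3]), predict(stump, [19, 20, 21]))  # 2.0 5.5
```

With max_depth=1 the sketch makes a single split and predicts each side's mean; with no depth limit and the default minimums it keeps splitting until every leaf holds one sample, which reproduces the training targets exactly and illustrates why max_depth or the min_samples options are usually set to limit overfitting.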