A Random Forest Classifier. Random Forest is an estimator that fits a number of decision tree classifiers on various subsamples of the dataset and uses averaging to improve predictive accuracy and control over-fitting.
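For a classifier, "averaging" usually means averaging the class-probability estimates of the individual trees and picking the class with the highest mean probability (this is how scikit-learn's RandomForestClassifier combines its trees). The Python sketch below illustrates only that combining step. It assumes each tree already returns a probability vector over the classes; the function name is hypothetical, and this is not a description of rfcfit's internals, which this page does not document.

```python
import numpy as np

def forest_predict(tree_probas, classes):
    """Combine per-tree class probabilities by averaging (hypothetical helper).

    tree_probas: array of shape (n_trees, n_samples, n_classes), where
        tree_probas[t, i, k] is tree t's estimate that sample i belongs to classes[k].
    classes: the n_classes class labels.
    Returns the predicted label of each sample.
    """
    mean_proba = np.mean(tree_probas, axis=0)           # shape (n_samples, n_classes)
    return np.asarray(classes)[np.argmax(mean_proba, axis=1)]

# Three hypothetical trees, two samples, classes 0, 1, 2 (as in the iris example).
probas = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.6, 0.4]],
    [[0.7, 0.3, 0.0], [0.0, 0.3, 0.7]],
    [[0.8, 0.2, 0.0], [0.1, 0.2, 0.7]],
])
print(forest_predict(probas, [0, 1, 2]))  # → [0 2]
```

The first tree alone would assign the second sample to class 1; averaging over all three trees assigns it to class 2.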
Syntax
parameters = rfcfit(X,y)
parameters = rfcfit(X,y,options)
Inputs
 X
 Training data.
 Type: double
 Dimension: vector | matrix
 y
 Target values.
 Type: double
 Dimension: vector | matrix
 options
 Type: struct

 n_estimators
 The number of trees in the forest (default: 100).
 Type: integer
 Dimension: scalar
 criterion
 Function that measures the quality of a split: 'gini' for Gini impurity (default) or 'entropy' for information gain.
 Type: char
 Dimension: string
 max_depth
 The maximum depth of each tree. If not assigned, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
 Type: integer
 Dimension: scalar
 min_samples_split
 The minimum number of samples required to split an internal node (default: 2). If an integer, it is used as the minimum number of samples; if a float, (min_samples_split * number of samples) is taken as the minimum number of samples for each split.
 Type: double | integer
 Dimension: scalar
 min_samples_leaf
 The minimum number of samples required at a leaf node (default: 1). If the number of samples at a node is less than min_samples_leaf, the tree is not grown further under that node. If an integer, it is used as the minimum number of samples; if a float, (min_samples_leaf * number of samples) is taken as the minimum number of samples for each node.
 Type: double | integer
 Dimension: scalar
 min_weight_fraction_leaf
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node (default: 0).
 Type: double
 Dimension: scalar
 max_features
 The number of features to consider when looking for the best split (default: number of features in the training data). If an integer, max_features features are considered at each split; if a float, floor(max_features * n_features) features are considered at each split.
 Type: double | integer
 Dimension: scalar
 max_leaf_nodes
 Grow trees with at most max_leaf_nodes leaves in best-first fashion. Best nodes are defined by their reduction in impurity. If not assigned, the number of leaf nodes is unlimited.
 Type: integer
 Dimension: scalar
 min_impurity_decrease
 A node is split only if the split reduces the impurity by at least this value (default: 0).
 Type: double
 Dimension: scalar
 bootstrap
 Whether bootstrap samples are used when building trees (default: true). If false, the whole dataset is used to build each tree.
 Type: Boolean
 Dimension: logical
 oob_score
 Whether to use out-of-bag samples to estimate the generalization accuracy (default: false).
 Type: Boolean
 Dimension: logical
 random_state
 Controls the randomness of the model. random_state is the seed used by the random number generator.
 Type: integer
 Dimension: scalar
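The criterion, min_impurity_decrease, min_samples_split, min_samples_leaf and max_features options follow standard decision-tree definitions. The Python sketch below spells those definitions out as an illustration only: Gini impurity (1 minus the sum of squared class proportions), entropy in bits, the size-weighted impurity decrease of a split, and the integer-versus-fraction reading of the count options. All function names are hypothetical and not part of rfcfit, and Python's int/float types stand in for the "integer" and "float" cases described above.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k**2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy in bits: -sum over classes of p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def impurity_decrease(parent, left, right, impurity=gini):
    """Impurity of the node minus the size-weighted impurity of its two children.

    Computed locally at the node; some libraries (e.g. scikit-learn) further
    weight this by the node's share of all training samples.
    """
    n = len(parent)
    children = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - children

def resolve_threshold(value, n_samples):
    """min_samples_split / min_samples_leaf: int -> count, float -> value * n_samples."""
    return value if isinstance(value, int) else value * n_samples

def resolve_max_features(value, n_features):
    """max_features: int -> used as is, float -> floor(value * n_features)."""
    return value if isinstance(value, int) else math.floor(value * n_features)

parent = [0, 0, 1, 1]
print(gini(parent), entropy(parent))                            # → 0.5 1.0
print(impurity_decrease(parent, [0, 0], [1, 1]))                # → 0.5
print(resolve_threshold(2, 150), resolve_threshold(0.25, 150))  # → 2 37.5
print(resolve_max_features(0.5, 4))                             # → 2
```

With min_impurity_decrease = 0 (the default), the perfect split in this example would be accepted because its decrease of 0.5 is at least 0.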
Outputs
 parameters
 Contains all the values passed to rfcfit as options. Additionally, it contains the following key-value pairs.
 Type: struct

 scorer
 Function handle pointing to the 'accuracy' function.
 Type: function handle
 oob_score
 Score on the training dataset obtained using an out-of-bag estimate. It is set only when oob_score = true in options.
 Type: double
 Dimension: scalar
 classes
 The class labels (single-output problem), or a matrix of class labels (multi-output problem).
 Type: double
 Dimension: vector | matrix
 n_samples
 Number of rows in the training data.
 Type: integer
 Dimension: scalar
 n_features
 Number of columns in the training data.
 Type: integer
 Dimension: scalar
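An out-of-bag estimate scores each training sample using only the trees whose bootstrap sample did not contain it, and accuracy is the fraction of samples classified correctly. The Python sketch below shows that general computation under stated assumptions: the per-tree predictions and in-bag masks are given as hypothetical inputs, a plain majority vote is used for brevity, and whether rfcfit computes its oob_score exactly this way is not stated on this page.

```python
import numpy as np

def oob_accuracy(tree_preds, in_bag, y):
    """Accuracy of out-of-bag majority votes (hypothetical helper).

    tree_preds: (n_trees, n_samples) labels predicted by each tree for each sample.
    in_bag: (n_trees, n_samples) booleans, True if the sample was drawn into that
        tree's bootstrap sample, in which case that tree does not vote on it.
    y: (n_samples,) true labels.
    """
    tree_preds = np.asarray(tree_preds)
    in_bag = np.asarray(in_bag, dtype=bool)
    y = np.asarray(y)
    correct = counted = 0
    for i in range(y.shape[0]):
        votes = tree_preds[~in_bag[:, i], i]  # only trees that never saw sample i
        if votes.size == 0:
            continue  # sample was in-bag for every tree: no OOB estimate
        labels, counts = np.unique(votes, return_counts=True)
        pred = labels[np.argmax(counts)]  # majority vote; ties go to the smallest label
        correct += int(pred == y[i])
        counted += 1
    return correct / counted if counted else float("nan")

# Hypothetical predictions of 3 trees on 3 samples.
preds = [[0, 1, 1],
         [0, 1, 2],
         [1, 1, 2]]
in_bag = [[False, True, False],
          [False, False, True],
          [True, False, False]]
print(oob_accuracy(preds, in_bag, [0, 1, 2]))  # → 0.6666666666666666 (2 of 3 correct)
```

The third sample receives one vote for class 1 and one for class 2; the tie resolves to class 1, so it counts as a miss.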
Example
Usage of rfcfit
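data = dlmread('iris.csv', ',', 1);    % skip the header row
X = data(:,1:end-1);                   % all columns except the last are features
y = data(:,end);                       % the last column holds the class labels
options = struct('n_estimators', 100);
parameters = rfcfit(X, y, options);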
> parameters
parameters = struct [
bootstrap: 1
classes: [Matrix] 1 x 3
0 1 2
criterion: gini
min_impurity_decrease: 0
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0
n_estimators: 100
n_features: 4
n_samples: 150
oob_score: oob_score not set to true while training
]
Comments
The subsample size is always the same as the original input size, but samples are drawn with replacement if bootstrap is set to true (default). If parameters such as max_depth and min_samples_leaf are left unassigned (so their default values are used), the result is fully grown, unpruned trees, which can be very large on some datasets. To reduce memory consumption, control the complexity and size of the trees by setting those parameters. The features are always randomly permuted at each split, so the best split found may vary even when max_features equals the number of features in the dataset and bootstrap = false. Fix random_state to obtain deterministic behaviour.
The output 'parameters' should be passed as input to the rfcpredict function.
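The bootstrap behaviour described above (a subsample of the same size as the input, drawn with replacement, and reproducible only with a fixed seed) can be illustrated with NumPy. This is a Python sketch of the general technique, not rfcfit's sampler: the helper names are hypothetical, and NumPy's seeded default_rng stands in for random_state.

```python
import numpy as np

def bootstrap_indices(n_samples, rng):
    """One bootstrap sample: n_samples row indices drawn with replacement."""
    return rng.integers(0, n_samples, size=n_samples)

def oob_indices(n_samples, in_bag_idx):
    """Rows that were never drawn, i.e. the out-of-bag rows for this sample."""
    return np.setdiff1d(np.arange(n_samples), in_bag_idx)

n = 150  # rows in the iris example
idx_a = bootstrap_indices(n, np.random.default_rng(seed=0))
idx_b = bootstrap_indices(n, np.random.default_rng(seed=0))

print(len(idx_a))                    # → 150: same size as the input
print(np.array_equal(idx_a, idx_b))  # → True: a fixed seed reproduces the draw
# Duplicates leave some rows out; on average about n * (1 - 1/n)**n of them (about 55 here).
print(len(oob_indices(n, idx_a)))
```

Creating the generator without a seed would give a different draw on every run, which is why a fixed random_state is needed for deterministic results.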