A Random Forest regressor. Random Forest is an estimator that fits a number of regression decision trees on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control overfitting.
    Syntax
      parameters = rfrfit(X,y)
      parameters = rfrfit(X,y,options)
    Inputs
- X
- Training data.
- Type: double
- Dimension: vector | matrix
- y
- Target values.
- Type: double
- Dimension: vector | matrix
- options
- Optional settings for training, given as a struct with any of the fields below.
- Type: struct
- n_estimators
- The number of trees in the forest (default: 100).
- Type: integer
- Dimension: scalar
- criterion
- Function to measure quality of a split. 'mse' (default): mean squared error, 'mae': mean absolute error.
- Type: char
- Dimension: string
- max_depth
- The maximum depth of each tree. If not assigned, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
- Type: integer
- Dimension: scalar
- min_samples_split
- The minimum number of samples required to split an internal node (default: 2). If an integer, the value is used directly as the minimum number. If a float, (min_samples_split * n_samples) is taken as the minimum number of samples for each split.
- Type: double | integer
- Dimension: scalar
- min_samples_leaf
- The minimum number of samples required at a leaf node (default: 1). If a node holds fewer than min_samples_leaf samples, the tree is not grown further under that node. If an integer, the value is used directly as the minimum number. If a float, (min_samples_leaf * n_samples) is taken as the minimum number of samples for each node.
- Type: double | integer
- Dimension: scalar
- min_weight_fraction_leaf
- The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node (default: 0).
- Type: double
- Dimension: scalar
- max_features
- The number of features to consider when looking for the best split (default: the number of features in the training data). If an integer, max_features features are considered at each split. If a float, floor(max_features * n_features) features are considered at each split.
- Type: double | integer
- Dimension: scalar
- random_state
- Controls the randomness of the model. At each split, features are randomly permuted. random_state is the seed used by the random number generator.
- Type: integer
- Dimension: scalar
- max_leaf_nodes
- Grow each tree with at most max_leaf_nodes leaf nodes in best-first fashion. Best nodes are defined by their reduction in impurity. If not assigned, the number of leaf nodes is unlimited.
- Type: integer
- Dimension: scalar
- min_impurity_decrease
- A node is split only if the split reduces the impurity by at least this value (default: 0).
- Type: double
- Dimension: scalar
- bootstrap
- Whether bootstrap samples are used when building trees (default: true). If false, the whole dataset is used to build each tree.
- Type: Boolean
- Dimension: logical
- oob_score
- Whether to use out-of-bag samples to estimate the R2 on unseen data (default: false).
- Type: Boolean
- Dimension: logical
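
As a rough illustration of how the fractional forms of min_samples_split, min_samples_leaf, and max_features translate into counts, the sketch below applies the formulas stated in the option descriptions to a hypothetical dataset of 200 samples and 10 features. The variable names split_threshold, leaf_threshold, and features_per_split are illustrative only. How rfrfit rounds the sample thresholds internally is not stated on this page, so the products are left unrounded.

```octave
% Hypothetical dataset size (illustrative values, not produced by rfrfit).
n_samples = 200;
n_features = 10;

options = struct;
options.min_samples_split = 0.25;  % float: fraction of n_samples
options.min_samples_leaf = 0.05;   % float: fraction of n_samples
options.max_features = 0.5;        % float: fraction of n_features

% Formulas as given in the option descriptions.
split_threshold = options.min_samples_split * n_samples;
leaf_threshold = options.min_samples_leaf * n_samples;
features_per_split = floor(options.max_features * n_features);
```

With these values, a node would need about 50 samples to be split and about 10 samples per leaf, and 5 features would be considered at each split.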
 
    Outputs
- parameters
- Contains all the values passed to rfrfit as options, plus the key-value pairs listed below.
- Type: struct
- scorer
- Function handle pointing to r2 function (R2 Coefficient of Determination).
- Type: function handle
- n_samples
- Number of rows in the training data.
- Type: integer
- Dimension: scalar
- n_features
- Number of columns in the training data.
- Type: integer
- Dimension: scalar
- oob_score
- R2 score of the training dataset, obtained using an out-of-bag estimate. It is set only when oob_score = true in options.
- Type: double
- Dimension: scalar
- oob_prediction
- Predictions for the training set, computed with the out-of-bag estimate. It is set only when oob_score = true in options.
- Type: double
- Dimension: vector
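
Both the scorer field and oob_score refer to the R2 coefficient of determination. As a reminder of what that number measures, the sketch below computes it by hand, assuming the standard definition R2 = 1 - SS_res / SS_tot. The vectors y_true and y_pred are made-up values, and the built-in r2 function may differ in argument order or edge-case handling.

```octave
% Made-up targets and predictions, for illustration only.
y_true = [1 2 3 4 5];
y_pred = [1.1 1.9 3.2 3.8 5.0];

% Residual sum of squares and total sum of squares.
ss_res = sum((y_true - y_pred).^2);
ss_tot = sum((y_true - mean(y_true)).^2);

% Standard coefficient of determination (assumed definition).
r2_manual = 1 - ss_res / ss_tot;
```

Here ss_res is 0.1 and ss_tot is 10, so r2_manual is approximately 0.99. A value close to 1 means the predictions explain most of the variance in the targets.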
 
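    Example
      Usage of rfrfit with options:
% Training data: 7 samples with 3 features each
X = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15; 16 17 18; 19 20 21];
y = [1, 2, 3, 4, 5, 6, 7];
% Fix the seed and use mean absolute error as the split criterion
options = struct;
options.random_state = 3;
options.criterion = 'mae';
% No trailing semicolon, so the returned struct is printed
parameters = rfrfit(X, y, options)
parameters = struct [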
  bootstrap: 1
  criterion: mae
  min_impurity_decrease: 0
  min_samples_leaf: 1
  min_samples_split: 2
  min_weight_fraction_leaf: 0
  model_name: model_1635312796851985
  n_estimators: 100
  n_features: 3
  n_samples: 7
  oob_score: oob_score not set to true while training
  random_state: 3
  scorer: @r2
] 
  
    
    Comments
      The sub-sample size is always the same as the original input size; when bootstrap is true (default), the samples are drawn with replacement. If size-limiting parameters such as max_depth and min_samples_leaf are left unassigned, the defaults produce fully grown, unpruned trees, which can be very large on some datasets. To reduce memory consumption, control the complexity and size of the trees by setting those parameters. The features are always randomly permuted at each split, so the best split found may vary even when max_features equals the number of features in the dataset and bootstrap = false. Fix random_state to obtain deterministic behaviour. The output 'parameters' can be passed to the rfrpredict function.
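
One way to apply these recommendations is sketched below: tree size is bounded with max_depth and min_samples_leaf, random_state is fixed for repeatable results, and the returned struct is handed to rfrpredict. The option values are arbitrary. The call form rfrpredict(parameters, X) is an assumption, since this page does not give its syntax, so check the rfrpredict reference before relying on it.

```octave
% Same training data as in the example section.
X = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15; 16 17 18; 19 20 21];
y = [1, 2, 3, 4, 5, 6, 7];

options = struct;
options.max_depth = 3;         % bound tree depth to limit memory use
options.min_samples_leaf = 2;  % stop growing below nodes with fewer than 2 samples
options.random_state = 3;      % fixed seed for deterministic behaviour

parameters = rfrfit(X, y, options);

% Assumed call form: fitted parameters first, then the data to predict on.
y_pred = rfrpredict(parameters, X);
```

Predicting on the training data, as done here, only checks that the fitted struct is usable; it does not estimate performance on unseen data.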