A convenient wrapper for ranger that completes its output by providing the Moran's I of the residuals for different distance thresholds, the RMSE and NRMSE (as computed by `root_mean_squared_error()`), and variable importance scores based on a scaled version of the data generated by scale.
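
As a rough illustration of the two error metrics mentioned above, the following base R sketch computes an RMSE and a normalized RMSE. The helper names `rmse_sketch()` and `nrmse_sketch()` are hypothetical, and dividing by the interquartile range of the observations is an assumption made for this example; `root_mean_squared_error()` may normalize differently.

```r
# Hypothetical helpers for illustration only; not the package's implementation.
rmse_sketch <- function(observed, predicted) {
  # square root of the mean squared difference
  sqrt(mean((observed - predicted)^2))
}

# Assumption: normalization by the interquartile range of the observations.
nrmse_sketch <- function(observed, predicted) {
  rmse_sketch(observed, predicted) / stats::IQR(observed)
}

# toy data
observed  <- c(10, 12, 15, 20, 24)
predicted <- c(11, 11, 16, 18, 25)

rmse.value  <- rmse_sketch(observed, predicted)
nrmse.value <- nrmse_sketch(observed, predicted)
```

A normalized value makes errors comparable across response variables measured on different scales.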

    rf(
      data = NULL,
      dependent.variable.name = NULL,
      predictor.variable.names = NULL,
      distance.matrix = NULL,
      distance.thresholds = NULL,
      xy = NULL,
      ranger.arguments = NULL,
      scaled.importance = FALSE,
      seed = 1,
      verbose = TRUE,
      n.cores = parallel::detectCores() - 1,
      cluster = NULL
    )

Argument | Description |
---|---|
data | Data frame with a response variable and a set of predictors. Default: `NULL` |
dependent.variable.name | Character string with the name of the response variable. Must be in the column names of |
predictor.variable.names | Character vector with the names of the predictive variables. Every element of this vector must be in the column names of |
distance.matrix | Square matrix with the distances among the records in |
distance.thresholds | Numeric vector with neighborhood distances. All distances in the distance matrix below each value in |
xy | (optional) Data frame or matrix with two columns containing coordinates and named "x" and "y". It is not used by this function, but it is stored in the slot |
ranger.arguments | Named list with ranger arguments (other arguments of this function can also go here). All ranger arguments are set to their default values except for 'importance', which is set to 'permutation' rather than 'none'. The ranger arguments |
scaled.importance | Logical, if |
seed | Integer, random seed to facilitate reproducibility. If set to a given number, the returned model is always the same. Default: `1` |
verbose | Logical. If `TRUE`, messages and plots generated during the execution of the function are displayed. Default: `TRUE` |
n.cores | Integer, number of cores to use. Default: `parallel::detectCores() - 1` |
cluster | A cluster definition generated with |
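
To make the roles of `distance.matrix` and `distance.thresholds` more concrete, here is a minimal base R sketch of Moran's I evaluated at several distance thresholds. The helper `moran_sketch()` is hypothetical, and the weighting scheme (distances at or below the threshold receive zero weight, the rest are inverse-distance weighted and row-standardized) is an assumption for this example rather than a description of the package internals.

```r
# Hypothetical helper for illustration only; not the package's implementation.
moran_sketch <- function(x, distance.matrix, distance.threshold = 0) {
  d <- distance.matrix
  # Assumption: distances at or below the threshold are set to 0
  d[d <= distance.threshold] <- 0
  # inverse-distance weights; zero distances (diagonal included) get weight 0
  w <- 1 / d
  w[!is.finite(w)] <- 0
  diag(w) <- 0
  # row-standardize the weights, leaving all-zero rows untouched
  row.sums <- rowSums(w)
  row.sums[row.sums == 0] <- 1
  w <- w / row.sums
  # Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)
  z <- x - mean(x)
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# six records on a line, one unit apart
coords <- 1:6
toy.distance.matrix <- abs(outer(coords, coords, "-"))

# a smooth gradient (similar neighbors) and an alternating pattern
gradient    <- c(1, 2, 3, 4, 5, 6)
alternating <- c(1, -1, 1, -1, 1, -1)

moran.gradient    <- moran_sketch(gradient, toy.distance.matrix)
moran.alternating <- moran_sketch(alternating, toy.distance.matrix)

# one Moran's I value per distance threshold
moran.per.threshold <- sapply(
  c(0, 1, 2),
  function(threshold) moran_sketch(gradient, toy.distance.matrix, threshold)
)
```

In this toy setting the gradient yields a positive Moran's I and the alternating pattern a negative one, which is the kind of signal to look for in model residuals.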

A ranger model with several extra slots:

`ranger.arguments`

: Stores the values of the arguments used to fit the ranger model.

`importance`

: A list containing a data frame with the predictors ordered by their importance, a ggplot showing the importance values, and local importance scores (difference in accuracy between permuted and non permuted variables for every case, computed on the out-of-bag data).

`performance`

: performance scores: R squared on out-of-bag data, R squared (cor(observed, predicted) ^ 2), pseudo R squared (cor(observed, predicted)), RMSE, and normalized RMSE (NRMSE).

`residuals`

: residuals, normality test of the residuals computed with `residuals_test()`, and spatial autocorrelation of the residuals computed with `moran_multithreshold()`.
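
The two correlation-based scores listed under `performance` follow directly from the formulas given there. A minimal base R sketch on toy data (the vectors and object names are made up for illustration):

```r
# toy observed and predicted values
observed  <- c(10, 12, 15, 20, 24)
predicted <- c(11, 11, 16, 18, 25)

# pseudo R squared: Pearson correlation between observations and predictions
pseudo.r.squared <- stats::cor(observed, predicted)

# R squared: the squared correlation
r.squared <- stats::cor(observed, predicted)^2
```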

Please read the help file of ranger for further details. Note that the `formula` interface of ranger is supported through `ranger.arguments`, but variable interactions are not allowed (see `the_feature_engineer()` instead).

    if(interactive()){

      #loading example data
      data("plant_richness_df")
      data("distance_matrix")

      #fitting random forest model
      out <- rf(
        data = plant_richness_df,
        dependent.variable.name = "richness_species_vascular",
        predictor.variable.names = colnames(plant_richness_df)[5:21],
        distance.matrix = distance_matrix,
        distance.thresholds = 0,
        n.cores = 1
      )

      class(out)

      #data frame with ordered variable importance
      out$importance$per.variable

      #variable importance plot
      out$importance$per.variable.plot

      #performance
      out$performance

      #spatial correlation of the residuals
      out$spatial.correlation.residuals$per.distance

      #plot of the Moran's I of the residuals for different distance thresholds
      out$spatial.correlation.residuals$plot

      #predictions for new data as done with ranger models:
      predicted <- stats::predict(
        object = out,
        data = plant_richness_df,
        type = "response"
      )$predictions

      #alternative data input methods
      ###############################

      #ranger.arguments can contain ranger arguments and any other rf argument
      my.ranger.arguments <- list(
        data = plant_richness_df,
        dependent.variable.name = "richness_species_vascular",
        predictor.variable.names = colnames(plant_richness_df)[8:21],
        distance.matrix = distance_matrix,
        distance.thresholds = c(0, 1000)
      )

      #fitting model with these ranger arguments
      out <- rf(
        ranger.arguments = my.ranger.arguments,
        n.cores = 1
      )

    }