Journal article, The R Journal, Year: 2015

VSURF: An R Package for Variable Selection Using Random Forests

Abstract

This paper describes the R package VSURF. Based on random forests, and applicable to both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables that includes some redundancy, which can be relevant for interpretation. The second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based importance score, and proceeds using a stepwise forward strategy for variable introduction. Both subsets can be obtained automatically using data-driven default values, which are good enough to provide interesting results, but these values can also be tuned by the user. The algorithm is illustrated on a simulated example, and its applications to real datasets are presented.
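
The ranking-then-forward-introduction idea in the abstract can be sketched in a few lines. This is a language-neutral illustration written in Python, not the VSURF API and not the authors' implementation: the importance scores, the `toy_error` function, and the `min_gain` threshold are all hypothetical stand-ins. In the package the scores come from random forests permutation importance and the thresholds are data-driven, neither of which is reproduced here.

```python
from typing import Callable, Dict, List, Sequence


def rank_by_importance(importance: Dict[str, float]) -> List[str]:
    """Return variable names sorted from most to least important."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def forward_select(
    ranked: Sequence[str],
    error_fn: Callable[[List[str]], float],
    min_gain: float,
) -> List[str]:
    """Introduce variables in ranked order, keeping a variable only if it
    lowers the error of the current model by more than `min_gain`."""
    selected: List[str] = []
    best_error = float("inf")  # so the top-ranked variable is always kept
    for name in ranked:
        candidate = selected + [name]
        err = error_fn(candidate)
        if best_error - err > min_gain:
            selected = candidate
            best_error = err
    return selected


# Hypothetical importance scores (assumption: higher means more important).
importance = {"x1": 0.90, "x2": 0.85, "x3": 0.40, "noise": 0.01}


def toy_error(variables: List[str]) -> float:
    """Hypothetical model error: x1 and x2 carry the same information
    (redundancy), x3 carries additional information, noise carries none."""
    err = 1.0
    if "x1" in variables or "x2" in variables:
        err -= 0.5
    if "x3" in variables:
        err -= 0.3
    return err


ranked = rank_by_importance(importance)
selected = forward_select(ranked, toy_error, min_gain=0.05)
print(selected)  # ['x1', 'x3']
```

In this toy setting the redundant variable `x2` ranks highly but is skipped during forward introduction because it does not reduce the error once `x1` is in the model, which mirrors the distinction the abstract draws between an interpretation-oriented subset that tolerates redundancy and a smaller prediction-oriented subset.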
Main file
genuer-poggi-tuleaumalot.pdf (403.18 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01251924, version 1 (20-09-2017)

License

Attribution

Identifiers

  • HAL Id: hal-01251924, version 1

Cite

Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot. VSURF: An R Package for Variable Selection Using Random Forests. The R Journal, 2015, 7 (2), pp. 19-33. ⟨hal-01251924⟩