Conditional permutation importance revisited


Debeer, Dries; Strobl, Carolin (2020). Conditional permutation importance revisited. BMC Bioinformatics, 21(1):307.

Abstract

Background: Random-forest-based variable importance measures have become popular tools for assessing the contributions of the predictor variables in a fitted random forest. In this article we reconsider a frequently used variable importance measure, the Conditional Permutation Importance (CPI). We argue and illustrate that the CPI corresponds to a more partial quantification of variable importance and suggest several improvements in its methodology and implementation that enhance its practical value. In addition, we introduce the threshold value in the CPI algorithm as a parameter that can make the CPI more partial or more marginal.
Results: By means of extensive simulations, where the original version of the CPI is used as the reference, we examine the impact of the proposed methodological improvements. The simulation results show how the improved CPI methodology increases the interpretability and stability of the computations. In addition, the newly proposed implementation decreases the computation times drastically and is more widely applicable. The improved CPI algorithm is made freely available as an add-on package to the open-source software R.
Conclusion: The proposed methodology and implementation of the CPI are computationally faster and lead to more stable results. They have a beneficial impact on practical research by making random forest analyses more interpretable.
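To make the marginal versus partial distinction in the abstract more concrete, here is a minimal Python sketch of permutation importance with and without conditioning. It is an illustration under simplifying assumptions, not the authors' CPI algorithm or their R package: the original CPI builds its permutation grid from the split points a fitted random forest uses on the selected conditioning covariates, whereas this sketch swaps in equal-frequency bins of one hand-picked covariate. The functions `permutation_importance` and `quantile_bins`, the toy data, and the fixed linear `model` standing in for a fitted forest are all hypothetical.

```python
import random
import statistics


def mse(model, X, y):
    """Mean squared error of the model's predictions on rows X."""
    return statistics.fmean((model(row) - t) ** 2 for row, t in zip(X, y))


def quantile_bins(values, n_bins):
    """Hypothetical helper: label each value with its equal-frequency bin index.
    (Assumption: stands in for the forest-derived grid used by the real CPI.)"""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


def permutation_importance(model, X, y, j, strata=None, seed=0):
    """Increase in MSE after permuting column j.

    strata=None: shuffle column j across all rows (marginal importance).
    strata=list: shuffle column j only among rows sharing a stratum label
    (a simplified, conditional importance).
    """
    rng = random.Random(seed)
    groups = {}
    for i in range(len(X)):
        key = None if strata is None else strata[i]
        groups.setdefault(key, []).append(i)
    permuted = [list(row) for row in X]
    for idx in groups.values():
        values = [X[i][j] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            permuted[i][j] = v
    return mse(model, permuted, y) - mse(model, X, y)


# Toy data (assumption): x2 is a noisy copy of x1, and only x1 drives y.
rng = random.Random(42)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.3) for a in x1]
y = [a + rng.gauss(0, 0.5) for a in x1]
X = [[a, b] for a, b in zip(x1, x2)]


def model(row):
    # Hypothetical stand-in for a fitted forest that uses both correlated predictors.
    return 0.5 * row[0] + 0.5 * row[1]


marginal = permutation_importance(model, X, y, j=1)
conditional = permutation_importance(model, X, y, j=1, strata=quantile_bins(x1, 20))
print(f"marginal: {marginal:.3f}  conditional: {conditional:.3f}")
```

Because `x2` is nearly a copy of `x1`, shuffling it only within bins of `x1` barely changes its values, so the conditional score should come out much smaller than the marginal one. Passing `strata=None` is the fully marginal end of the spectrum; conditioning on fewer or coarser covariates moves the measure toward that end, which is the kind of trade-off the abstract attributes to the CPI threshold.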

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Psychology; 08 Research Priority Programs > Digital Society Initiative
Dewey Decimal Classification: 150 Psychology
Scopus Subject Areas: Life Sciences > Structural Biology; Life Sciences > Biochemistry; Life Sciences > Molecular Biology; Physical Sciences > Computer Science Applications; Physical Sciences > Applied Mathematics
Language: English
Date: 14 July 2020
Deposited On: 03 Nov 2020 16:02
Last Modified: 28 Feb 2021 08:02
Publisher: BioMed Central
ISSN: 1471-2105
OA Status: Gold
Free access at: PubMed ID. An embargo period may apply.
Publisher DOI: https://doi.org/10.1186/s12859-020-03622-2
PubMed ID: 32664864
