(Psycho-)analysis of benchmark experiments : a formal framework for investigating the relationship between data sets and learning algorithms


Eugster, Manuel J A; Leisch, Friedrich; Strobl, Carolin (2014). (Psycho-)analysis of benchmark experiments : a formal framework for investigating the relationship between data sets and learning algorithms. Computational Statistics & Data Analysis, 71:986-1000.

Abstract

It is common knowledge that the performance of different learning algorithms depends on certain characteristics of the data, such as dimensionality, linear separability or sample size. However, formally investigating this relationship in an objective and reproducible way is not trivial. A new formal framework for describing the relationship between data set characteristics and the performance of different learning algorithms is proposed. The framework combines the advantages of benchmark experiments with the formal description of data set characteristics by means of statistical and information-theoretic measures and with the recursive partitioning of Bradley-Terry models for comparing the algorithms' performances. The formal aspects of each component are introduced and illustrated by means of an artificial example. Its real-world usage is demonstrated with an application example consisting of thirteen widely-used data sets and six common learning algorithms. The Appendix provides information on the implementation and the usage of the framework within the R language.

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Psychology
Dewey Decimal Classification: 150 Psychology
Language: English
Date: 2014
Deposited On: 14 Jan 2015 13:19
Last Modified: 14 Feb 2018 22:44
Publisher: Elsevier
ISSN: 0167-9473
OA Status: Closed
Publisher DOI: https://doi.org/10.1016/j.csda.2013.08.007
