(Psycho-)analysis of benchmark experiments : a formal framework for investigating the relationship between data sets and learning algorithms - Zurich Open Repository and Archive


Eugster, Manuel J A; Leisch, Friedrich; Strobl, Carolin (2014). (Psycho-)analysis of benchmark experiments : a formal framework for investigating the relationship between data sets and learning algorithms. Computational Statistics & Data Analysis, 71:986-1000.

Abstract

It is common knowledge that the performance of different learning algorithms depends on certain characteristics of the data, such as dimensionality, linear separability, or sample size. However, formally investigating this relationship in an objective and reproducible way is not trivial. A new formal framework for describing the relationship between data set characteristics and the performance of different learning algorithms is proposed. The framework combines the advantages of benchmark experiments with the formal description of data set characteristics by means of statistical and information-theoretic measures and with the recursive partitioning of Bradley-Terry models for comparing the algorithms' performances. The formal aspects of each component are introduced and illustrated by means of an artificial example. Its real-world usage is demonstrated with an application example consisting of thirteen widely used data sets and six common learning algorithms. The Appendix provides information on the implementation and the usage of the framework within the R language.
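To make the Bradley-Terry component concrete, here is a minimal Python sketch of how a single Bradley-Terry model can be fitted from pairwise preference counts. It is not the authors' implementation (the abstract says theirs is in R) and it leaves out the recursive partitioning step. The function name `fit_bradley_terry`, the minorization-maximization update, and the win counts are assumptions chosen for illustration only.

```python
def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry worth parameters from pairwise counts.

    wins[i][j] is the number of comparisons in which item i was
    preferred over item j. Under the model, the probability that i is
    preferred over j is worth[i] / (worth[i] + worth[j]).
    Assumes every item has at least one win and every pair was compared.
    """
    k = len(wins)
    worth = [1.0 / k] * k
    for _ in range(n_iter):
        updated = []
        for i in range(k):
            total_wins = sum(wins[i][j] for j in range(k) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (worth[i] + worth[j])
                for j in range(k)
                if j != i
            )
            updated.append(total_wins / denom)
        # Normalize so the worth parameters sum to one.
        scale = sum(updated)
        updated = [w / scale for w in updated]
        converged = max(abs(a - b) for a, b in zip(updated, worth)) < tol
        worth = updated
        if converged:
            break
    return worth


# Hypothetical counts: how often each of three algorithms was preferred
# over each other one (10 comparisons per pair). Not data from the paper.
example_wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
example_worth = fit_bradley_terry(example_wins)
```

A larger worth value means the algorithm is preferred more often overall. In the framework described above, such models would additionally be partitioned by data set characteristics, which this sketch does not attempt.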

Citations

4 citations in Web of Science®
4 citations in Scopus®

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Psychology
Dewey Decimal Classification: 150 Psychology
Language: English
Date: 2014
Deposited On: 14 Jan 2015 13:19
Last Modified: 05 Apr 2016 18:50
Publisher: Elsevier
ISSN: 0167-9473
Publisher DOI: https://doi.org/10.1016/j.csda.2013.08.007

Download

Full text not available from this repository.
