Header

UZH-Logo

Maintenance Infos

Multi-dimensional histograms with tight bounds for the error


Baltrunas, Linas; Mazeika, Arturas; Böhlen, Michael H (2006). Multi-dimensional histograms with tight bounds for the error. In: IDEAS 2006, Delhi, India, 11 December 2006 - 14 December 2006, 105-112.

Abstract

Histograms are being used as non-parametric selectivity estimators for one-dimensional data. For highdimensional data it is common to either compute onedimensional histograms for each attribute or to compute a multi-dimensional equi-width histogram for a set of attributes. This either yields small low-quality or large highquality histograms.In this paper we introduce HIRED (High-dimensional histograms with dimensionality REDuction): small highquality histograms for multi-dimensional data. HIRED histograms are adaptive, and they are based on the shape error and directional splits. The shape error permits a precise control of the estimation error of the histogram and, together with directional splits, yields a memory complexity that does not depend on the number of uniform attributes in the dataset. We provide extensive experimental results with synthetic and real world datasets. The experiments confirm that our method is as precise as state-of-the-art techniques and uses orders of magnitude less memory.

Abstract

Histograms are being used as non-parametric selectivity estimators for one-dimensional data. For highdimensional data it is common to either compute onedimensional histograms for each attribute or to compute a multi-dimensional equi-width histogram for a set of attributes. This either yields small low-quality or large highquality histograms.In this paper we introduce HIRED (High-dimensional histograms with dimensionality REDuction): small highquality histograms for multi-dimensional data. HIRED histograms are adaptive, and they are based on the shape error and directional splits. The shape error permits a precise control of the estimation error of the histogram and, together with directional splits, yields a memory complexity that does not depend on the number of uniform attributes in the dataset. We provide extensive experimental results with synthetic and real world datasets. The experiments confirm that our method is as precise as state-of-the-art techniques and uses orders of magnitude less memory.

Statistics

Citations

3 citations in Web of Science®
3 citations in Scopus®
Google Scholar™

Altmetrics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Language:English
Event End Date:14 December 2006
Deposited On:29 May 2012 09:25
Last Modified:12 Aug 2017 15:49
Publisher:IEEE
ISBN:0-7695-2577-6
Publisher DOI:https://doi.org/10.1109/IDEAS.2006.31
Related URLs:http://opac.nebis.ch/F/?local_base=NEBIS&CON_LNG=GER&func=find-b&find_code=SYS&request=005442657
Other Identification Number:merlin-id:6160

Download

Full text not available from this repository.
View at publisher