UZH-Logo

Maintenance Infos

Lattice histograms: a resilient synopsis structure


Karras, P; Mamoulis, N (2008). Lattice histograms: a resilient synopsis structure. In: 24th International Conference on Data Engineering (ICDE 2008), Cancun, Mexico, 7 April 2008 - 12 April 2008, 247-256.

Abstract

Despite the surge of interest in data reduction techniques over the past years, no method has been proposed to date that can always achieve approximation quality preferable
to that of the optimal plain histogram for a target error metric. In this paper, we introduce the Lattice Histogram: a novel data reduction method that discovers and exploits any arbitrary hierarchy in the data, and achieves approximation quality provably at least as high as an optimal histogram for any data reduction problem. We formulate LH construction techniques with approximation guarantees for general error metrics. We show that the case of minimizing a maximum-error metric can be solved by a specialized, memory-sparing approach; we exploit this solution to design reduced-space heuristics for the generalerror case. We develop a mixed synopsis approach, applicable to the space-efficient high-quality summarization of very large data sets. We experimentally corroborate the superiority of LHs in approximation quality over previous techniques with representative error metrics and diverse data sets.

Abstract

Despite the surge of interest in data reduction techniques over the past years, no method has been proposed to date that can always achieve approximation quality preferable
to that of the optimal plain histogram for a target error metric. In this paper, we introduce the Lattice Histogram: a novel data reduction method that discovers and exploits any arbitrary hierarchy in the data, and achieves approximation quality provably at least as high as an optimal histogram for any data reduction problem. We formulate LH construction techniques with approximation guarantees for general error metrics. We show that the case of minimizing a maximum-error metric can be solved by a specialized, memory-sparing approach; we exploit this solution to design reduced-space heuristics for the generalerror case. We develop a mixed synopsis approach, applicable to the space-efficient high-quality summarization of very large data sets. We experimentally corroborate the superiority of LHs in approximation quality over previous techniques with representative error metrics and diverse data sets.

Altmetrics

Downloads

151 downloads since deposited on 09 Jan 2009
10 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Language:English
Event End Date:12 April 2008
Deposited On:09 Jan 2009 07:39
Last Modified:05 Apr 2016 12:45
Publisher:IEEE
ISBN:978-1-4244-1836-7
Additional Information:This paper was presented at the 24th International Conference on Data Engineering (ICDE 2008), Cancun, Mexico, April 7 - 12, 2008. © 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Publisher DOI:https://doi.org/10.1109/ICDE.2008.4497433
Official URL:http://www.ifi.uzh.ch/pax/web/index.php/publication/show/id/823
Related URLs:http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4497433
http://www.icde2008.org/

Download

[img]
Preview
Filetype: PDF (Original publication)
Size: 4MB
View at publisher

TrendTerms

TrendTerms displays relevant terms of the abstract of this publication and related documents on a map. The terms and their relations were extracted from ZORA using word statistics. Their timelines are taken from ZORA as well. The bubble size of a term is proportional to the number of documents where the term occurs. Red, orange, yellow and green colors are used for terms that occur in the current document; red indicates high interlinkedness of a term with other terms, orange, yellow and green decreasing interlinkedness. Blue is used for terms that have a relation with the terms in this document, but occur in other documents.
You can navigate and zoom the map. Mouse-hovering a term displays its timeline, clicking it yields the associated documents.

Author Collaborations