Disjoint interval partitioning


Cafagna, Francesco; Böhlen, Michael Hanspeter (2017). Disjoint interval partitioning. VLDB Journal, 26(3):447-466.

Abstract

In databases with time interval attributes, query processing techniques that are based on sort-merge or sort-aggregate deteriorate. This happens because no total order exists for intervals, so either the start or the end point must be used for sorting. Doing so leads to inefficient solutions with many unproductive comparisons, that is, comparisons that do not produce an output tuple. Even if just one tuple with a long interval is present in the data, the number of unproductive comparisons of sort-merge and sort-aggregate becomes quadratic. In this paper we propose disjoint interval partitioning (\(\mathcal{DIP}\)), a technique to efficiently perform sort-based operators on interval data. \(\mathcal{DIP}\) divides an input relation into the minimum number of partitions such that all tuples in a partition are non-overlapping. The absence of overlapping tuples guarantees efficient sort-merge computations without backtracking. With \(\mathcal{DIP}\), the number of unproductive comparisons is linear in the number of partitions. In contrast to current solutions, which rely on inefficient random accesses to the active tuples, \(\mathcal{DIP}\) fetches the tuples in a partition sequentially. We illustrate the generality and efficiency of \(\mathcal{DIP}\) by describing and evaluating three basic database operators over interval data: join, anti-join and aggregation.
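
The partitioning idea in the abstract can be illustrated with a small sketch. It is not the algorithm from the paper: it uses the textbook greedy method for splitting intervals into the minimum number of non-overlapping groups, followed by a two-pointer merge over two such groups that never backtracks. The function names `dip_partition` and `merge_join`, the representation of a tuple as a bare `(start, end)` pair, and the choice of half-open intervals `[start, end)` are all assumptions made for illustration.

```python
import heapq


def dip_partition(intervals):
    """Split half-open intervals (start, end) into the fewest groups
    such that no two intervals in the same group overlap.

    Illustrative greedy sketch (assumption), not the paper's algorithm.
    """
    partitions = []
    # Min-heap of (end of the last interval in a partition, partition index).
    heap = []
    for start, end in sorted(intervals):
        if heap and heap[0][0] <= start:
            # The partition that frees up earliest can take this interval.
            _, idx = heapq.heappop(heap)
            partitions[idx].append((start, end))
        else:
            # Every existing partition still overlaps: open a new one.
            idx = len(partitions)
            partitions.append([(start, end)])
        heapq.heappush(heap, (end, idx))
    return partitions


def merge_join(p, q):
    """Overlap join of two partitions, each sorted by start and
    internally non-overlapping. Both cursors only move forward."""
    out = []
    i = j = 0
    while i < len(p) and j < len(q):
        (s1, e1), (s2, e2) = p[i], q[j]
        if s1 < e2 and s2 < e1:
            out.append((p[i], q[j]))
        # Advance the interval that ends first; it cannot match anything later.
        if e1 <= e2:
            i += 1
        else:
            j += 1
    return out


# One long interval plus three short ones yields two partitions.
parts = dip_partition([(1, 10), (2, 4), (5, 7), (8, 9)])
pairs = merge_join(parts[0], parts[1])
```

Here the long interval `(1, 10)` ends up alone in one partition and the three short intervals share the other, so the merge makes a single sequential pass over each partition. Because tuples inside a partition never overlap, the merge never has to revisit an earlier tuple.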

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Date: 22 February 2017
Deposited On: 06 Jun 2017 10:37
Last Modified: 23 Nov 2017 18:54
Publisher: Springer
ISSN: 1066-8888
Publisher DOI: https://doi.org/10.1007/s00778-017-0456-7
Other Identification Number: merlin-id:14867
