Navigation auf zora.uzh.ch

Search ZORA

ZORA (Zurich Open Repository and Archive)

Category- and selection-enabled nearest neighbor joins

Cafagna, Francesco; Böhlen, Michael Hanspeter; Bracher, Annelies (2017). Category- and selection-enabled nearest neighbor joins. Information Systems, 68:3-16.

Abstract

This paper proposes a category- and selection-enabled nearest neighbor join (NNJ) between relation r and relation s, with similarity on T and support for category attributes C and selection predicate θ. Our solution does not suffer from redundant fetches and index false hits, which are the main performance bottlenecks of current nearest neighbor join techniques.
A category-enabled NNJ leverages the category attributes C for query evaluation. For example, the categories of relation r can be used to limit relation s accessed at most once. Solutions that are not category-enabled must process each category independently and end up fetching, either from disk or memory, the blocks of the input relations multiple times. A selection-enabled NNJ performs well independent of whether the DBMS optimizer pushes the selection down or evaluates it on the fly. In contrast, index-based solutions suffer from many index false hits or end up in an expensive nested loop.
Our solution does not constrain the physical design, and is efficient for row- as well as column-stores. Current solutions for column-stores use late materialization, which is only efficient if the data is clustered on the category attributes C. Our evaluation algorithm finds, for each outer tuple r, the inner tuples that satisfy the equality on the category and have the smallest distance to r with only one scan of both inputs. We experimentally evaluate our solution using a data warehouse that manages analyses of animal feeds.

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Scopus Subject Areas:Physical Sciences > Software
Physical Sciences > Information Systems
Physical Sciences > Hardware and Architecture
Scope:Discipline-based scholarship (basic research)
Language:English
Date:August 2017
Deposited On:18 Aug 2017 12:47
Last Modified:16 Mar 2025 02:41
Publisher:Elsevier
ISSN:0306-4379
OA Status:Green
Publisher DOI:https://doi.org/10.1016/j.is.2017.01.006
Other Identification Number:merlin-id:15013
Project Information:
  • Funder: SNSF
  • Grant ID: 200021_135361
  • Project Title: Tameus: Managing Time-Varying Measurement Sets in Databases
Download PDF  'Category- and selection-enabled nearest neighbor joins'.
Preview
  • Content: Accepted Version
  • Licence: Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Metadata Export

Statistics

Citations

Dimensions.ai Metrics
2 citations in Web of Science®
3 citations in Scopus®
Google Scholar™

Altmetrics

Downloads

47 downloads since deposited on 18 Aug 2017
8 downloads since 12 months
Detailed statistics

Authors, Affiliations, Collaborations

Similar Publications