Navigation auf zora.uzh.ch

Search ZORA

ZORA (Zurich Open Repository and Archive)

Approximate Joins for Data-Centric XML

Augsten, Nikolaus; Böhlen, Michael; Dyreson, Curtis; Gamper, Johann (2008). Approximate Joins for Data-Centric XML. In: ICDE 2008: 24th International Conference on 7-12 April, Auckland, New Zealand, 23 August 2008 - 28 August 2008, 814-823.

Abstract

In data integration applications, a join matches elements thatare common to two data sources. Often, however, elements are represented slightly different in each source, so an approximate join must be used. For XML data, most approximate join strategies are based on some ordered tree matching technique. But in data-centric XML the order is irrelevant: two elements should match even if their subelement order varies. In this paper we give a solution for the approximate join of unordered trees. Our solution is based on windowed pq-grams. We develop an efficient technique to systematically generate windowed pq-grams in a three-step process: sorting the unordered tree, extending the sorted tree with dummy nodes, and computing the windowed pq-grams on the extended tree. The windowed pq-gram distance between two sorted trees approximates the tree edit distance between the respective unordered trees. The approximate join algorithm based on windowed pq-grams is implemented as an equality join on strings which avoids the costly computation of the distance between every pair of input trees. Our experiments with synthetic and real world data confirm the analytic results and suggest that our technique is both useful and scalable.

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Scopus Subject Areas:Physical Sciences > Software
Physical Sciences > Signal Processing
Physical Sciences > Information Systems
Scope:Discipline-based scholarship (basic research)
Language:English
Event End Date:28 August 2008
Deposited On:14 Mar 2013 11:36
Last Modified:03 Dec 2024 19:58
OA Status:Closed
Publisher DOI:https://doi.org/10.1109/ICDE.2008.4497490
Other Identification Number:merlin-id:2297

Metadata Export

Statistics

Citations

Dimensions.ai Metrics
24 citations in Web of Science®
35 citations in Scopus®
Google Scholar™

Altmetrics

Downloads

1 download since deposited on 14 Mar 2013
0 downloads since 12 months
Detailed statistics

Authors, Affiliations, Collaborations

Similar Publications