Header

UZH-Logo

Maintenance Infos

Eviction strategies for semantic flow processing


Nguyen, M K; Scharrenbach, Thomas; Bernstein, Abraham (2013). Eviction strategies for semantic flow processing. In: SSWS 2013 - Scalable Semantic Web Knowledge Base Systems 2013, Sydney, Australia, 21 October 2013. CEUR-WS, 66-80.

Abstract

In order to cope with the ever-increasing data volume continuous processing of incoming data via Semantic Flow Processing systems have been proposed. These systems allow to answer queries on streams of RDF triples. To achieve this goal they match (triple) patterns against the incoming stream and generate/update variable bindings. Yet, given the continuous nature of the stream the number of bindings can explode and exceed memory; in particular when computing aggregates. To make the information processing practical Semantic Flow Processing systems, therefore, typically limit the considered data to a (moving) window. Whilst this technique is simple it may not be able to nd patterns spread further than the window or may still cause memory overruns when data is highly bursty. In this paper we propose to maintain bindings (and thus memory) not on recency (i.e., a window) but on the likelihood of contributing to a complete match. We propose to base the decision on the matching likelihood and not creation time (fo) or at random. Furthermore we propose to drop variable bindings instead of data as do load shedding approaches. Specically, we systematically investigate deterministic and the matching-likelihood based probabilistic eviction strategy for dropping variable bindings in terms of recall. We find that a matching likelihood based eviction can outperform fo and random eviction strategies on synthetic as well as real world data.

Abstract

In order to cope with the ever-increasing data volume continuous processing of incoming data via Semantic Flow Processing systems have been proposed. These systems allow to answer queries on streams of RDF triples. To achieve this goal they match (triple) patterns against the incoming stream and generate/update variable bindings. Yet, given the continuous nature of the stream the number of bindings can explode and exceed memory; in particular when computing aggregates. To make the information processing practical Semantic Flow Processing systems, therefore, typically limit the considered data to a (moving) window. Whilst this technique is simple it may not be able to nd patterns spread further than the window or may still cause memory overruns when data is highly bursty. In this paper we propose to maintain bindings (and thus memory) not on recency (i.e., a window) but on the likelihood of contributing to a complete match. We propose to base the decision on the matching likelihood and not creation time (fo) or at random. Furthermore we propose to drop variable bindings instead of data as do load shedding approaches. Specically, we systematically investigate deterministic and the matching-likelihood based probabilistic eviction strategy for dropping variable bindings in terms of recall. We find that a matching likelihood based eviction can outperform fo and random eviction strategies on synthetic as well as real world data.

Statistics

Citations

Downloads

26 downloads since deposited on 17 Oct 2013
7 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Scopus Subject Areas:Physical Sciences > General Computer Science
Language:English
Event End Date:21 October 2013
Deposited On:17 Oct 2013 06:13
Last Modified:22 Mar 2022 08:41
Publisher:CEUR-WS
Series Name:CEUR Workshop Proceedings
Number:1046
ISSN:1613-0073
OA Status:Green
Free access at:Official URL. An embargo period may apply.
Official URL:http://ceur-ws.org/Vol-1046/SSWS2013_paper6.pdf
Other Identification Number:merlin-id:8472