UZH-Logo

Maintenance Infos

Random-walk triplerush: asynchronous graph querying and sampling


Stutz, Philip; Paudel, Bibek; Verman, Mihaela; Bernstein, Abraham (2015). Random-walk triplerush: asynchronous graph querying and sampling. In: 24th International World Wide Web Conference (WWW 2015), Florence, Italy, 18 May 2015 - 22 May 2015, 1034-1044.

Abstract

Most Semantic Web applications rely on querying graphs, typically by using SPARQL with a triple store. Increasingly, applications also analyze properties of the graph structure to compute statistical inferences. The current Semantic Web infrastructure, however, does not efficiently support such operations. Hence, developers have to painstakingly retrieve the relevant data for statistical post-processing.

In this paper we propose to rethink query execution in a triple store as a highly parallelized asynchronous graph ex- ploration on an active index data structure. This approach also allows to integrate SPARQL-querying with the sam- pling of graph properties.

To evaluate this architecture we implemented Random Walk TripleRush, which is built on a distributed graph processing system and operates by routing query and path descriptions through a novel active index data structure. In experiments we find that our architecture can be used to build a competitive distributed graph store. It can often return first results quickly, thanks to its asynchronous architecture. We show that our architecture supports the execution of various types of random walks with restarts that sample interesting graph properties. We also evaluate the scalability and show that the architecture supports fast answer times even on a dataset with more than a billion triples.

Most Semantic Web applications rely on querying graphs, typically by using SPARQL with a triple store. Increasingly, applications also analyze properties of the graph structure to compute statistical inferences. The current Semantic Web infrastructure, however, does not efficiently support such operations. Hence, developers have to painstakingly retrieve the relevant data for statistical post-processing.

In this paper we propose to rethink query execution in a triple store as a highly parallelized asynchronous graph ex- ploration on an active index data structure. This approach also allows to integrate SPARQL-querying with the sam- pling of graph properties.

To evaluate this architecture we implemented Random Walk TripleRush, which is built on a distributed graph processing system and operates by routing query and path descriptions through a novel active index data structure. In experiments we find that our architecture can be used to build a competitive distributed graph store. It can often return first results quickly, thanks to its asynchronous architecture. We show that our architecture supports the execution of various types of random walks with restarts that sample interesting graph properties. We also evaluate the scalability and show that the architecture supports fast answer times even on a dataset with more than a billion triples.

Citations

Altmetrics

Downloads

267 downloads since deposited on 07 Jul 2015
195 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Language:English
Event End Date:22 May 2015
Deposited On:07 Jul 2015 12:18
Last Modified:05 Apr 2016 19:17
Publisher:International World Wide Web Conferences Steering Committee Republic and Canton of Geneva
ISBN:978-1-4503-3469-3
Publisher DOI:https://doi.org/10.1145/2736277.2741687
Related URLs:http://dl.acm.org/citation.cfm?id=2741687 (Publisher)
Other Identification Number:merlin-id:11663
Permanent URL: https://doi.org/10.5167/uzh-111243

Download

[img]
Preview
Content: Published Version
Filetype: PDF
Size: 822kB
View at publisher

TrendTerms

TrendTerms displays relevant terms of the abstract of this publication and related documents on a map. The terms and their relations were extracted from ZORA using word statistics. Their timelines are taken from ZORA as well. The bubble size of a term is proportional to the number of documents where the term occurs. Red, orange, yellow and green colors are used for terms that occur in the current document; red indicates high interlinkedness of a term with other terms, orange, yellow and green decreasing interlinkedness. Blue is used for terms that have a relation with the terms in this document, but occur in other documents.
You can navigate and zoom the map. Mouse-hovering a term displays its timeline, clicking it yields the associated documents.

Author Collaborations