Most Semantic Web applications rely on querying graphs, typically by using SPARQL with a triple store. Increasingly, applications also analyze properties of the graph structure to compute statistical inferences. The current Semantic Web infrastructure, however, does not efficiently support such operations. Hence, developers have to painstakingly retrieve the relevant data for statistical post-processing.
In this paper we propose to rethink query execution in a triple store as a highly parallelized asynchronous graph exploration on an active index data structure. This approach also allows to integrate SPARQL-querying with the sampling of graph properties.
To evaluate this architecture we implemented Random Walk TripleRush, which is built on a distributed graph processing system and operates by routing query and path descriptions through a novel active index data structure. In experiments we find that our architecture can be used to build a competitive distributed graph store. It can often return first results quickly, thanks to its asynchronous architecture. We show that our architecture supports the execution of various types of random walks with restarts that sample interesting graph properties. We also evaluate the scalability and show that the architecture supports fast answer times even on a dataset with more than a billion triples.