Header

UZH-Logo

Maintenance Infos

Scalable linked data stream processing via network-aware workload scheduling


Fischer, Lorenz; Scharrenbach, Thomas; Bernstein, Abraham (2013). Scalable linked data stream processing via network-aware workload scheduling. In: 9th International Workshop on Scalable Semantic Web Knowledge Base Systems, Sydney, Australia, 21 October 2013 - 21 October 2013.

Abstract

In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy — a uniform distribution of computation load among available machines — typically used by stream processing systems, disregards network-load as one of the major bottlenecks for throughput resulting in an immense load in terms of intermachine communication. In this paper we propose a graph-partitioning based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm realtime computation framework and evaluate its communication behavior using two real-world datasets. We show that the application of graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even using very limited data statistics. We also find that processing RDF data as single triples at a time rather than graph fragments (containing multiple triples), may decrease throughput indicating the usefulness of semantics.

Abstract

In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy — a uniform distribution of computation load among available machines — typically used by stream processing systems, disregards network-load as one of the major bottlenecks for throughput resulting in an immense load in terms of intermachine communication. In this paper we propose a graph-partitioning based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm realtime computation framework and evaluate its communication behavior using two real-world datasets. We show that the application of graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even using very limited data statistics. We also find that processing RDF data as single triples at a time rather than graph fragments (containing multiple triples), may decrease throughput indicating the usefulness of semantics.

Statistics

Citations

Downloads

251 downloads since deposited on 17 Oct 2013
22 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Language:English
Event End Date:21 October 2013
Deposited On:17 Oct 2013 06:14
Last Modified:06 Aug 2017 08:30
Other Identification Number:merlin-id:8337

Download

Download PDF  'Scalable linked data stream processing via network-aware workload scheduling'.
Preview
Filetype: PDF
Size: 728kB