Scalable Graph Processing With SIGNAL/COLLECT


Stutz, P. Scalable Graph Processing With SIGNAL/COLLECT. 2015, University of Zurich, Faculty of Economics.

Abstract

Our ability to process large amounts of data and the size and number of data sets are growing at an incredible pace. This development presents us with the opportunity to build systems that perform complex analyses of increasingly dense networks of data. These opportunities include computing recommendations, analysing social networks, finding patterns in transaction networks, scheduling tasks, or inferencing probabilistic models. Many of these tasks involve processing data that has a natural graph representation.

Whilst the opportunities are there in the form of access to processing resources and data sets, the way we write software has largely not caught up. Many use MapReduce for scalable processing, but this abstraction has shortcomings when processing graph-structured data, especially for iterative and asynchronous processing.

This thesis introduces the SIGNAL/COLLECT programming model and framework for efficient parallel and distributed large-scale graph processing. We show that this abstraction captures the essence of many algorithms on graphs in a concise and elegant way. Beyond that, we also show implementations of two complex systems built on SIGNAL/COLLECT: The first system is TripleRush, a distributed in-memory triple store with a novel architecture. The second system is foxPSL, a distributed probabilistic inferencing system. Our evaluations show that the SIGNAL/COLLECT framework can efficiently execute simple graph algorithms such as PageRank and that the two complex systems also have competitive performance relative to the respective state-of-the-art.

For this reason we believe that SIGNAL/COLLECT is more generally suitable for designing scalable dynamic and complex systems that process large networks of data.
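
As a rough illustration of the kind of vertex-centric formulation the abstract alludes to, the following Python sketch expresses a PageRank-style computation as two small functions. It assumes a model in which each edge sends a signal derived from its source vertex's state and each vertex collects its incoming signals into a new state. The names `Vertex`, `signal`, `collect` and `run`, the damping value of 0.85, and the synchronous fixed-step loop are all assumptions made for this example; this is not the framework's API and not the implementation evaluated in the thesis.

```python
# Illustrative sketch only: a toy, synchronous vertex-centric loop.
# All names here (Vertex, signal, collect, run) are hypothetical and
# are not the SIGNAL/COLLECT framework's API.
from dataclasses import dataclass, field


@dataclass
class Vertex:
    state: float
    out_edges: list = field(default_factory=list)  # ids of target vertices


def signal(source: Vertex) -> float:
    """Value sent along each outgoing edge: an equal share of the source's rank."""
    return source.state / len(source.out_edges)


def collect(signals: list, damping: float = 0.85) -> float:
    """New vertex state computed from the signals that arrived at the vertex."""
    return (1.0 - damping) + damping * sum(signals)


def run(graph: dict, steps: int = 50) -> dict:
    """Alternate a signal phase and a collect phase for a fixed number of steps."""
    for _ in range(steps):
        inbox = {vid: [] for vid in graph}
        for vertex in graph.values():          # signal phase
            for target in vertex.out_edges:
                inbox[target].append(signal(vertex))
        for vid, vertex in graph.items():      # collect phase
            vertex.state = collect(inbox[vid])
    return {vid: vertex.state for vid, vertex in graph.items()}


# A three-vertex cycle: by symmetry every vertex ends with the same rank.
graph = {
    "a": Vertex(1.0, ["b"]),
    "b": Vertex(1.0, ["c"]),
    "c": Vertex(1.0, ["a"]),
}
ranks = run(graph)
```

The whole algorithm is carried by `signal` and `collect`; the scheduling loop in `run` is generic and could in principle be swapped for a parallel or asynchronous one without touching them.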

Additional indexing

Item Type: Dissertation
Referees: Bernstein Abraham
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Date: 2015
Deposited On: 15 Jan 2016 07:11
Last Modified: 05 Apr 2016 19:55
Number of Pages: 125
Other Identification Number: merlin-id:12957

Download

Full text not available from this repository.
