Efficient Processing and Reasoning of Semantic Streams


Gao, Shen. Efficient Processing and Reasoning of Semantic Streams. 2018, University of Zurich, Faculty of Economics.

Abstract

The digitalization of our society creates a large number of data streams, such as stock tickers, tweets, and sensor data. Making use of these streams has tremendous value. In the Semantic Web context, live information is queried from the streams in real time. Knowledge is discovered by integrating streams with data from heterogeneous sources. Moreover, insights hidden in the streams are inferred and extracted by logical reasoning.
Handling large and complex streams in real time challenges the capabilities of current systems. Therefore, this thesis studies how to improve the efficiency of processing and reasoning over semantic streams. It is composed of three projects that deal with different research problems motivated by real-world use cases. We propose new methods to address these problems and implement systems to test our hypotheses on real datasets.
The first project focuses on the problem that sudden increases in the input stream rate overload the system, causing reduced or unacceptable performance. We propose an eviction technique that, when a spike in the input data rate happens, discards data from the system to keep the response latency bounded at the cost of a lower recall. The novelty of our solution lies in a data-aware approach that carefully prioritizes the data and evicts the less important items first, so that result recall stays high.
The second project studies complex queries that need to integrate streams with remote and external background data (BGD). Accessing remote BGD is a very expensive process in terms of both latency and financial cost. We propose several methods to minimize the cost by exploiting the query and the data patterns. Our system only needs to retrieve the data that are most critical to answering the query and avoids wasting resources on the remaining data in BGD.
Lastly, as noise is inevitable in real-world semantic streams, the third project investigates how to use logical reasoning to identify and exclude the noise from high-volume streams. We adopt a distributed stream processing engine (DSPE) to achieve scalability. On top of a DSPE, we optimize the reasoning procedures by balancing the costs of computation and communication. Therefore, reasoning tasks are compiled into efficient DSPE workflows that can be deployed across large-scale computing clusters.
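
The data-aware eviction idea of the first project can be illustrated with a minimal sketch. It is not the implementation from the thesis: the class `DataAwareBuffer`, the scoring function `score_by_predicate`, and the rule "items whose predicate appears in the query are more important" are all assumptions made here for illustration. The sketch only shows the general pattern of a bounded buffer that, when a spike pushes it over capacity, drops the lowest-priority items first instead of dropping at random.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(order=True)
class _Entry:
    score: float                      # hypothetical importance estimate
    seq: int                          # arrival order, used to break ties
    item: Any = field(compare=False)  # the stream element itself


class DataAwareBuffer:
    """Bounded buffer that evicts the least important items on overload.

    Illustrative only: the importance score is supplied by the caller and
    is an assumption of this sketch, not the thesis' prioritization scheme.
    """

    def __init__(self, capacity: int, score_fn: Callable[[Any], float]):
        self.capacity = capacity
        self.score_fn = score_fn
        self._heap: List[_Entry] = []  # min-heap: lowest score on top
        self._seq = 0
        self.evicted = 0

    def add(self, item: Any) -> None:
        heapq.heappush(self._heap, _Entry(self.score_fn(item), self._seq, item))
        self._seq += 1
        # On overload, discard the lowest-scored (then oldest) items.
        while len(self._heap) > self.capacity:
            heapq.heappop(self._heap)
            self.evicted += 1

    def items(self) -> List[Any]:
        # Return the surviving items in arrival order.
        return [e.item for e in sorted(self._heap, key=lambda e: e.seq)]


def score_by_predicate(query_predicates):
    """Hypothetical scorer: triples matching a query predicate score higher."""
    def score(triple):
        _subject, predicate, _obj = triple
        return 1.0 if predicate in query_predicates else 0.0
    return score
```

With a capacity of two and a scorer built from the predicate set `{"type"}`, adding two `type` triples and one unrelated triple leaves the two `type` triples in the buffer. Latency stays bounded because the buffer never grows past its capacity, and recall suffers less than with random shedding because the discarded item could not have contributed to the query result under the assumed scoring rule.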

Statistics

Downloads

48 downloads since deposited on 25 Jan 2019
37 downloads in the past 12 months

Additional indexing

Item Type: Dissertation (monographical)
Referees: Bernstein, Abraham
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Date: 2018
Deposited On: 25 Jan 2019 11:45
Last Modified: 25 Sep 2019 00:09
Number of Pages: 93
OA Status: Green
Other Identification Number: merlin-id:16449

Download

Green Open Access

Content: Published Version
Filetype: PDF
Size: 2MB