Header

UZH-Logo

Maintenance Infos

On-disk storage techniques for semantic web data - are B-trees always the optimal solution?


Weiss, C; Bernstein, A (2009). On-disk storage techniques for semantic web data - are B-trees always the optimal solution? In: 5th International Workshop on Scalable Semantic Web Knowledge Base Systems, Washington D.C., USA, October 2009.

Abstract

Since its introduction in 1971, the B-tree has become the dominant index structure in database systems.
Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results.
The advent of XML-data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored.

In this paper we show that in the case of triple-stores the usage of B-trees is actually highly detrimental to query performance.
Specifically, we compare on-disk query performance of our triple-based Hexastore when using two different B-tree implementations, and our simple and novel vector storage that leverages offsets.

Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load-time, by approximately a factor of three (and up to a factor of eight for some queries) in query-time, as well as by a factor of two in required storage.
The only drawback of the vector-based approach is its time-consuming need for reorganization of parts of the data during inserts of new triples: a seldom occurrence in many Semantic Web environments.

As such this paper tries to reopen the discussion about the trade-offs when using different types of indices in the light of non-relational data and contribute to the endeavor of building scalable and fast typed-graph databases.

Abstract

Since its introduction in 1971, the B-tree has become the dominant index structure in database systems.
Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results.
The advent of XML-data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored.

In this paper we show that in the case of triple-stores the usage of B-trees is actually highly detrimental to query performance.
Specifically, we compare on-disk query performance of our triple-based Hexastore when using two different B-tree implementations, and our simple and novel vector storage that leverages offsets.

Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load-time, by approximately a factor of three (and up to a factor of eight for some queries) in query-time, as well as by a factor of two in required storage.
The only drawback of the vector-based approach is its time-consuming need for reorganization of parts of the data during inserts of new triples: a seldom occurrence in many Semantic Web environments.

As such this paper tries to reopen the discussion about the trade-offs when using different types of indices in the light of non-relational data and contribute to the endeavor of building scalable and fast typed-graph databases.

Statistics

Citations

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Scopus Subject Areas:Physical Sciences > General Computer Science
Language:English
Event End Date:October 2009
Deposited On:04 Feb 2010 11:53
Last Modified:07 Mar 2022 08:07
OA Status:Closed
Full text not available from this repository.