Data provenance: A Categorization of existing approaches


Glavic, B; Dittrich, K R (2007). Data provenance: A Categorization of existing approaches. In: 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", Aachen, Germany, 7 March 2007 - 9 March 2007, 227-241.

Abstract

In many application areas, such as e-science and data warehousing, detailed information about the origin of data is required. This kind of information is often referred to as data provenance or data lineage. The provenance of a data item includes information about the processes and source data items that led to its creation and current representation. The diversity of data representation models and application domains has led to a number of more or less formal definitions of provenance. Most of them are limited to a specific application domain, data representation model or data processing facility. Not surprisingly, the associated implementations are also restricted to some application domain and depend on a specific data model. In this paper we survey data provenance models and prototypes, present a general categorization scheme for provenance models, and use this scheme to study the properties of the existing approaches. This categorization enables us to distinguish between different kinds of provenance information and could lead to a better understanding of provenance in general. Besides the categorization of provenance types, our considerations must also include the storage, transformation and query requirements of the different kinds of provenance information and application domains. The analysis of existing approaches will help us reveal open research problems in the area of data provenance.

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, further contribution
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Uncontrolled Keywords: provenance, survey
Language: English
Event End Date: 9 March 2007
Deposited On: 16 Dec 2009 08:59
Last Modified: 05 Apr 2016 13:34
Publisher: Gesellschaft für Informatik (GI)
Series Name: GI-Edition - Lecture notes in informatics (LNI). Proceedings
Number: 103
ISBN: 978-3-88579-197-3
Additional Information: 12. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web, 5 to 9 March 2007, Aachen
Free access at: Official URL. An embargo period may apply.
Official URL: http://www.btw2007.de/paper/p227.pdf
Related URLs: http://opac.nebis.ch/F/?local_base=NEBIS&con_lng=GER&func=find-b&find_code=SYS&request=005515364
http://www.gi.de/service/publikationen/lni/gi-edition-lecture-notes-in-informatics-lni-p-103.html
Permanent URL: https://doi.org/10.5167/uzh-24450