Missing value imputation in time series using Top-K case matching


Wellenzohn, Kevin; Mitterer, Hannes; Gamper, Johann; Böhlen, Michael Hanspeter; Khayati, Mourad (2014). Missing value imputation in time series using Top-K case matching. In: 26th GI-Workshop Grundlagen von Datenbanken, Bozen-Bolzano, Italy, 21 October 2014 - 24 October 2014, 77-82.

Abstract

In this paper, we present a simple yet effective algorithm, called the Top-k Case Matching algorithm, for the imputation of missing values in streams of time series data that are similar to each other. The key idea of the algorithm is to look for the k situations in the historical data that are most similar to the current situation and to derive the missing value from the measured values at these k time points. To efficiently identify the top-k most similar historical situations, we adopt Fagin’s Threshold Algorithm, yielding an algorithm with sub-linear runtime complexity with high probability, and linear complexity in the worst case (excluding the initial sorting of the data, which is done only once). We provide the results of a first experimental evaluation using real-world meteorological data. Our algorithm achieves high accuracy and is both more accurate and more efficient than two more complex state-of-the-art solutions.
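
The key idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses a plain linear scan over the history instead of Fagin's Threshold Algorithm, and it assumes that similarity is measured as the sum of absolute differences across the reference series and that the imputed value is the mean of the k matched values. The function name `top_k_case_matching` and its parameters are hypothetical and chosen only for illustration.

```python
import heapq
from statistics import mean


def top_k_case_matching(target, references, t, k):
    """Impute target[t] from the k most similar past time points.

    target     -- list of floats, with None where a value is missing
    references -- list of reference series, each as long as target
    t          -- index of the missing value in target
    k          -- number of similar historical situations to use

    Assumptions (not taken from the paper): L1 distance between
    situations, mean of the k matched values as the imputed value,
    and a brute-force scan rather than the Threshold Algorithm.
    """
    # The "current situation" is the vector of reference values at time t.
    current = [series[t] for series in references]

    candidates = []
    for past in range(t):
        if target[past] is None:
            continue  # a past point without a measured value cannot help
        distance = sum(
            abs(series[past] - value)
            for series, value in zip(references, current)
        )
        candidates.append((distance, past))

    nearest = heapq.nsmallest(k, candidates)
    if not nearest:
        raise ValueError("no historical time point with a measured value")

    # Derive the missing value from the measured values at the k time points.
    return mean(target[past] for _, past in nearest)


if __name__ == "__main__":
    target = [10.0, 20.0, 30.0, 40.0, None]
    references = [
        [1.0, 2.0, 3.0, 4.0, 2.1],
        [1.0, 2.0, 3.0, 4.0, 2.9],
    ]
    print(top_k_case_matching(target, references, t=4, k=2))
```

In this toy input, the two past time points closest to the current reference values are indices 1 and 2, so the imputed value is the mean of 20.0 and 30.0. The linear scan costs time proportional to the length of the history for every imputation, which is the cost the Threshold Algorithm mentioned in the abstract is meant to avoid in the common case.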

Statistics

Downloads

9 downloads since deposited on 22 Jan 2015
7 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Event End Date: 24 October 2014
Deposited On: 22 Jan 2015 15:39
Last Modified: 15 Aug 2017 23:16
Publisher: CEUR-WS
Series Name: CEUR Workshop Proceedings
ISSN: 1613-0073
Official URL: http://ceur-ws.org/Vol-1313/paper_14.pdf
Related URLs: http://ceur-ws.org/Vol-1313/
Other Identification Number: merlin-id:11586

Download

Filetype: PDF
Size: 295kB
