Scalable recovery of missing blocks in time series with high and low cross-correlations


Khayati, Mourad; Cudré-Mauroux, Philippe; Böhlen, Michael Hanspeter (2020). Scalable recovery of missing blocks in time series with high and low cross-correlations. Knowledge and Information Systems (KAIS), 62(6):2257-2280.

Abstract

Missing values are very common in real-world data including time-series data. Failures in power, communication or storage can leave occasional blocks of data missing in multiple series, affecting not only real-time monitoring but also compromising the quality of data analysis. Traditional recovery (imputation) techniques often leverage the correlation across time series to recover missing blocks in multiple series. These recovery techniques, however, assume high correlation and fall short in recovering missing blocks when the series exhibit variations in correlation. In this paper, we introduce a novel approach called CDRec to recover large missing blocks in time series with high and low correlations. CDRec relies on the centroid decomposition (CD) technique to recover multiple time series at a time. We also propose and analyze a new algorithm called Incremental Scalable Sign Vector to efficiently compute CD in long time series. We empirically evaluate the accuracy and the efficiency of our recovery technique on several real-world datasets that represent a broad range of applications. The results show that our recovery is orders of magnitude faster than the most accurate algorithm while producing superior results in terms of recovery.
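The recovery idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`sign_vector`, `centroid_decomposition`, `recover`), the truncation rank `k`, the tolerance, and the linear-interpolation initialization are all assumptions made for illustration, and the sign vector is found with a simple greedy local search rather than the Incremental Scalable Sign Vector algorithm proposed in the paper. The sketch assumes the series are the columns of a matrix, missing values are marked as NaN, and every series has at least one observed value.

```python
import numpy as np

def sign_vector(X, max_iter=1000):
    """Greedy local search for z in {-1,+1}^n that locally maximizes ||X^T z||.
    (Illustrative stand-in, not the paper's sign vector algorithm.)"""
    z = np.ones(X.shape[0])
    row_sq = np.sum(X * X, axis=1)
    for _ in range(max_iter):
        s = X.T @ z
        # Change in ||X^T z||^2 if the sign of z[i] is flipped.
        gains = -4.0 * z * (X @ s) + 4.0 * row_sq
        i = int(np.argmax(gains))
        if gains[i] <= 1e-12:
            break  # no single flip improves the objective
        z[i] = -z[i]
    return z

def centroid_decomposition(X, k):
    """Truncated centroid decomposition: X is approximated by L @ R.T."""
    X = np.array(X, dtype=float)  # work on a copy
    n, m = X.shape
    L = np.zeros((n, k))
    R = np.zeros((m, k))
    for j in range(k):
        z = sign_vector(X)
        c = X.T @ z
        norm = np.linalg.norm(c)
        if norm == 0.0:
            break  # nothing left to decompose
        R[:, j] = c / norm
        L[:, j] = X @ R[:, j]
        X -= np.outer(L[:, j], R[:, j])  # deflate
    return L, R

def recover(X, k=1, tol=1e-6, max_iter=100):
    """Iteratively refine missing (NaN) entries with a rank-k CD approximation."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    idx = np.arange(X.shape[0])
    # Initialize each missing value by linear interpolation within its series.
    for j in range(X.shape[1]):
        m = missing[:, j]
        if m.any():
            X[m, j] = np.interp(idx[m], idx[~m], X[~m, j])
    for _ in range(max_iter):
        L, R = centroid_decomposition(X, k)
        approx = L @ R.T
        delta = np.linalg.norm(approx[missing] - X[missing])
        X[missing] = approx[missing]  # only missing entries are overwritten
        if delta < tol:
            break
    return X
```

Only the missing entries are ever overwritten, so observed values pass through unchanged. The choice of `k` controls how much of the cross-series structure is used in the reconstruction; how the paper selects it is not stated in the abstract.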

Statistics

Citations

6 citations in Web of Science®
8 citations in Scopus®

Downloads

140 downloads since deposited on 01 Mar 2021
23 downloads since 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Scopus Subject Areas: Physical Sciences > Software; Physical Sciences > Information Systems; Physical Sciences > Human-Computer Interaction; Physical Sciences > Hardware and Architecture; Physical Sciences > Artificial Intelligence
Scope: Discipline-based scholarship (basic research)
Language: English
Date: 2020
Deposited On: 01 Mar 2021 09:15
Last Modified: 25 May 2024 01:49
Publisher: Springer
ISSN: 0219-3116
OA Status: Hybrid
Publisher DOI: https://doi.org/10.1007/s10115-019-01421-7
Related URLs: https://link.springer.com/article/10.1007/s10115-019-01421-7
Other Identification Number: merlin-id:20735
  • Content: Accepted Version