Recovery of Missing Values using Matrix Decomposition Techniques

Khayati, Mourad. Recovery of Missing Values using Matrix Decomposition Techniques. 2015, University of Zurich, Faculty of Economics.

Abstract

Time series data is prominent in many real world applications, e.g., hydrology or finance stock market. In many of these applications, time series data is missing in blocks, i.e., multiple consecutive values are missing. For example, in the hydrology field around 20% of the data is missing in blocks. However, many time series analysis tasks, such as prediction, require the existence of complete data. The recovery of blocks of missing values in time series is challenging if the missing block is a peak or a valley. The problem is more challenging in real world time series because of the irregularity in the data. The state-of-the-art recovery techniques are suitable either for the recovery of single missing values or for the recovery of blocks of missing values in regular time series. The goal of this thesis is to propose an accurate recovery of blocks of missing values in irregular time series. The recovery solution we propose is based on matrix decomposition techniques. The main idea of the recovery is to represent correlated time series as columns of an input matrix where missing values have been initialized and iteratively apply matrix decomposition technique to refine the initialized missing values. A key property of our recovery solution is that it learns the shape, the width and the amplitude of the missing blocks from the history of the time series that contains the missing blocks and the history of its correlated time series. Our experiments on real world hydrological time series show that our approach outperforms the state-of-the-art recovery techniques for the recovery of missing blocks in irregular time series. The recovery solution is implemented as a graphical tool that displays, browses and accurately recovers missing blocks in irregular time series. The proposed approach supports learning from highly and lowly correlated time series. This is important since lowly correlated time series, e.g., shifted time series, that exhibit shape and/or trend similarities are beneficial for the recovery process. We reduce the space complexity of the proposed solution from quadratic to linear. This allows to use time series with long histories without prior segmentation. We prove the scalability and the correctness of the solution.

Abstract

Time series data is prominent in many real world applications, e.g., hydrology or finance stock market. In many of these applications, time series data is missing in blocks, i.e., multiple consecutive values are missing. For example, in the hydrology field around 20% of the data is missing in blocks. However, many time series analysis tasks, such as prediction, require the existence of complete data. The recovery of blocks of missing values in time series is challenging if the missing block is a peak or a valley. The problem is more challenging in real world time series because of the irregularity in the data. The state-of-the-art recovery techniques are suitable either for the recovery of single missing values or for the recovery of blocks of missing values in regular time series. The goal of this thesis is to propose an accurate recovery of blocks of missing values in irregular time series. The recovery solution we propose is based on matrix decomposition techniques. The main idea of the recovery is to represent correlated time series as columns of an input matrix where missing values have been initialized and iteratively apply matrix decomposition technique to refine the initialized missing values. A key property of our recovery solution is that it learns the shape, the width and the amplitude of the missing blocks from the history of the time series that contains the missing blocks and the history of its correlated time series. Our experiments on real world hydrological time series show that our approach outperforms the state-of-the-art recovery techniques for the recovery of missing blocks in irregular time series. The recovery solution is implemented as a graphical tool that displays, browses and accurately recovers missing blocks in irregular time series. The proposed approach supports learning from highly and lowly correlated time series. This is important since lowly correlated time series, e.g., shifted time series, that exhibit shape and/or trend similarities are beneficial for the recovery process. We reduce the space complexity of the proposed solution from quadratic to linear. This allows to use time series with long histories without prior segmentation. We prove the scalability and the correctness of the solution.