Abstract
‘Big’ high-dimensional data are commonly analyzed in low dimensions, after a dimensionality reduction step that inherently distorts the data structure. Clustering methods are often used for the same purpose. These methods also introduce a bias, either by assuming a particular, often geometric, property of the clusters, or by using iterative schemes to enhance cluster contours, with consequences that are hard to control. The goal of data analysis should, however, be to encode and detect structural data features at all scales and densities simultaneously, without assuming a parametric form for the distances between data points or modifying them. Here, we propose a novel approach that directly encodes data point neighborhood similarities as a sparse graph. Our non-iterative framework permits a transparent interpretation of data, without altering the original data dimension and metric. Applications to several natural and synthetic data sets demonstrate the efficacy of our method.