Weighted distance functions improve analysis of high-dimensional data: application to molecular dynamics simulations


Blöchliger, Nicolas; Caflisch, Amedeo; Vitalis, Andreas (2015). Weighted distance functions improve analysis of high-dimensional data: application to molecular dynamics simulations. Journal of Chemical Theory and Computation, 11(11):5481-5492.

Abstract

Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Molecular dynamics simulations produce high-dimensional data sets describing molecules observed in time. Here, we propose to globally or locally weight simulation features based on effective rates. This emphasizes, in a data-driven manner, slow degrees of freedom that often report on the metastable states sampled by the molecular system. We couple this idea to several unsupervised learning protocols. Our approach unmasks slow side chain dynamics within the native state of a miniprotein and reveals additional metastable conformations of a protein. The approach can be combined with most algorithms for clustering or dimensionality reduction.
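As a rough illustration of the idea in the abstract, the sketch below weights each feature of a time series before computing a Euclidean distance, so that slowly varying features dominate. The abstract does not define "effective rates", so the weight used here (a clipped lag-1 autocorrelation) is only a stand-in chosen for illustration and is not the authors' definition. The function names `slowness_weights` and `weighted_distance` and the synthetic two-feature trajectory are likewise hypothetical.

```python
import numpy as np


def slowness_weights(traj, lag=1):
    """Per-feature weights in [0, 1]; larger means slower fluctuation.

    Assumption: lagged autocorrelation is used as a simple proxy for
    slowness. It is NOT the effective-rate definition from the paper.
    """
    x = traj - traj.mean(axis=0)
    var = (x * x).mean(axis=0)
    cov = (x[:-lag] * x[lag:]).mean(axis=0)
    # Constant features (zero variance) get weight 0 instead of NaN.
    acf = np.divide(cov, var, out=np.zeros_like(cov), where=var > 0)
    return np.clip(acf, 0.0, 1.0)


def weighted_distance(a, b, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)**2)."""
    d = a - b
    return float(np.sqrt(np.sum(w * d * d)))


# Synthetic trajectory: one slow two-state feature, one fast noise feature.
rng = np.random.default_rng(0)
n = 2000
slow = np.repeat([0.0, 1.0, 0.0, 1.0], n // 4) + 0.05 * rng.normal(size=n)
fast = rng.normal(size=n)
traj = np.column_stack([slow, fast])

w = slowness_weights(traj)
# Distance between two snapshots, with and without feature weighting.
d_plain = weighted_distance(traj[0], traj[n // 4], np.ones(2))
d_weighted = weighted_distance(traj[0], traj[n // 4], w)
```

In this toy setting the noise feature receives a weight near zero, so the weighted distance between snapshots is governed mainly by the slow two-state feature. The resulting distances could then be passed to any clustering or dimensionality reduction routine that accepts a custom metric.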

Statistics

Citations

2 citations in Web of Science®
2 citations in Scopus®

Downloads

1 download since deposited on 12 Jan 2016
0 downloads in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 04 Faculty of Medicine > Department of Biochemistry
    07 Faculty of Science > Department of Biochemistry
Dewey Decimal Classification: 570 Life sciences; biology
    610 Medicine & health
Language: English
Date: 10 November 2015
Deposited On: 12 Jan 2016 15:13
Last Modified: 08 Dec 2017 16:44
Publisher: American Chemical Society (ACS)
ISSN: 1549-9618
Publisher DOI: https://doi.org/10.1021/acs.jctc.5b00618
PubMed ID: 26574336
