Publication: Combining Molecular Dynamics and Machine Learning for Drug Design
| dc.contributor.institution | University of Zurich | |
| dc.date.accessioned | 2024-11-08T13:36:54Z | |
| dc.date.available | 2024-11-08T13:36:54Z | |
| dc.date.issued | 2024-11-08 | |
| dc.description.abstract | Drug discovery is a capital-intensive and time-consuming process that requires significant human resources to progress from early discovery to market approval. To accelerate this process, numerous computational techniques have been developed to design potential drug candidates or decipher the recognition patterns between drug molecules and their targets. Molecular dynamics (MD) simulation offers valuable atomic-level mechanistic insights that enhance understanding of molecular interactions, which are critical for drug development. However, MD simulation is a computationally demanding technique with a steep learning curve, necessitating substantial resources, advanced data management, and considerable programming skills. The user-friendliness of scientific software has become a crucial factor in making advanced techniques accessible to a broader audience. Moreover, the conventional, artisanal approach prevalent in MD research is increasingly at odds with modern demands for scientific data management and the growing focus on open and reproducible science. This emphasizes the importance of establishing robust data protocols and transparent methodologies to improve the credibility and applicability of MD research. Recent advances in information technology have significantly improved the accessibility of specialized computer-aided drug discovery processes on personal computers, providing a unique opportunity to democratize this technique. Due to the data-intensive and high-dimensional nature of MD trajectories, the post-processing and characterization of structural dynamics remain challenging. Deep learning, characterized by its capability to extract hidden patterns from data and its “data-hungry” nature, has been successfully applied to non-Euclidean manifolds and is well-suited to complement MD simulations.
However, most deep learning research in drug discovery focuses on static molecular structures, often overlooking the dynamic aspects of molecular interactions. Integrating molecular dynamics trajectories with deep learning offers a promising strategy not only to facilitate the structural perception of machine learning models but also to improve the reusability of MD trajectories. Effective numerical interpretation and comprehension of molecular structures in machine learning necessitate their abstraction into certain representations, such as graphs, surfaces, point clouds and 3D mesh grids, which are similar to or directly borrowed from the field of computer vision and graphics. This similarity draws intriguing parallels between the two fields, suggesting that methodologies and algorithms from computer vision could be adapted for molecular science. Unlike computer vision and graphics, which deal with macroscopic objects and visual signal processing, molecular science focuses on atomic-level structures and interactions, necessitating specialized feature engineering. Yet feature engineering in molecular science often relies on empirical knowledge and lacks systematic methods for feature validation. Understanding the expressiveness of geometric and topological features is essential to inform the development of more expressive molecular representations. In this thesis, I concentrate on two primary objectives: firstly, to democratize and modernize computer-aided drug discovery and molecular dynamics simulations through the application of contemporary web and database technologies; and secondly, to explore methods for extracting structural dynamic features from molecular dynamics trajectories and integrating them into modern machine learning approaches. In Chapter 1, I provide an overview of the foundational topics related to my doctoral research.
This includes introductions to MD simulation, machine learning, recent innovations in computer-aided and AI-assisted drug discovery, high-performance computing, contemporary open science practices, scientific software development, and modern scientific data management. In Chapter 2, I first discuss the development of ACGui, a versatile web-based drug discovery platform designed to democratize structure-based drug discovery approaches. In Section 2.1, I describe my role in integrating a closed-loop MD simulation workflow within ACGui, which features batch MD system preparation, comprehensive metadata collection on MD simulations, and online trajectory visualization and analysis. Additionally, in Section 2.2, I detail the implementation of a standardized SQL database to manage MD trajectories, adhering to modern FAIR principles for scientific data management. Chapter 3 introduces a dataset of molecular fragments representing flexible objects to recast 3D pose perception as a classification challenge. An evaluation of various vision models using four different molecular representations with obscured topological information demonstrates that even pure 3D geometry inherently encodes essential topological information (e.g., bond length, atomic radius), underscoring the importance of robust feature validation in feature engineering. Chapter 4 describes the development of two algorithms designed to extract dynamic features from molecular dynamics trajectories. These features are designed to be seamlessly integrated into 3D convolutional neural networks for predicting ligand binding affinity, showcasing the potential of dynamic information in enhancing machine learning models. The appendix presents an applied study on the impact of an electric field on the stability of β-amyloid dimers.
Using structures predicted by AlphaFold2 and further analyzed through MD simulation, the study examines the flexibility and secondary structure content of Aβ42 dimers, contributing to the understanding of amyloid aggregation dynamics. | |
| dc.identifier.uri | https://www.zora.uzh.ch/handle/20.500.14742/222744 | |
| dc.language.iso | eng | |
| dc.subject.ddc | 610 Medicine & health | |
| dc.subject.ddc | 570 Life sciences; biology | |
| dc.title | Combining Molecular Dynamics and Machine Learning for Drug Design | |
| dc.type | dissertation | |
| dcterms.accessRights | info:eu-repo/semantics/openAccess | |
| dcterms.bibliographicCitation.originalpublisherplace | Zürich | |
| dspace.entity.type | Publication | en |
| uzh.agreement.thesis | YES | |
| uzh.contributor.author | Zhang, Yang | |
| uzh.contributor.correspondence | Yes | |
| uzh.contributor.examiner | Caflisch, Amedeo | |
| uzh.contributor.examiner | Hutter, Jürg | |
| uzh.contributor.examiner | Zoete, Vincent | |
| uzh.contributor.examiner | Vitalis, Andreas | |
| uzh.contributor.examinercorrespondence | Yes | |
| uzh.contributor.examinercorrespondence | No | |
| uzh.contributor.examinercorrespondence | No | |
| uzh.contributor.examinercorrespondence | No | |
| uzh.document.availability | published_version | |
| uzh.eprint.datestamp | 2024-11-08 13:36:54 | |
| uzh.eprint.lastmod | 2024-11-08 13:37:20 | |
| uzh.eprint.statusChange | 2024-11-08 13:36:54 | |
| uzh.harvester.eth | Yes | |
| uzh.harvester.nb | Yes | |
| uzh.identifier.doi | 10.5167/uzh-264338 | |
| uzh.oastatus.zora | Green | |
| uzh.publication.citation | Zhang, Yang. Combining Molecular Dynamics and Machine Learning for Drug Design. 2024, University of Zurich, Faculty of Science. | |
| uzh.publication.faculty | science | |
| uzh.publication.pageNumber | 189 | |
| uzh.publication.thesisType | cumulative | |
| uzh.workflow.eprintid | 264338 | |
| uzh.workflow.fulltextStatus | public | |
| uzh.workflow.revisions | 4 | |
| uzh.workflow.rightsCheck | keininfo | |
| uzh.workflow.status | archive | |