Header

UZH-Logo

Maintenance Infos

enviRule: an end-to-end system for automatic extraction of reaction patterns from environmental contaminant biotransformation pathways


Zhang, Kunyang; Fenner, Kathrin (2023). enviRule: an end-to-end system for automatic extraction of reaction patterns from environmental contaminant biotransformation pathways. Bioinformatics, 39(7):btad407.

Abstract

Transformation products (TPs) of man-made chemicals, formed through microbially mediated transformation in the environment, can have serious adverse environmental effects, yet the analytical identification of TPs is challenging. Rule-based prediction tools are successful in predicting TPs, especially in environmental chemistry applications that typically have to rely on small datasets, by imparting the existing knowledge on enzyme-mediated biotransformation reactions. However, the rules extracted from biotransformation reaction databases usually face the issue of being over/under-generalized and are not flexible to be updated with new reactions.

We developed an automatic rule extraction tool called enviRule. It clusters biotransformation reactions into different groups based on the similarities of reaction fingerprints, and then automatically extracts and generalizes rules for each reaction group in SMARTS format. It optimizes the genericity of automatic rules against the downstream TP prediction task. Models trained with automatic rules outperformed the models trained with manually curated rules by 30% in the area under curve (AUC) scores. Moreover, automatic rules can be easily updated with new reactions, highlighting enviRule’s strengths for both automatic extraction of optimized reactions rules and automated updating thereof.

Abstract

Transformation products (TPs) of man-made chemicals, formed through microbially mediated transformation in the environment, can have serious adverse environmental effects, yet the analytical identification of TPs is challenging. Rule-based prediction tools are successful in predicting TPs, especially in environmental chemistry applications that typically have to rely on small datasets, by imparting the existing knowledge on enzyme-mediated biotransformation reactions. However, the rules extracted from biotransformation reaction databases usually face the issue of being over/under-generalized and are not flexible to be updated with new reactions.

We developed an automatic rule extraction tool called enviRule. It clusters biotransformation reactions into different groups based on the similarities of reaction fingerprints, and then automatically extracts and generalizes rules for each reaction group in SMARTS format. It optimizes the genericity of automatic rules against the downstream TP prediction task. Models trained with automatic rules outperformed the models trained with manually curated rules by 30% in the area under curve (AUC) scores. Moreover, automatic rules can be easily updated with new reactions, highlighting enviRule’s strengths for both automatic extraction of optimized reactions rules and automated updating thereof.

Statistics

Citations

Dimensions.ai Metrics
1 citation in Web of Science®
1 citation in Scopus®
Google Scholar™

Altmetrics

Downloads

0 downloads since deposited on 21 Feb 2024
0 downloads since 12 months

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:07 Faculty of Science > Department of Chemistry
Dewey Decimal Classification:540 Chemistry
Scopus Subject Areas:Physical Sciences > Statistics and Probability
Life Sciences > Biochemistry
Life Sciences > Molecular Biology
Physical Sciences > Computer Science Applications
Physical Sciences > Computational Theory and Mathematics
Physical Sciences > Computational Mathematics
Uncontrolled Keywords:Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
Language:English
Date:1 July 2023
Deposited On:21 Feb 2024 08:42
Last Modified:30 Jun 2024 03:33
Publisher:Oxford University Press
ISSN:1367-4803
OA Status:Gold
Free access at:Publisher DOI. An embargo period may apply.
Publisher DOI:https://doi.org/10.1093/bioinformatics/btad407
PubMed ID:37354527
Project Information:
  • : FunderH2020
  • : Grant ID956496
  • : Project TitleThe European Industry - Academia Network for RevIsing and Advancing the Assessment of the Soil Microbial TOxicity of Pesticides
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)