Project Details
Machine learning approaches for faster discovery and adaptation of enzymes for difficult chemical reactions. (MacBioSyn 2.0) Phase II: predicting and expanding the enzymatic reaction scope to include new-to-nature reactions
Applicant
Dr. Mehdi Davari Dolatabadi, Ph.D.
Subject Area
Biological and Biomimetic Chemistry
Organic Molecular Chemistry - Synthesis and Characterisation
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 497207454
The biocatalytic synthesis of chemicals is essential for green, sustainable chemistry but remains underutilized due to limited enzyme activity and diversity. 2-Oxoglutarate-dependent (2OGD) enzymes, an underexplored family, can catalyze challenging oxidative reactions such as the oxyfunctionalization of non-activated carbons and demethylations, which are difficult to achieve through traditional synthesis. Identifying novel 2OGD enzymes with expanded substrate scopes could unlock new biocatalytic pathways for producing valuable natural products, aiding fields such as medicinal chemistry and drug design. A prevalent challenge in enzyme development is the accurate prediction of catalytic activities and of substrate and reaction scopes, which necessitates, among other things, extensive exploration of biodiversity through genome mining. Machine learning (ML) techniques offer a promising alternative way to harness vast and diverse enzyme datasets, enabling precise predictions of enzyme functions and activities while facilitating the identification of advanced biocatalysts. In Phase I of the MacBioSyn project, we successfully developed an ML platform for extensive mining of 2OGD enzymes. This platform integrates a comprehensive ML framework capable of predicting enzymatic function and substrate specificity through high-throughput (HT) methodologies. To address the critical challenge of dataset size and quality, our Phase I efforts focused on generating a substantial dataset via HT screening of over 750 enzymes from the 2OGD superfamily against 30 diverse substrates. Upon completion of Phase I, we will have established an optimized workflow and software for identifying active enzymes and their substrate scope, alongside a novel set of biocatalysts capable of executing valuable 2OGD reactions. In Phase II of MacBioSyn, we aim to advance ML methodologies to predict and discover novel enzymatic reactions.
Our focus will extend to the development of substrate-promiscuous biocatalysts capable of halogenation and non-native C-H functionalization reactions using pseudohalides as alternative nucleophiles. We will implement a general, HT ML-based framework to predict enzyme activity, substrate scope, and reaction scope. This ML framework will serve as a vital tool for biocatalyst discovery, enhancing our understanding of 2OGD enzyme functionality. Our synergistic approach promises to accelerate the development of improved enzymes for diverse biocatalytic applications beyond the native oxyfunctionalization reactions catalyzed by these enzymes, specifically new-to-nature pseudohalogenations. The fundamental design principles derived from our studies on 2OGD enzymes will significantly broaden their utility in chemistry by providing methods for challenging reactions, advancing synthetic and medicinal chemistry through “green” access to halogenated natural products and simple, direct (i.e., one-step) oxidative functionalization to give, for example, nitriles, nitro compounds, azides, or substituted alkynes.
DFG Programme
Priority Programmes
Subproject of
SPP 2363:
Utilization and Development of Machine Learning for Molecular Applications – Molecular Machine Learning
Co-Investigators
Martin Dippe, Ph.D.; Professor Dr. Ludger A. Wessjohann
