Project Details
SAFE: Synthetically Accessible Fragment Space Extensions by Machine Learning-Based Approaches
Subject Area
Organic Molecular Chemistry - Synthesis and Characterisation
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 497017145
The overall aim of the project is to create predictive models that enable the extension of synthetically accessible chemical fragment spaces. A key question in drug discovery, materials design and synthetic chemistry today is how to map the space of synthetically accessible organic compounds precisely and with reasonable effort. Large pharmaceutical companies and several large compound vendors have addressed this question by defining synthetically accessible chemical fragment spaces. Since fragment spaces are driven by reaction patterns, the extension problem can be broken down into predicting building blocks that are compatible with a given reaction and with each other in that reaction. In the proposed project, a chemoinformatics framework will be developed for extracting data on reaction schemes and reactants from fragment spaces. In addition, problem-specific molecular descriptors will be designed for use in reaction prediction. Based on these data and representations, the project will aim to identify tolerated reactants for selected reaction schemes. For this purpose, state-of-the-art machine learning models will be evaluated and trained with data from the fragment spaces and the literature. A wide range of techniques, from random forests, support vector machines and gradient boosting to neural networks, will be tested for their ability to generalize from the training data. To improve prediction performance in a targeted way, an active learning strategy will be combined with screening techniques that generate large amounts of experimental data. To achieve this goal, new convolutional screening techniques will be developed and evaluated. The predictive power of the model will be continuously validated experimentally to ensure that the fragment space extensions are synthesizable.
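The project description does not specify data structures or a particular active learning strategy. The following is a minimal Python sketch under stated assumptions, intended only to illustrate the decomposition described above. It models a fragment space as a dictionary that maps each reaction scheme to one list of building blocks per reactant slot. It uses a stand-in `predict_proba` callable in place of a trained compatibility model. It applies uncertainty sampling, which is one common active learning query strategy, to choose candidates for experimental screening. All names (`space_size`, `enumerate_candidates`, `select_for_screening`) are hypothetical.

```python
from itertools import product
from math import prod
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical toy fragment space:
# reaction scheme -> one list of building blocks per reactant slot.
FragmentSpace = Dict[str, List[List[str]]]
# A candidate product: (reaction scheme, one building block per slot).
Candidate = Tuple[str, Tuple[str, ...]]


def space_size(space: FragmentSpace) -> int:
    """Number of products if every building block combination
    within a reaction scheme were allowed."""
    return sum(prod(len(slot) for slot in slots) for slots in space.values())


def enumerate_candidates(space: FragmentSpace) -> Iterator[Candidate]:
    """Yield every (reaction, reactant combination) the toy space defines."""
    for reaction, slots in space.items():
        for combo in product(*slots):
            yield reaction, combo


def select_for_screening(
    candidates: List[Candidate],
    predict_proba: Callable[[Candidate], float],
    k: int,
) -> List[Candidate]:
    """Uncertainty sampling (an assumed strategy): return the k candidates
    whose predicted probability of a successful reaction is closest to 0.5,
    i.e. where a new experimental label is expected to be most informative."""
    return sorted(candidates, key=lambda c: abs(predict_proba(c) - 0.5))[:k]
```

For example, a toy amide coupling scheme with two acids and three amines defines 6 candidate products. With made-up model probabilities, `select_for_screening` returns the two combinations the model is least sure about. Their experimental outcomes would then be added to the training data in the next active learning round.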
Software tools will be developed for transferring reactions identified as productive into the fragment space, together with an optimization scheme for grouping reactants with maximum compatibility. In the final phase of the project, the partial automation of the workflow will be trialled.
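The project description does not detail the optimization scheme for grouping reactants. As a rough illustration only, the sketch below uses a simple greedy heuristic. It is not an optimal method and not necessarily the project's approach. It assumes pairwise compatibility scores between reactants, for example predicted probabilities from a trained model, and a hypothetical `threshold`. A reactant joins the first existing group in which it is compatible with every member; otherwise it starts a new group. The function name `greedy_groups` is hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical input: symmetric pairwise compatibility scores in [0, 1],
# keyed by the unordered pair of reactant names.
PairScores = Dict[FrozenSet[str], float]


def greedy_groups(
    reactants: List[str], scores: PairScores, threshold: float = 0.8
) -> List[List[str]]:
    """Greedily partition reactants into groups in which every pair
    has a compatibility score of at least `threshold`.
    Pairs missing from `scores` are treated as incompatible (score 0.0)."""
    groups: List[List[str]] = []
    for reactant in reactants:
        for group in groups:
            if all(
                scores.get(frozenset((reactant, member)), 0.0) >= threshold
                for member in group
            ):
                group.append(reactant)
                break
        else:
            # No existing group accepts this reactant: start a new one.
            groups.append([reactant])
    return groups
```

Greedy assignment depends on the input order and can produce more groups than necessary. An exact formulation would be a clique partitioning problem, which is NP-hard in general, so heuristics or solver-based approaches are typical in practice.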
DFG Programme
Priority Programmes