Project Details
Overcoming the limits of remote functionalization through machine learning guided catalyst identification
Applicant
Professor Dr. Franziska Schoenebeck
Subject Area
Organic Molecular Chemistry - Synthesis and Characterisation
Term
since 2025
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 561335463
Remote functionalization is an important strategy in homogeneous catalysis and synthesis that allows a molecule to be functionalized at a site distant from the site of its initial activation. A so-called chain-walking process enables the metal to migrate from the initiation site to the remote site. Although significant progress has been made in the field, challenges remain with regard to the scope, selectivity and generality of both the transformations and the catalysts. In this context, we have previously demonstrated the unprecedented speed of a Pd(I) dimer catalyst in remote cross-coupling (arylation). While most catalysts require reaction times of at least 12 h and often elevated temperatures, the dinuclear Pd(I) catalyst accomplished this transformation in 10 min at room temperature and with exclusive selectivity. The current limitation of this method is the requirement for an ortho-fluorinated substrate, without which selective remote coupling cannot be achieved.
Here, we propose a machine learning (ML) approach to identify suitable candidates within a large ligand library that not only form Pd(I) dimers (enabling a fast reaction) but also promote high selectivity for the remote functionalization product, independent of any substrate restrictions. Building on our expertise in the field, we propose to conduct this search for general and selective catalysts for remote functionalization in the following stages.
In the first stage, we will build on our previous work by applying a sequential clustering strategy to an extended ligand library. All ligands will first be clustered according to general features, followed by a second clustering based on DFT-based, speciation-oriented features. The latter serves to further distinguish and filter the ligands with regard to their ability to form Pd(I) dimers. Alternatively, we will also explore resource-efficient semi-supervised learning.
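The two-step clustering of the first stage can be illustrated with a minimal sketch. It rests on assumptions not stated in the project description: each ligand carries a hypothetical vector of general descriptors and a hypothetical vector of DFT-derived speciation descriptors whose first entry is taken to be a Pd(I) dimerization free energy. Plain k-means stands in for whatever clustering method the project actually uses, and keeping only sub-clusters with a negative mean dimerization energy is an illustrative filter rule, not the authors' criterion.
```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    k = min(k, len(points))
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point that lies farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sequential_clustering(ligands: Dict[str, dict], k_general: int, k_spec: int,
                          dimer_threshold: float = 0.0) -> List[str]:
    """Cluster on general features, re-cluster each cluster on speciation
    features, and keep sub-clusters predicted to favor Pd(I) dimers.

    Hypothetical input format: {"name": {"general": [...], "speciation": [dG_dimer, ...]}}
    """
    names = list(ligands)
    coarse = kmeans([ligands[n]["general"] for n in names], k_general)
    selected: List[str] = []
    for c in sorted(set(coarse)):
        members = [n for n, lab in zip(names, coarse) if lab == c]
        fine = kmeans([ligands[n]["speciation"] for n in members], k_spec)
        for f in sorted(set(fine)):
            sub = [n for n, lab in zip(members, fine) if lab == f]
            # Illustrative filter: mean (hypothetical) dimerization free energy below threshold.
            mean_dg = sum(ligands[n]["speciation"][0] for n in sub) / len(sub)
            if mean_dg < dimer_threshold:
                selected.extend(sub)
    return sorted(selected)


if __name__ == "__main__":
    demo = {
        "L1": {"general": [1.0, 1.0], "speciation": [-5.0, 0.1]},
        "L2": {"general": [1.1, 0.9], "speciation": [4.0, 0.2]},
        "L3": {"general": [9.0, 9.0], "speciation": [-3.0, 0.3]},
        "L4": {"general": [9.2, 8.8], "speciation": [6.0, 0.1]},
    }
    print(sequential_clustering(demo, k_general=2, k_spec=2))  # → ['L1', 'L3']
```
Farthest-point initialization keeps this sketch deterministic; a production workflow would more likely rely on a library implementation with several random restarts and a principled choice of the number of clusters.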
In the second stage, a small set of DFT-computed selectivities will seed a semi-supervised self-training approach, in which the remaining ligands are labeled iteratively by retraining the model on its high-confidence predictions. In the third stage, the ligand candidates identified as selective and Pd(I) dimer-forming will be explored experimentally: the ligands and their corresponding Pd(I) dimers will be synthesized and tested for their reactivity in remote arylation. In the final stage, we will apply the knowledge gained to Ni(I) dimers. By analyzing feature importance at the different ML stages, we intend to directly compare the two metals with regard to their ligand requirements and how these affect speciation and selectivity. Ultimately, our aim is to arrive at a deeper understanding of the differences and commonalities between the studied Pd(I) and Ni(I) complexes, which we believe will be instrumental for future catalyst development and for advancing the field of homogeneous catalysis and synthesis.
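The self-training loop of the second stage can be sketched as follows, assuming the DFT-computed selectivities are binarized into "selective" (1) and "non-selective" (0) labels and each ligand is represented by a hypothetical numerical descriptor vector. The project description does not name a base model, so a hand-written logistic regression serves only as a stand-in; the confidence threshold of 0.9 and the round limit are illustrative choices.
```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of a logistic-regression classifier


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(model: Model, x: Sequence[float]) -> float:
    """Probability that descriptor vector x belongs to a 'selective' ligand (label 1)."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Model:
    """Fit logistic regression by full-batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def self_train(X_lab: List[Sequence[float]], y_lab: List[int],
               X_unlab: List[Sequence[float]], threshold: float = 0.9,
               max_rounds: int = 10):
    """Pseudo-label confidently predicted pool members, add them, and retrain.

    Returns (final model, enlarged labeled X, labels, still-unlabeled pool).
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = train_logreg(X_lab, y_lab)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            p = predict_proba(model, x)
            if p >= threshold or p <= 1.0 - threshold:
                confident.append((x, 1 if p >= 0.5 else 0))
            else:
                remaining.append(x)
        if not confident:
            break  # no pool member is predicted with high confidence
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = remaining
        model = train_logreg(X_lab, y_lab)  # retrain on the enlarged labeled set
    return model, X_lab, y_lab, pool


if __name__ == "__main__":
    model, X_l, y_l, pool = self_train(
        [[-2.0], [-1.5], [1.5], [2.0]], [0, 0, 1, 1], [[-3.0], [3.0], [0.05]])
    print(pool)  # → [[0.05]]
```
Ligands whose predicted probability never leaves the uncertain band remain unlabeled. If the descriptors are standardized, the magnitudes of the learned weights offer one crude view of feature importance; tree-based models or permutation importance are common alternatives for the kind of feature-importance comparison between Pd(I) and Ni(I) described above.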
DFG Programme
Priority Programmes
