Project Details

Molecular Descriptors in Matrix Completion Methods

Subject Area: Theoretical Chemistry: Electronic Structure, Dynamics, Simulation
Term: since 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 497201843
 
Knowledge of the physicochemical properties of mixtures is essential for process design and optimization in chemistry and related fields, but experimental data are scarce, so reliable prediction methods are needed. Matrix completion methods (MCMs) are a class of machine learning (ML) models well suited to this purpose: the properties of binary mixtures can be arranged in matrices, with the two mixture components indexing the rows and columns. Since these matrices are usually only sparsely populated with experimental data, predicting the missing entries is a matrix completion problem. MCMs are unusual in that, in their pure form, they use no molecular descriptors as input; instead, they learn solely from the available mixture data, in the manner of collaborative filtering.
The main goal of this project is to significantly enhance MCMs for property prediction by additionally incorporating molecular descriptors into their training process. Two routes will be followed: first, integrating molecular class affiliations learned by clustering based on molecular similarities; second, integrating molecular graphs by coupling MCMs with graph neural networks (GNNs).
Building on the results of the first funding period of SPP 2363, in which enhanced MCMs were successfully developed for predicting activity coefficients, we will substantially extend the approaches in three directions. We will develop (i) enhanced models for predicting further physicochemical properties of mixtures, specifically Henry's law constants and diffusion coefficients, (ii) multi-task models for the joint prediction of multiple mixture properties, and (iii) enhanced hybrid models combining physical models with MCMs, where the MCMs are used for predicting fundamental pair interactions. Furthermore, in collaborations within the SPP, we will explore transferring the MCM approach to other prediction targets, such as chemical reactions and affinities between ligands and metal complexes.
Achieving this goal requires an interactive data analysis framework, for example for the unbiased definition of molecular class affiliations based on mixture data. Such a framework will also help to understand the ML models and to identify what matters at the molecular level for describing mixture behavior, thereby building trust in and acceptance of the developed prediction methods. We will furthermore develop tools for systematically analyzing and explaining the molecular graph embeddings produced by the GNN-MCM combinations trained on mixture data. We will implement all developed models in software tools, which we will make freely available within the SPP and beyond.
DFG Programme: Priority Programmes
 
 
