Project Details
Multi-fidelity, active learning strategies for exciton transfer in cryptophyte antenna complexes
Subject Area
Methods in Artificial Intelligence and Machine Learning
Theoretical Chemistry: Electronic Structure, Dynamics, Simulation
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 496900167
Multiscale simulation of light-harvesting complexes is key to fundamental research in photosynthesis and to solar cell design. Among the biological species performing light harvesting are cryptophyte algae. In these algae, sunlight is absorbed in phycobiliproteins by pigment molecules termed bilins. Because the proteins are flexible, simulating the light-harvesting process requires accurately calculating excitonic properties (excitation energies, couplings, transition dipole moments, etc.) for hundreds of thousands to millions of bilin conformations. Nowadays, large parallel computers allow for initial studies at full scale. Still, such studies are only computationally feasible if excitonic properties are evaluated at rather low, i.e., cheap-to-compute, levels of quantum chemical theory. This strongly limits the expressiveness of the results. The overarching goal of this project is to enable highly accurate multiscale simulations of light-harvesting complexes by replacing computationally expensive quantum chemical calculations at a high level of theory with cheap-to-evaluate machine learning (ML) models. To ensure true efficiency of the approach, all ML models have to be constructed such that minimal computational effort is required to build training data that yield models of low prediction error. The target is to move away from investing arbitrary amounts of computing time in training data generation while reporting fast model predictions, towards an approach that is efficient in both model construction and model evaluation. The main objective of the second funding phase is to bring multi-fidelity machine learning (MFML) into “production”, that is, the approach is generalized to more diverse chemical properties, made more efficient in the presence of less clear data hierarchies, and automated in an active learning (AL) based, target-error-adaptive construction.
MFML techniques are then provided as a software package available to the whole community for further exploration beyond the originally intended application. In addition, for bimolecular learning, the aim is to go beyond an explorative study and show true impact on challenging data, with massive cost reductions and better generalizability of models for, e.g., coupling energies. Within this project, both MFML and bimolecular learning will be applied and tested for three phycobiliproteins from cryptophyte algae, namely PC612, PC645, and PE566.
DFG Programme
Priority Programmes
