Project Details
Projekt Print View

Graph Grammars for Molecular Structure Search and Classification

Subject Area Theoretical Computer Science
Term from 2019 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 416768284
 
Numerous fields of study focus on small molecules. A prominent example is the field of drug design, where small molecules are used to inhibit or activate proteins to achieve a desired biological function. In these fields, we often want to scan databases for molecules containing certain substructures. Traditionally, these substructures are modelled in chemical description languages such as Daylight’s SMARTS. These languages tend to be very complex and are very restricted in their ability to describe the topological patterns of the underlying graphs. Parsing and matching patterns against a database of molecules is NP-complete. To circumvent these problems, we propose a simple graph grammar to describe substructures. Even very simple graph rewriting systems allow a high expressive power that almost reaches that of SMARTS. To use these graph grammars for molecular structure search, we have to solve the subgraph matching problem. Although this problem remains NP-complete, it becomes polynomial if each minimal cut of the query graph has bounded size, which we empirically find to be true for most molecules contained in the standard databases. We will investigate the complexity of the problem for more known graph parameters and try to relate the maximal size of a minimal cut to other parameters and we will focus on parameters that are typically small for molecular graphs and we will make our basic algorithm more efficient in practice. Furthermore, we want to derive over-approximations of the class of graphs generated by a grammar for which the subgraph matching problem can be solved more efficiently. As a second research direction, we will develop and implement efficient algorithms for learning graph grammars from positive and negative examples. We aim to find a graph grammar that is as simple as possible and matches the positive examples but does not match the negative examples for the chemical group. A trivial grammar that interpolates the positive and negative examples is a grammar that creates positive examples that clearly overfit the positive examples. The underlying idea behind this learning task is to automatically identify aspects of the pharmacophore of these molecules. The challenge here is to simultaneously prevent overfitting and overgeneralization. We plan to develop constructive algorithms, i.e. algorithms that compute a simple graph grammar that interpolates the positive and negative examples and improvement algorithms, i.e. algorithms that try to simplify a graph grammar while preserving its interpolating property.
DFG Programme Research Grants
 
 

Additional Information

Textvergrößerung und Kontrastanpassung