Personalized cancer-specific networks

Applicants Professor Dr. Dmitrij Frishman; Professor Dr. Burkhard Rost

Subject area Bioinformatics and Theoretical Biology
General Genetics and Functional Genomics
Structural Biology
Cell Biology

Funding Funded from 2017 to 2022

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 326946590

Year of creation 2024

Summary of project results

Bioinformatics finds biomarkers involved in tumorigenesis. One main goal of the project was to investigate isoform switching, which often results in the loss or gain of domains that mediate protein interactions and thus rewires the set of all interacting proteins in an organism (referred to as the interactome). We found several probable novel biomarkers involved in tumorigenesis, such as a four-gene signature (ODAM, APP, IKBKG, and TOLLIP) specific to the BRAF-like subtype of thyroid carcinoma (THCA). The ability to identify uniquely distorted interactions whose participants are predictive of patient survival opens up the possibility of computationally deriving potential protein biomarkers for specific cancer types and subtypes.

Better prediction of interaction sites in membrane proteins. Interactions between transmembrane (TM) proteins are fundamental to a wide spectrum of cellular functions. Across the main functional families of human transmembrane proteins, the percentage of amino acid sites predicted to be involved in interactions typically ranges between 10% and 25%, and reaches up to 30% in ion channels. We developed DeepTMInter, a deep-learning method that improved the sequence-based prediction of interaction sites in α-helical transmembrane proteins.

Family-specific analysis of variant pathogenicity prediction tools. Many methods predict the effect of single amino acid variants (SAVs) on molecular protein function. Correlating such predictions with pathogenicity, we found that each tool has its Achilles heel, i.e., cases in which its predictions are unreliable. Protein families more sensitive to changes were often involved in, e.g., regulating transcription, whereas families in which mutations did not lead to disease were often related to immune and stress responses. These results suggest that taking functional annotation into account could improve effect predictions.

Identification of disease-causing mutations in human transmembrane proteins. We developed Pred-MutHTP, a suite of sequence-based computational methods for discriminating between disease-causing and neutral SAVs, which advanced the state of the art in predicting the effects of variants in transmembrane protein segments.

Protein Language Models (pLMs) change protein bioinformatics. AlphaFold2 has revolutionized the life sciences by reliably predicting protein 3D structure. Orthogonal to its success, which builds upon multiple-sequence alignments (MSAs), we have developed protein language models (pLMs) that bypass the increasingly expensive database search needed to generate MSAs. Instead, pLMs extract information from single protein sequences. Crucial to their success is their ability to learn directly from large amounts of unlabeled protein sequence data. After an initial pre-training phase, the information learned can be transferred to downstream prediction tasks by using representations extracted from the pLM (referred to as embeddings) as input. We used embeddings to predict sequence conservation and the effects of SAVs without the need for MSAs. Using embeddings from pLMs as input also reduced the amount of data needed to train subsequent prediction tasks. This improved performance for various tasks relevant to understanding the protein networks underlying most diseases: we successfully predicted binding residues for various ligand classes, cellular location, disorder, and Gene Ontology terms. All methods and data sets are freely available online. As announced in our proposal, we have made predictions from our new pLM-based methods easily accessible via the new LambdaPP web server. The unexpected and vast potential of pLMs clearly constitutes a major game changer in protein bioinformatics, as attested by the number of citations to several of the papers describing pLMs.
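
The transfer step described above, in which embeddings from a pre-trained pLM serve as fixed input features for a small downstream predictor, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the project's method: `toy_embedding` is a hypothetical stand-in that draws random per-residue vectors instead of querying a real pLM, the class labels are invented, and a nearest-centroid classifier is swapped in for the neural networks a real predictor would likely use.

```python
import random
from statistics import fmean


def mean_pool(per_residue):
    """Average per-residue vectors (L x d) into one per-protein vector (d)."""
    return [fmean(column) for column in zip(*per_residue)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(vectors, labels):
    """Compute one mean vector (centroid) per class label."""
    by_label = {}
    for vec, label in zip(vectors, labels):
        by_label.setdefault(label, []).append(vec)
    return {label: mean_pool(vecs) for label, vecs in by_label.items()}


def predict(centroids, vec):
    """Return the label whose centroid lies closest to `vec`."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], vec))


def toy_embedding(length, dim, shift, rng):
    """ASSUMPTION: placeholder for a pLM. Returns random per-residue vectors
    centred on `shift`; a real pipeline would embed an actual sequence."""
    return [[rng.gauss(shift, 0.1) for _ in range(dim)] for _ in range(length)]


rng = random.Random(0)

# Tiny labelled training set: five proteins per (invented) class.
train = [(toy_embedding(30, 8, 0.0, rng), "class_a") for _ in range(5)]
train += [(toy_embedding(45, 8, 1.0, rng), "class_b") for _ in range(5)]

# Proteins differ in length, so pool each one to a fixed-size vector first.
pooled = [mean_pool(embedding) for embedding, _ in train]
labels = [label for _, label in train]
model = fit_centroids(pooled, labels)

# Classify an unseen protein from its pooled embedding alone (no MSA involved).
query = mean_pool(toy_embedding(20, 8, 1.0, rng))
prediction = predict(model, query)
print(prediction)
```

Mean pooling is only one way to obtain a per-protein representation. Per-residue tasks such as binding-residue or conservation prediction would instead feed each residue's vector to the downstream model.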

Project-related publications (selection)

Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics, 20(1).
Heinzinger, Michael; Elnaggar, Ahmed; Wang, Yu; Dallago, Christian; Nechaev, Dmitrii; Matthes, Florian & Rost, Burkhard
Prediction of disease-associated mutations in the transmembrane regions of proteins with known 3D structure. PLOS ONE, 14(7), e0219452.
Popov, Petr; Bizin, Ilya; Gromiha, Michael; Kulandaisamy, A. & Frishman, Dmitrij
Pred‐MutHTP: Prediction of disease‐causing and neutral mutations in human transmembrane proteins. Human Mutation, 41(3), 581-590.
Kulandaisamy, A.; Zaucha, Jan; Sakthivel, Ramasamy; Frishman, Dmitrij & Gromiha, M. Michael
Edgetic perturbation signatures represent known and novel cancer biomarkers. Scientific Reports, 10(1).
Kataka, Evans; Zaucha, Jan; Frishman, Goar; Ruepp, Andreas & Frishman, Dmitrij
Family-specific analysis of variant pathogenicity prediction tools. NAR Genomics and Bioinformatics, 2(2).
Zaucha, Jan; Heinzinger, Michael; Tarnovskaya, Svetlana; Rost, Burkhard & Frishman, Dmitrij
Mutations in transmembrane proteins: diseases, evolutionary insights, prediction and comparison with globular proteins. Briefings in Bioinformatics, 22(3).
Zaucha, Jan; Heinzinger, Michael; Kulandaisamy, A.; Kataka, Evans; Salvádor, Óscar Llorian; Popov, Petr; Rost, Burkhard; Gromiha, M. Michael; Zhorov, Boris S. & Frishman, Dmitrij
Embeddings from protein language models predict conservation and variant effects. Human Genetics, 141(10), 1629-1647.
Marquet, Céline; Heinzinger, Michael; Olenyi, Tobias; Dallago, Christian; Erckert, Kyra; Bernhofer, Michael; Nechaev, Dmitrii & Rost, Burkhard
Improved sequence-based prediction of interaction sites in α-helical transmembrane proteins by deep learning. Computational and Structural Biotechnology Journal, 19, 1512-1530.
Sun, Jianfeng & Frishman, Dmitrij
Protein embeddings and deep learning predict binding residues for various ligand classes. Scientific Reports, 11(1).
Littmann, Maria; Heinzinger, Michael; Dallago, Christian; Weissenow, Konstantin & Rost, Burkhard
Protein matchmaking through representation learning. Cell Systems, 12(10), 948-950.
Heinzinger, Michael; Dallago, Christian & Rost, Burkhard
Refining Embedding-Based Binding Predictions by Leveraging AlphaFold2 Structures. Cold Spring Harbor Laboratory.
Endres, Leopold; Olenyi, Tobias; Erckert, Kyra; Weißenow, Konstantin; Rost, Burkhard & Littmann, Maria
Novel machine learning approaches revolutionize protein knowledge. Trends in Biochemical Sciences, 48(4), 345-359.
Bordin, Nicola; Dallago, Christian; Heinzinger, Michael; Kim, Stephanie; Littmann, Maria; Rauer, Clemens; Steinegger, Martin; Rost, Burkhard & Orengo, Christine
