Personalized cancer-specific networks
General Genetics and Functional Genome Biology
Structural Biology
Cell Biology
Final Report Abstract
Bioinformatics finds biomarkers involved in tumorigenesis. One main goal of the project was to investigate isoform switching. Switching often results in the loss or gain of domains that mediate protein interactions and thus rewires the set of all protein interactions in an organism (the interactome). We found several probable novel biomarkers involved in tumorigenesis, such as a 4-gene signature (ODAM, APP, IKBKG, and TOLLIP) specific to the BRAF-like subtype of thyroid carcinoma (THCA). Some interactions are uniquely perturbed in a given cancer, and their participants can predict patient survival. Identifying such interactions makes it possible to computationally obtain potential protein biomarkers for specific cancer types and subtypes.
Better prediction of interaction sites in membrane proteins. Interactions between transmembrane (TM) proteins are fundamental for a wide spectrum of cellular functions. Across the main functional families of human TM proteins, typically 10% to 25% of amino acid sites are predicted to be involved in interactions, rising to 30% in ion channels. We developed DeepTMInter, a deep-learning method that improved the sequence-based prediction of interaction sites in α-helical TM proteins.
Family-specific analysis of variant pathogenicity prediction tools. Many methods predict the effect of single amino acid variants (SAVs) on molecular protein function. When we correlated such predictions with pathogenicity, we found that each tool has its Achilles heel, i.e., cases in which its predictions are unreliable. Protein families that were more sensitive to changes were often involved in, e.g., transcriptional regulation. In contrast, families in which mutations did not lead to disease were often related to immune and stress responses. These results suggest that functional annotation should be taken into account to improve effect predictions.
Identification of disease-causing mutations in human transmembrane proteins. We developed Pred-MutHTP, a suite of sequence-based computational methods for discriminating between disease-causing and neutral SAVs. It pushed the state of the art in predicting the effects of variants in transmembrane protein segments.
Protein Language Models (pLMs) change protein bioinformatics. AlphaFold2 has revolutionized the life sciences by reliably predicting protein 3D structure. Its success builds upon multiple sequence alignments (MSAs). Orthogonal to this, we have developed protein language models (pLMs) that bypass the increasingly expensive database searches needed to generate MSAs. Instead, pLMs extract information from single protein sequences. Crucial to their success is their ability to learn directly from large amounts of unlabeled protein sequence data. After an initial pre-training phase, the learned information can be transferred to downstream prediction tasks by using representations extracted from the pLM (referred to as embeddings) as input. We used embeddings to predict sequence conservation and the effects of SAVs without the need for MSAs. Using pLM embeddings as input also reduced the amount of data needed to train downstream prediction methods. This improved performance for various tasks relevant to understanding the protein networks underlying most diseases. For example, we successfully predicted binding residues for various ligand classes, cellular location, disorder, and Gene Ontology terms. All methods and data sets are freely available online. As announced in our proposal, we have made predictions from our new pLM-based methods easily accessible via the new LambdaPP web server. The unexpectedly vast potential of pLMs constitutes a major game changer in protein bioinformatics. This is attested by the number of citations to several of the papers describing pLMs.
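The abstract describes isoform switching that gains or loses interaction-mediating domains and thereby rewires the interactome. A toy model can illustrate this idea. The following is a minimal Python sketch. It assumes that interactions are mediated by known domain-domain interaction (DDI) pairs, so an isoform lacking a domain also loses the partners bound through it. The DDI pairs, protein names, and the functions `mediated` and `edgetic_changes` are hypothetical. This is not the project's actual pipeline.

```python
from itertools import product

# Hypothetical domain-domain interaction (DDI) pairs; frozenset makes each pair unordered.
DDI = {
    frozenset({"SH3", "PRM"}),
    frozenset({"PDZ", "PDZbind"}),
}


def mediated(domains_a, domains_b, ddi=DDI):
    """True if any domain of protein A and any domain of protein B form a known DDI pair."""
    return any(frozenset({a, b}) in ddi for a, b in product(domains_a, domains_b))


def edgetic_changes(partners, iso_ref, iso_alt, ddi=DDI):
    """Compare the predicted partners of a reference and an alternative isoform.

    partners: dict mapping partner name -> set of that partner's domains
    iso_ref, iso_alt: domain sets of two isoforms of the same gene
    Returns (lost, gained): partners bound only by the reference / only by the alternative.
    """
    ref = {p for p, doms in partners.items() if mediated(iso_ref, doms, ddi)}
    alt = {p for p, doms in partners.items() if mediated(iso_alt, doms, ddi)}
    return ref - alt, alt - ref


# Toy example: the alternative isoform has lost its SH3 domain.
partners = {"P1": {"PRM"}, "P2": {"PDZbind"}, "P3": {"Kinase"}}
lost, gained = edgetic_changes(partners, iso_ref={"SH3", "PDZ"}, iso_alt={"PDZ"})
print(sorted(lost), sorted(gained))  # ['P1'] []
```

In this toy case the alternative isoform lacks SH3. Its SH3-mediated interaction with P1 is therefore lost, while the PDZ-mediated interaction with P2 is retained.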
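The family-specific evaluation of variant effect predictors can be sketched in three steps:

1. Group variants by protein family.
2. For each family, measure how often a tool's binary prediction agrees with the known pathogenicity label.
3. Flag families where agreement is low.

This is a minimal Python sketch that uses simple accuracy as the agreement measure. The family names, records, thresholds (`min_accuracy`, `min_variants`), and function names are hypothetical and not taken from the published analysis.

```python
from collections import defaultdict


def per_family_stats(records):
    """Map each family to (accuracy, number of variants).

    records: iterable of (family, predicted_damaging, is_pathogenic) tuples.
    Accuracy is the fraction of variants whose prediction matches the label.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for family, predicted, pathogenic in records:
        totals[family] += 1
        hits[family] += int(predicted == pathogenic)
    return {fam: (hits[fam] / totals[fam], totals[fam]) for fam in totals}


def achilles_heels(records, min_accuracy=0.6, min_variants=3):
    """Families with at least min_variants variants and accuracy below min_accuracy."""
    stats = per_family_stats(records)
    return sorted(
        fam for fam, (acc, n) in stats.items() if n >= min_variants and acc < min_accuracy
    )


# Hypothetical predictions of one tool on variants from two families.
records = [
    ("transcription_factor", True, True),
    ("transcription_factor", True, True),
    ("transcription_factor", False, False),
    ("immune_receptor", True, False),
    ("immune_receptor", True, False),
    ("immune_receptor", False, False),
]
print(achilles_heels(records))  # ['immune_receptor']
```

In this toy data the tool over-predicts damage in the immune-receptor family. This loosely mirrors the kind of family-dependent blind spot described in the abstract.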
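Using pLM embeddings as input to a downstream predictor can be sketched as a per-residue classifier. In this minimal Python sketch, a logistic regression trained by batch gradient descent labels residues as binding or non-binding from their embeddings. The three-dimensional vectors are hypothetical stand-ins for real per-residue pLM embeddings, which typically have on the order of a thousand dimensions. The classifier is an illustrative choice, not the architecture of any method named in this report.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, b, x):
    """Linear score of one embedding vector x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(logit(w, b, xi)) - yi  # derivative of log-loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """1 = predicted binding residue, 0 = non-binding."""
    return [int(sigmoid(logit(w, b, xi)) >= threshold) for xi in X]


# Hypothetical per-residue "embeddings" (rows) with binding labels (1 = binding).
X_train = [
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.95, 0.0, 0.2],
    [0.1, 0.9, 0.3], [0.0, 0.8, 0.1], [0.2, 0.7, 0.0],
]
y_train = [1, 1, 1, 0, 0, 0]

w, b = train_logreg(X_train, y_train)
print(predict(w, b, [[0.85, 0.15, 0.1], [0.1, 0.85, 0.2]]))  # [1, 0]
```

The pLM itself stays frozen here. Only the small classifier on top is trained, which is why such models can get by with relatively little labelled data.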
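Embeddings can also be used without training any model. Averaging per-residue embeddings gives a per-protein vector. An annotation such as a Gene Ontology term can then be transferred from the most similar labelled protein. This minimal Python sketch assumes mean pooling and cosine similarity. The vectors, protein IDs, and GO labels are hypothetical toy values. Nearest-neighbour transfer is shown as an illustrative technique, not as a description of how the GO predictions in this project were made.

```python
import math


def mean_pool(per_residue):
    """Average an L x d list of per-residue embeddings into one d-dimensional vector."""
    length = len(per_residue)
    return [sum(col) / length for col in zip(*per_residue)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def transfer_annotation(query, lookup):
    """Return (label, similarity) of the lookup protein closest to the query vector.

    lookup: dict mapping protein ID -> (per-protein embedding, annotation label)
    """
    best_id = max(lookup, key=lambda pid: cosine(query, lookup[pid][0]))
    vec, label = lookup[best_id]
    return label, cosine(query, vec)


# Hypothetical labelled proteins (per-residue embeddings pooled to per-protein vectors).
lookup = {
    "protA": (mean_pool([[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]), "GO:0005524"),
    "protB": (mean_pool([[0.0, 1.0, 0.1], [0.1, 0.9, 0.3]]), "GO:0003677"),
}
query = mean_pool([[0.9, 0.05, 0.1], [0.7, 0.2, 0.0], [0.95, 0.0, 0.2]])
label, sim = transfer_annotation(query, lookup)
print(label)  # GO:0005524
```

Mean pooling makes proteins of different lengths comparable as fixed-size vectors. The similarity score can serve as a rough confidence for the transferred label.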
Publications
-
Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics, 20(1).
Heinzinger, Michael; Elnaggar, Ahmed; Wang, Yu; Dallago, Christian; Nechaev, Dmitrii; Matthes, Florian & Rost, Burkhard
-
Prediction of disease-associated mutations in the transmembrane regions of proteins with known 3D structure. PLOS ONE, 14(7), e0219452.
Popov, Petr; Bizin, Ilya; Gromiha, Michael; Kulandaisamy, A. & Frishman, Dmitrij
-
Pred‐MutHTP: Prediction of disease‐causing and neutral mutations in human transmembrane proteins. Human Mutation, 41(3), 581-590.
Kulandaisamy, A.; Zaucha, Jan; Sakthivel, Ramasamy; Frishman, Dmitrij & Gromiha, M. Michael
-
Edgetic perturbation signatures represent known and novel cancer biomarkers. Scientific Reports, 10(1).
Kataka, Evans; Zaucha, Jan; Frishman, Goar; Ruepp, Andreas & Frishman, Dmitrij
-
Family-specific analysis of variant pathogenicity prediction tools. NAR Genomics and Bioinformatics, 2(2).
Zaucha, Jan; Heinzinger, Michael; Tarnovskaya, Svetlana; Rost, Burkhard & Frishman, Dmitrij
-
Mutations in transmembrane proteins: diseases, evolutionary insights, prediction and comparison with globular proteins. Briefings in Bioinformatics, 22(3).
Zaucha, Jan; Heinzinger, Michael; Kulandaisamy, A.; Kataka, Evans; Salvádor, Óscar Llorian; Popov, Petr; Rost, Burkhard; Gromiha, M. Michael; Zhorov, Boris S. & Frishman, Dmitrij
-
Embeddings from protein language models predict conservation and variant effects. Human Genetics, 141(10), 1629-1647.
Marquet, Céline; Heinzinger, Michael; Olenyi, Tobias; Dallago, Christian; Erckert, Kyra; Bernhofer, Michael; Nechaev, Dmitrii & Rost, Burkhard
-
Improved sequence-based prediction of interaction sites in α-helical transmembrane proteins by deep learning. Computational and Structural Biotechnology Journal, 19, 1512-1530.
Sun, Jianfeng & Frishman, Dmitrij
-
Protein embeddings and deep learning predict binding residues for various ligand classes. Scientific Reports, 11(1).
Littmann, Maria; Heinzinger, Michael; Dallago, Christian; Weissenow, Konstantin & Rost, Burkhard
-
Protein matchmaking through representation learning. Cell Systems, 12(10), 948-950.
Heinzinger, Michael; Dallago, Christian & Rost, Burkhard
-
Refining Embedding-Based Binding Predictions by Leveraging AlphaFold2 Structures. Cold Spring Harbor Laboratory.
Endres, Leopold; Olenyi, Tobias; Erckert, Kyra; Weißenow, Konstantin; Rost, Burkhard & Littmann, Maria
-
Novel machine learning approaches revolutionize protein knowledge. Trends in Biochemical Sciences, 48(4), 345-359.
Bordin, Nicola; Dallago, Christian; Heinzinger, Michael; Kim, Stephanie; Littmann, Maria; Rauer, Clemens; Steinegger, Martin; Rost, Burkhard & Orengo, Christine
