Project Details

Context-based discovery of functional motifs in low complexity regions of protein sequences

Subject Area: Bioinformatics and Theoretical Biology
Term: 2017 to 2021
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 387883086
 
Final Report Year: 2021

Final Report Abstract

Protein sequences contain regions whose amino acid composition differs from the “normal” composition of proteins. These regions, termed low complexity regions (LCRs), tend to lack structure and are often disordered. LCRs evolve fast, and this, together with their lack of structure, complicates the study of their properties and function. However, increasing experimental evidence indicates that LCRs have functions, particularly in the modulation of protein-protein interactions. LCRs tend to be exposed and, because they evolve quickly, are well suited to host short linear motifs (LMs), many of which result in post-translational modifications (PTMs) that regulate the interactions of the protein.

To facilitate the detection of such motifs, we first addressed the definition of low complexity regions, which overlaps substantially with that of intrinsically disordered regions and of homorepeats (tracts of consecutive repeated amino acids, which occur in many proteins). We then established evolutionary approaches to discover functional LCRs and LMs by comparing their sequences in related genes (orthologs) across different species. In collaboration with experimental groups, we applied our methods to characterize the dynamic structure of polyQ in the human protein Huntingtin and to identify genes involved in mRNA modification. Finally, we found that avoided motifs (motifs missing in particular protein locations or species) can indicate function associated with these motifs.

- We found that the definitions of low complexity regions, disordered regions, short tandem repeats, and homorepeats overlap more than expected, and that the research community working on them had not agreed on consensus definitions. We established these standards in collaboration with the community.
- We found that a particular type of LM, homorepeats, has a richer than expected variety of modes of conservation. This made evident the necessity of studying LMs and LCRs in different well-defined taxonomic contexts.
- We found a higher than expected frequency of LCRs with LMs that are short tandem repeats. We suspect that, while low complexity regions tend to be disordered, repetitiveness in these regions might induce structure. This is a lead we will follow up.
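To make the terms used in the abstract concrete, the following minimal Python sketch shows one simple way to flag homorepeats and compositionally biased windows in a protein sequence. It is an illustration only and not the method used in the project: the function names, the minimum repeat length of 4, the window size of 12, the entropy threshold of 2.2 bits, and the toy sequence are all assumptions chosen for the example. Established tools use more refined statistics.

```python
import math
import re
from collections import Counter


def find_homorepeats(seq, min_len=4):
    """Return (residue, start, length) for every run of a single
    amino acid that is at least min_len long (min_len is an assumption)."""
    # (.) captures one residue, \1{k,} requires k or more further copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(seq)]


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_entropy_windows(seq, size=12, threshold=2.2):
    """Start positions of windows whose entropy is at or below threshold.
    Window size and threshold are illustrative, not calibrated values."""
    return [i for i in range(len(seq) - size + 1)
            if shannon_entropy(seq[i:i + size]) <= threshold]


if __name__ == "__main__":
    toy = "MKTAYLQQQQQQQQPSGDELVK"  # hypothetical sequence with a polyQ tract
    print(find_homorepeats(toy))
    print(low_entropy_windows(toy))
```

A window made of a single residue has an entropy of 0 bits, while a window in which all 20 amino acids are equally frequent approaches log2(20), about 4.32 bits. Low values therefore mark candidate low complexity regions, and homorepeats are the extreme case.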
