Context-based discovery of functional motifs in low complexity regions of protein sequences
Final Report Abstract
Protein sequences contain regions whose amino acid composition differs from the typical composition of proteins. These low complexity regions (LCRs) tend to lack structure and are often disordered. LCRs evolve fast, which, together with their lack of structure, complicates the study of their properties and function. However, increasing experimental evidence indicates that LCRs are functional, particularly in the modulation of protein-protein interactions. LCRs tend to be exposed and, because they evolve quickly, are well suited to host short linear motifs (LMs), many of which result in post-translational modifications (PTMs) that regulate the protein's interactions. To facilitate the detection of such motifs, we first addressed the definition of low complexity regions, which overlaps considerably with those of intrinsically disordered regions and of homorepeats (tracts of consecutive repeated amino acids, which occur in many proteins). We then established evolutionary approaches to discover functional LCRs and LMs by comparing their sequences in related genes (orthologs) across different species. In collaboration with experimental groups, we applied our methods to characterize the dynamic structure of polyQ in the human protein Huntingtin and to find genes involved in mRNA modification. Finally, we found that avoided motifs (motifs missing from particular protein locations or species) can indicate function associated with these motifs.

- We found that the definitions of low complexity regions, disordered regions, short tandem repeats and homorepeats overlap more than expected, and that the community of researchers working on them had not agreed on consensus definitions. We established these standards in collaboration with the community.
- We found that a particular type of LM, homorepeats, has a richer than expected variety of modes of conservation. This made evident the necessity of studying LMs and LCRs in different well-defined taxonomic contexts.
- We found a higher than expected frequency of LCRs with LMs that are short tandem repeats. We suspect that, while low complexity regions tend to be disordered, repetitiveness in these regions might induce structure. This is a lead we will follow up.
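Two of the concepts above, homorepeats and avoided motifs, can be illustrated with a minimal Python sketch. It is not the project's published method: the function names, the minimum tract length of 4 residues, and the restriction to the 20 standard amino acids are assumptions made only for illustration, and real definitions (for example, thresholds and tolerated impurities in polyQ tracts) are more nuanced.

```python
import re
from itertools import product

# Assumption: only the 20 standard amino acids are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def find_homorepeats(sequence, min_length=4):
    """Return (residue, start, length) for every run of a single residue
    that is at least min_length long (hypothetical threshold)."""
    # A captured letter followed by at least min_length - 1 copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(sequence)]


def avoided_motifs(sequences, k=2):
    """Return the sorted k-mers that occur in none of the given sequences."""
    observed = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    all_kmers = {"".join(p) for p in product(AMINO_ACIDS, repeat=k)}
    return sorted(all_kmers - observed)


# Toy inputs, not real protein data.
repeats = find_homorepeats("MAQQQQQQPLAAAAK")
missing = avoided_motifs(["ACAC"], k=2)
```

Whether a missing k-mer is meaningfully avoided would also depend on how often it is expected by chance given the dataset's size and composition, which this sketch does not assess.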
Publications
-
Automated selection of homologs to track the evolutionary history of proteins. BMC Bioinformatics. 2018 Nov 19;19(1):431
Mier P, Pérez-Pulido AJ, Andrade-Navarro MA
-
Repeatability in protein sequences. J Struct Biol. 2019 Aug 10. pii: S1047-8477(19)30173-X
Kamel M, Mier P, Tari A, Andrade-Navarro MA
-
Assessing the low complexity of protein sequences via the low complexity triangle. PLoS One. 2020 Dec 30;15(12):e0239154
Mier P, Andrade-Navarro MA
-
Disentangling the complexity of low complexity proteins. Brief Bioinform. 2020 Mar 23;21(2):458-472
Mier P, Paladin L, Tamana S, Petrosian S, Hajdu-Soltész B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernadó P, Gáspári Z, Ouzounis CA, Promponas VJ, Kajava AV, Hancock JM, Tosatto SCE, Dosztanyi Z, Andrade-Navarro MA
-
MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments. Evol Bioinform Online. 2020 Apr 29;16:1176934320916199
Mier P, Andrade-Navarro MA
-
The 18S ribosomal RNA m6A methyltransferase Mettl5 is required for normal walking behavior in Drosophila. EMBO Rep. 2020;21(7):e49443
Leismann J, Spagnuolo M, Pradhan M, Wacheul L, Vu MA, Musheev M, Mier P, Andrade-Navarro MA, Graille M, Niehrs C, Lafontaine DL, Roignant JY
-
The features of polyglutamine regions depend on their evolutionary stability. BMC Evol Biol. 2020 May 24;20(1):59
Mier P, Andrade-Navarro MA
-
The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J. 2020 Feb 4;18:306-313
Mier P, Elena-Real C, Urbanek A, Bernadó P, Andrade-Navarro MA
-
Avoided motifs: short amino acid strings missing from protein datasets. Biol Chem. 2021 Mar 2
Mier P, Andrade-Navarro MA
-
The Role of Low Complexity Regions in Protein Interaction Modes: An Illustration in Huntingtin. Int J Mol Sci. 2021 Feb 9;22(4):1727
Kastano K, Mier P, Andrade-Navarro MA