Patterny: A Troupe of Decipherment Helpers for Intrinsic Disorder, Low Complexity and Compositional Bias in Proteins
Paul M Harrison
Biomolecules, vol. 15, no. 9 (Journal Article), published 2025-09-18
DOI: 10.3390/biom15091332 (https://doi.org/10.3390/biom15091332)
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467476/pdf/
Journal impact factor: 4.8; JCR Q1 (Biochemistry & Molecular Biology)
Citations: 0
Abstract
Intrinsically disordered regions (IDRs) are sometimes considered part of the 'dark proteome', i.e., protein parts that have been largely under-appreciated, as are the overlapping phenomena of low-complexity or compositionally biased regions (LCRs/CBRs). Experimentalists and computationalists alike are still learning how to decrypt the functionally meaningful features of such regions. Here, I report the creation of the support troupe Patterny to aid such protein cryptanalysis. The current troupe members are named Blocky, Bandy, Moduley, Repeaty, and Runny. To discern important features, protein regions are compared to ideal assortments in which everything is sampled proportionally and dispersed randomly. Blocky discerns the segregation of amino acids by type and scores regions for it. Bandy picks out compositional bands and calculates their evenness. Moduley labels the boundaries of optimized compositional modules ('CModules') and other possible boundary sets for compositionally biased regions. Repeaty concisely summarizes repetitiveness using an information entropy of amino-acid interval diversity. Runny enumerates homopeptide content and assesses its significance. Both the original whole sequences and the CModules from Moduley are fed into the other Patterny members. Patterny is applied to some illustrative sample data from the yeast proteome and the DISPROT database. It is available on GitHub and might aid those seeking to shed light on, and generate hypotheses about, protein regions whose function is encoded in a distributed manner, such as IDRs and, more generally, LCRs/CBRs.
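Patterny's own code lives on GitHub and is not reproduced here. The sketch below is a minimal, hypothetical illustration of three ideas named in the abstract, written under stated assumptions rather than taken from the tool: a homopeptide census in the spirit of Runny (runs of a single residue of at least `min_len`, a threshold chosen arbitrarily here); an entropy of residue intervals in the spirit of Repeaty (assuming "interval" means the gap between successive occurrences of the same residue, pooled over residue types); and a composition-preserving shuffle as one simple way to build the "sampled proportionally and dispersed randomly" baseline. The function names `homopeptide_runs`, `interval_entropy` and `shuffled_baseline` are invented for this example.

```python
import math
import random
import re
from collections import Counter
from typing import Callable


def homopeptide_runs(seq: str, min_len: int = 4) -> list[tuple[str, int, int]]:
    """List (residue, 0-based start, length) for each single-residue run of at least min_len.

    Hypothetical Runny-like census; min_len=4 is an assumed threshold.
    """
    runs = []
    # (.)\1* matches one maximal stretch of a repeated character per hit
    for m in re.finditer(r"(.)\1*", seq):
        if len(m.group(0)) >= min_len:
            runs.append((m.group(1), m.start(), len(m.group(0))))
    return runs


def interval_entropy(seq: str) -> float:
    """Shannon entropy (bits) of gaps between successive occurrences of each residue.

    Assumed reading of 'interval diversity': gaps are pooled across residue
    types. A strictly periodic repeat has a single gap value and scores 0.
    A sequence with no repeated residue also returns 0 (no gaps to measure).
    """
    last_seen: dict[str, int] = {}
    gaps: Counter = Counter()
    for i, aa in enumerate(seq):
        if aa in last_seen:
            gaps[i - last_seen[aa]] += 1
        last_seen[aa] = i
    total = sum(gaps.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p) over the observed gap values
    return sum((c / total) * math.log2(total / c) for c in gaps.values())


def shuffled_baseline(seq: str, stat: Callable[[str], float],
                      n: int = 200, seed: int = 0) -> float:
    """Mean of stat over n composition-preserving shuffles of seq.

    Each shuffle keeps every residue count (proportional sampling) but
    places residues at random: a simple stand-in for an 'ideal assortment'.
    """
    rng = random.Random(seed)
    residues = list(seq)
    total = 0.0
    for _ in range(n):
        rng.shuffle(residues)
        total += stat("".join(residues))
    return total / n


if __name__ == "__main__":
    print(homopeptide_runs("MQQQQQAGGGGS"))  # → [('Q', 1, 5), ('G', 7, 4)]
    print(interval_entropy("GYGYGYGY"))      # → 0.0
    # positive: random placement breaks the strict period of the repeat
    print(shuffled_baseline("GYGYGYGY", interval_entropy))
```

Comparing a region's statistic with its shuffled mean gives only a rough sense of departure from random dispersal; a fuller significance estimate would report the fraction of shuffles at least as extreme as the observed value.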
Biomolecules; subject area: Biochemistry, Genetics and Molecular Biology (Molecular Biology)
CiteScore: 9.40
Self-citation rate: 3.60%
Articles published: 1640
Review time: 18.28 days
Journal description:
Biomolecules (ISSN 2218-273X) is an international, peer-reviewed open access journal focusing on biogenic substances and their biological functions, structures, interactions with other molecules, and their microenvironment as well as biological systems. Biomolecules publishes reviews, regular research papers and short communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.