Patterny: A Troupe of Decipherment Helpers for Intrinsic Disorder, Low Complexity and Compositional Bias in Proteins.

IF 4.8, Zone 2 (Biology), JCR Q1: BIOCHEMISTRY & MOLECULAR BIOLOGY

Biomolecules Pub Date : 2025-09-18 DOI:10.3390/biom15091332

Paul M Harrison

Biomolecules, volume 15, issue 9. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467476/pdf/

Citations: 0

Abstract

Intrinsically disordered regions (IDRs) are sometimes considered part of the 'dark proteome', i.e., protein parts that have been largely under-appreciated, as are the overlapping phenomena of low-complexity or compositionally biased regions (LCRs/CBRs). Experimentalists and computationalists alike are still learning how to decrypt the functionally meaningful features of such regions. Here, I report the creation of the support troupe Patterny to aid such protein cryptanalysis. The current troupe members are Blocky, Bandy, Moduley, Repeaty, and Runny. To discern important features, protein regions are compared to ideal assortments in which everything is sampled proportionally and dispersed randomly. Blocky discerns the segregation of amino acids by type and scores them for it. Bandy picks out compositional bands and calculates their evenness. Moduley labels the boundaries of optimized compositional modules ('CModules') and other possible boundary sets for compositionally biased regions. Repeaty concisely summarizes repetitiveness using an information entropy of amino-acid interval diversity. Runny enumerates homopeptide content and assesses its significance. Both the original whole sequences and the CModules from Moduley are fed into the other Patterny members. Patterny is applied to some illustrative sample data from the yeast proteome and the DISPROT database. It is available on GitHub and might aid those aiming to shed more light on, and generate hypotheses about, protein regions whose function is encoded in a distributed manner, such as IDRs and LCRs/CBRs more generally.
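To make two of these ideas concrete, the sketch below enumerates homopeptide runs (in the spirit of Runny), computes a Shannon entropy over same-residue intervals (in the spirit of Repeaty), and compares a statistic against composition-preserving shuffles as a stand-in for the "ideal assortments". This is an illustrative sketch only, not Patterny's code: the function names, the exact interval definition, and the shuffle-based significance test are all assumptions, and the published tools may define and score these quantities differently.

```python
import math
import random
from collections import Counter
from itertools import groupby


def homopeptide_runs(seq, min_len=2):
    """Return (residue, start, length) for each run of identical residues.

    Hypothetical helper; the min_len threshold is an assumption.
    """
    runs = []
    pos = 0
    for residue, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((residue, pos, length))
        pos += length
    return runs


def longest_run(seq):
    """Length of the longest homopeptide run (0 for an empty sequence)."""
    return max((length for _, _, length in homopeptide_runs(seq, 1)), default=0)


def interval_entropy(seq):
    """Shannon entropy (bits) of the gaps between successive occurrences
    of the same residue type. Few distinct gaps (low entropy) suggests a
    regular repeat. This interval definition is an assumption.
    """
    last_seen = {}
    gaps = Counter()
    for i, residue in enumerate(seq):
        if residue in last_seen:
            gaps[i - last_seen[residue]] += 1
        last_seen[residue] = i
    total = sum(gaps.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in gaps.values())


def shuffle_p_value(seq, statistic, n_shuffles=1000, seed=0):
    """Empirical p-value: share of composition-preserving shuffles whose
    statistic is at least as large as the observed one (add-one corrected).
    """
    rng = random.Random(seed)
    observed = statistic(seq)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if statistic("".join(residues)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    example = "Q" * 10 + "ASDFGHKLMN"  # made-up sequence, not real data
    print(homopeptide_runs(example, min_len=3))
    print(interval_entropy(example))
    print(shuffle_p_value(example, longest_run, n_shuffles=200))
```

Shuffling keeps the amino-acid composition fixed and randomizes only the order, so a small p-value here points to residue arrangement rather than composition alone. For a statistic where low values are the interesting tail, such as the interval entropy of a regular repeat, the comparison would need to be reversed.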


Source journal

Biomolecules (Biochemistry, Genetics and Molecular Biology: Molecular Biology)

CiteScore: 9.40

Self-citation rate: 3.60%

Articles published: 1640

Review time: 18.28 days

About the journal: Biomolecules (ISSN 2218-273X) is an international, peer-reviewed open access journal focusing on biogenic substances and their biological functions, structures, interactions with other molecules, and their microenvironment as well as biological systems. Biomolecules publishes reviews, regular research papers and short communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.