Critical Assessment of Protein Intrinsic Disorder Round 3 - Predicting Disorder in the Era of Protein Language Models

IF 2.8 · Zone 4 (Biology) · Q2 Biochemistry & Molecular Biology

Proteins-Structure Function and Bioinformatics · Pub Date: 2025-08-26 · DOI: 10.1002/prot.70045

Mahta Mehdiabadi, Alessio Del Conte, Maria Victoria Nugnes, Maria Cristina Aspromonte, Silvio C E Tosatto, Damiano Piovesan


Citations: 0

Abstract


Critical Assessment of Protein Intrinsic Disorder Round 3 - Predicting Disorder in the Era of Protein Language Models.

Intrinsic disorder (ID) in proteins is a complex phenomenon, encompassing a continuum from entirely disordered regions to structured domains with flexible segments. The absence of a ground truth for all forms of disorder, combined with the possibility of structural transitions between ordered and disordered states under specific conditions, makes accurate prediction of ID especially challenging. The Critical Assessment of Protein Intrinsic Disorder (CAID) evaluates ID prediction methods using diverse benchmarks derived from DisProt, a manually curated database of experimentally validated annotations. This paper presents findings from the third round (CAID3), in which 24 new methods were assessed along with the predictors from previous rounds. Compared to CAID2, the top-performing methods in CAID3 demonstrated significant gains in average precision: over 31% improvement in predicting linker regions, and 15% in disorder prediction. This round introduces a new binding sub-challenge focused on identifying binding regions within known intrinsically disordered region (IDR) boundaries. The results indicate that this task remains challenging, highlighting the potential for improvement. The top-performing methods in CAID3 are mostly new and commonly use embeddings from protein language models (pLMs), underscoring the growing impact of pLMs in tackling the complexities of disordered proteins and advancing ID prediction.
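The abstract reports its gains in terms of average precision. As a rough illustration of that metric, the sketch below computes it for per-residue disorder scores against binary annotations, using the standard ranking-based definition. It is not taken from the CAID evaluation pipeline, which the abstract does not describe; the function name `average_precision` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for binary per-residue labels (1 = disordered).

    Residues are ranked by descending score; the precision observed at
    each true-positive rank is averaged over all positives. Tied scores
    keep their input order, which is a simplification.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank residues from the highest predicted score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / n_pos


# Toy data (invented for illustration, not from CAID3).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

In the toy data, the three positives sit at ranks 1, 2, and 4, so the value is (1 + 1 + 0.75) / 3. A predictor that ranks every disordered residue above every ordered one would score 1.0.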


Source journal: Proteins-Structure Function and Bioinformatics (Biology: Biochemistry and Molecular Biology)

CiteScore: 5.90
Self-citation rate: 3.40%
Articles published: 172
Review time: 3 months

About the journal: PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.