An artificial intelligence-based approach for identifying the proteins regulating liquid-liquid phase separation.

IF 6.8 | Zone 2 (Biology) | JCR Q1, Biochemical Research Methods

Briefings in Bioinformatics, vol. 26, no. 4 | Pub Date: 2025-07-02 | DOI: 10.1093/bib/bbaf313

Zahoor Ahmed, Kiran Shahzadi, Rui Li, Yu-Qing Jiang, Yan-Ting Jin, Muhammad Arif, Juan Feng


Citations: 0

Abstract


An artificial intelligence-based approach for identifying the proteins regulating liquid-liquid phase separation.

Liquid-liquid phase separation (LLPS) is a biomolecular process that underpins the formation of membrane-less organelles within living cells. This phenomenon, along with the resulting condensate bodies, is increasingly recognized for its critical roles in various biological processes, such as ribonucleic acid (RNA) metabolism, chromatin rearrangement, and signal transduction. Notably, regulator proteins play a central role in the process of LLPS. They are essential for the formation, stabilization, and maintenance of the dynamic properties of LLPS, ensuring an appropriate phase separation response to cellular signals. Targeting these regulator proteins is the key to manipulating LLPS for applications in biotechnology, materials science, and medicine, including biomaterials, drug delivery, diagnostics, and synthetic biology. Given their importance, this study focused on an artificial intelligence-based approach to identify regulator proteins in LLPS. We constructed a dataset of 913 positive and 6584 negative protein sequences, and divided it into eight balanced training datasets and a test dataset. Semantic information from protein sequences was extracted using the ESM2_t36 pretrained protein language model, followed by training a multilayer perceptron classifier. The model achieved 0.78 accuracy on the test dataset, outperforming traditional sequence-based methods, one-hot encoding, and other pretrained embedding methods. SHapley Additive exPlanations (SHAP)-based interpretation revealed key biophysical patterns enriched in regulator proteins, including higher levels of charged and disordered residues. Our results show that deep contextual protein representations combined with neural network-based classifiers can accurately identify LLPS regulator proteins. This tool offers new opportunities for understanding condensate biology and designing synthetic phase-separating systems. All data and code are available at: https://github.com/bioplusAI/LLPS_regulators_pred.
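The abstract does not say how the eight balanced training datasets were built from 913 positive and 6584 negative sequences. The sketch below shows one common approach, random undersampling of the majority class, and should not be read as the authors' procedure. The function name `make_balanced_subsets`, the 10% hold-out fraction, the seed, and the placeholder sequence IDs are all assumptions made for illustration.

```python
import random


def make_balanced_subsets(positives, negatives, n_subsets=8,
                          test_fraction=0.1, seed=0):
    """Hold out a test split, then build balanced training subsets.

    Each subset pairs all training positives with an equally sized
    random sample of training negatives (undersampling). The hold-out
    fraction is an assumed value, not one reported in the abstract.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Hold out the same fraction of each class for the test set.
    n_pos_test = int(len(pos) * test_fraction)
    n_neg_test = int(len(neg) * test_fraction)
    test = ([(s, 1) for s in pos[:n_pos_test]] +
            [(s, 0) for s in neg[:n_neg_test]])
    train_pos, train_neg = pos[n_pos_test:], neg[n_neg_test:]

    subsets = []
    for _ in range(n_subsets):
        # Sample without replacement inside a subset; negatives may
        # reappear across different subsets.
        sampled_neg = rng.sample(train_neg, len(train_pos))
        subset = ([(s, 1) for s in train_pos] +
                  [(s, 0) for s in sampled_neg])
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets, test


# Placeholder IDs standing in for real protein sequences (hypothetical).
positives = [f"P{i}" for i in range(913)]
negatives = [f"N{i}" for i in range(6584)]
subsets, test_set = make_balanced_subsets(positives, negatives)
print(len(subsets), len(subsets[0]), len(test_set))
```

Under these assumptions, one classifier could be trained per subset and the predictions combined, which is a typical way to use all of the negative data without training on a roughly 1:7 class imbalance.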


Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.