Zahoor Ahmed, Kiran Shahzadi, Rui Li, Yu-Qing Jiang, Yan-Ting Jin, Muhammad Arif, Juan Feng
Briefings in Bioinformatics, vol. 26, no. 4. Published 2025-07-02. DOI: 10.1093/bib/bbaf313. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12239617/pdf/
An artificial intelligence-based approach for identifying the proteins regulating liquid-liquid phase separation.
Liquid-liquid phase separation (LLPS) is a biomolecular process that underpins the formation of membrane-less organelles within living cells. This phenomenon, along with the resulting condensate bodies, is increasingly recognized for its critical roles in various biological processes, such as ribonucleic acid (RNA) metabolism, chromatin rearrangement, and signal transduction. Notably, regulator proteins play a central role in the process of LLPS. They are essential for the formation, stabilization, and maintenance of the dynamic properties of LLPS, ensuring an appropriate phase separation response to cellular signals. Targeting these regulator proteins is the key to manipulating LLPS for applications in biotechnology, materials science, and medicine, including biomaterials, drug delivery, diagnostics, and synthetic biology. Given their importance, this study focused on an artificial intelligence-based approach to identify regulator proteins in LLPS. We constructed a dataset of 913 positive and 6584 negative protein sequences, and divided it into eight balanced training datasets and a test dataset. Semantic information from protein sequences was extracted using the ESM2_t36 pretrained protein language model, followed by training a multilayer perceptron classifier. The model achieved 0.78 accuracy on the test dataset, outperforming traditional sequence-based methods, one-hot encoding, and other pretrained embedding methods. SHapley Additive exPlanations (SHAP)-based interpretation revealed key biophysical patterns enriched in regulator proteins, including higher levels of charged and disordered residues. Our results show that deep contextual protein representations combined with neural network-based classifiers can accurately identify LLPS regulator proteins. This tool offers new opportunities for understanding condensate biology and designing synthetic phase-separating systems. All data and code are available at: https://github.com/bioplusAI/LLPS_regulators_pred.
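The abstract reports 913 positive and 6584 negative sequences divided into eight balanced training datasets and a test dataset, but it does not say how the split was made. The following is a minimal sketch of one plausible scheme, not the authors' procedure: hold out the same fraction of each class for testing, spread the remaining negatives over eight disjoint chunks, and pair each chunk with an equally sized random sample of the remaining positives. The function name `make_balanced_splits`, the 20% test fraction, the seed, and the placeholder sequence IDs are all assumptions made for illustration.

```python
import random


def make_balanced_splits(positives, negatives, n_subsets=8,
                         test_fraction=0.2, seed=0):
    """Hold out a test set, then build n_subsets class-balanced training sets.

    Hypothetical helper: the split scheme and the default values are
    assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Assumed scheme: hold out the same fraction of each class for testing.
    n_pos_test = int(len(pos) * test_fraction)
    n_neg_test = int(len(neg) * test_fraction)
    test = ([(s, 1) for s in pos[:n_pos_test]]
            + [(s, 0) for s in neg[:n_neg_test]])
    pos_train, neg_train = pos[n_pos_test:], neg[n_neg_test:]

    subsets = []
    for i in range(n_subsets):
        # Every n_subsets-th negative: chunks are disjoint and near-equal in size.
        neg_chunk = neg_train[i::n_subsets]
        # Match class sizes so each subset is exactly balanced.
        size = min(len(pos_train), len(neg_chunk))
        pos_part = rng.sample(pos_train, size)
        subset = ([(s, 1) for s in pos_part]
                  + [(s, 0) for s in neg_chunk[:size]])
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets, test


# Placeholder IDs stand in for the real protein sequences.
positives = [f"pos_{i}" for i in range(913)]
negatives = [f"neg_{i}" for i in range(6584)]
subsets, test = make_balanced_splits(positives, negatives)
```

Under these assumptions each negative in the training pool appears in exactly one subset, while positives are reused across subsets. A separate classifier could then be trained on each subset, or a single subset chosen by validation; the abstract does not state which approach was used.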
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.