PICNIC accurately predicts condensate-forming proteins regardless of their structural disorder across organisms

IF 15.7 | Zone 1, multidisciplinary journals | JCR Q1, MULTIDISCIPLINARY SCIENCES

Nature Communications | Pub Date: 2024-12-11 | DOI: 10.1038/s41467-024-55089-x

Anna Hadarovich, Hari Raj Singh, Soumyadeep Ghosh, Maxim Scheremetjew, Nadia Rostam, Anthony A. Hyman, Agnes Toth-Petroczy


Citations: 0

Abstract


Biomolecular condensates are membraneless organelles that can concentrate hundreds of different proteins in cells to operate essential biological functions. However, accurate identification of their components remains challenging and biased towards proteins with high structural disorder content with focus on self-phase separating (driver) proteins. Here, we present a machine learning algorithm, PICNIC (Proteins Involved in CoNdensates In Cells) to classify proteins that localize to biomolecular condensates regardless of their role in condensate formation. PICNIC successfully predicts condensate members by learning amino acid patterns in the protein sequence and structure in addition to the intrinsic disorder. Extensive experimental validation of 24 positive predictions in cellulo shows an overall ~82% accuracy regardless of the structural disorder content of the tested proteins. While increasing disorder content is associated with organismal complexity, our analysis of 26 species reveals no correlation between predicted condensate proteome content and disorder content across organisms. Overall, we present a machine learning classifier to interrogate condensate components at whole-proteome levels across the tree of life.
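The abstract says the classifier learns amino acid patterns in the protein sequence. As a rough illustration of what a sequence-derived feature can look like, the sketch below computes a plain amino-acid composition vector. This is an assumption made for illustration only: the function names `aa_composition` and `feature_vector` are hypothetical, and nothing here reproduces the actual PICNIC feature set, which also draws on structure and intrinsic disorder.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a sequence.

    Hypothetical helper for illustration; not part of PICNIC.
    Non-standard symbols (for example 'X') are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def feature_vector(sequence: str) -> list[float]:
    """Flatten the composition into a fixed-length (20) feature vector."""
    comp = aa_composition(sequence)
    return [comp[aa] for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(feature_vector("GGSYGGSY"))
```

A vector like this could be one input to a generic classifier; composition alone ignores residue order, which is one reason a real predictor would need richer sequence and structure features.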


Source journal

Nature Communications Biological Science Disciplines-

CiteScore: 24.90

Self-citation rate: 2.40%

Articles published: 6928

Review time: 3.7 months

Journal introduction: Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.