基于预训练大型语言模型的人类蛋白质本质综合预测与分析。

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Boming Kang, Rui Fan, Chunmei Cui, Qinghua Cui
{"title":"基于预训练大型语言模型的人类蛋白质本质综合预测与分析。","authors":"Boming Kang, Rui Fan, Chunmei Cui, Qinghua Cui","doi":"10.1038/s43588-024-00733-1","DOIUrl":null,"url":null,"abstract":"<p><p>Human essential proteins (HEPs) are indispensable for individual viability and development. However, experimental methods to identify HEPs are often costly, time consuming and labor intensive. In addition, existing computational methods predict HEPs only at the cell line level, but HEPs vary across living human, cell line and animal models. Here we develop a sequence-based deep learning model, Protein Importance Calculator (PIC), by fine-tuning a pretrained protein language model. PIC not only substantially outperforms existing methods for predicting HEPs but also provides comprehensive prediction results across three levels: human, cell line and mouse. Furthermore, we define the protein essential score, derived from PIC, to quantify human protein essentiality and validate its effectiveness by a series of biological analyses. We also demonstrate the biomedical value of the protein essential score by identifying potential prognostic biomarkers for breast cancer and quantifying the essentiality of 617,462 human microproteins.</p>","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":" ","pages":""},"PeriodicalIF":12.0000,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comprehensive prediction and analysis of human protein essentiality based on a pretrained large language model.\",\"authors\":\"Boming Kang, Rui Fan, Chunmei Cui, Qinghua Cui\",\"doi\":\"10.1038/s43588-024-00733-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Human essential proteins (HEPs) are indispensable for individual viability and development. However, experimental methods to identify HEPs are often costly, time consuming and labor intensive. In addition, existing computational methods predict HEPs only at the cell line level, but HEPs vary across living human, cell line and animal models. Here we develop a sequence-based deep learning model, Protein Importance Calculator (PIC), by fine-tuning a pretrained protein language model. PIC not only substantially outperforms existing methods for predicting HEPs but also provides comprehensive prediction results across three levels: human, cell line and mouse. Furthermore, we define the protein essential score, derived from PIC, to quantify human protein essentiality and validate its effectiveness by a series of biological analyses. We also demonstrate the biomedical value of the protein essential score by identifying potential prognostic biomarkers for breast cancer and quantifying the essentiality of 617,462 human microproteins.</p>\",\"PeriodicalId\":74246,\"journal\":{\"name\":\"Nature computational science\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":12.0000,\"publicationDate\":\"2024-11-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Nature computational science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1038/s43588-024-00733-1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature computational science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1038/s43588-024-00733-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

摘要

人类必需蛋白(HEPs)是个体存活和发育不可或缺的物质。然而,鉴定 HEPs 的实验方法往往成本高昂、耗时长且劳动强度大。此外,现有的计算方法只能在细胞系水平预测 HEPs,但不同的活人、细胞系和动物模型的 HEPs 都不尽相同。在这里,我们通过微调预训练的蛋白质语言模型,开发了一种基于序列的深度学习模型--蛋白质重要性计算器(PIC)。PIC 不仅大大优于现有的 HEPs 预测方法,还能提供跨越人类、细胞系和小鼠三个层次的综合预测结果。此外,我们还定义了由 PIC 得出的蛋白质本质分数,用于量化人类蛋白质的本质,并通过一系列生物学分析验证了其有效性。我们还通过确定潜在的乳腺癌预后生物标记物和量化 617,462 个人类微蛋白的本质,证明了蛋白质本质分数的生物医学价值。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Comprehensive prediction and analysis of human protein essentiality based on a pretrained large language model.

Human essential proteins (HEPs) are indispensable for individual viability and development. However, experimental methods to identify HEPs are often costly, time consuming and labor intensive. In addition, existing computational methods predict HEPs only at the cell line level, but HEPs vary across living human, cell line and animal models. Here we develop a sequence-based deep learning model, Protein Importance Calculator (PIC), by fine-tuning a pretrained protein language model. PIC not only substantially outperforms existing methods for predicting HEPs but also provides comprehensive prediction results across three levels: human, cell line and mouse. Furthermore, we define the protein essential score, derived from PIC, to quantify human protein essentiality and validate its effectiveness by a series of biological analyses. We also demonstrate the biomedical value of the protein essential score by identifying potential prognostic biomarkers for breast cancer and quantifying the essentiality of 617,462 human microproteins.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
11.70
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信