Comprehensive prediction and analysis of human protein essentiality based on a pretrained large language model
Boming Kang, Rui Fan, Chunmei Cui, Qinghua Cui
Nature Computational Science (impact factor 12.0; JCR Q1, Computer Science, Interdisciplinary Applications). Published 2024-11-27.
DOI: 10.1038/s43588-024-00733-1 (https://doi.org/10.1038/s43588-024-00733-1)
Human essential proteins (HEPs) are indispensable for individual viability and development. However, experimental methods to identify HEPs are often costly, time-consuming and labor-intensive. In addition, existing computational methods predict HEPs only at the cell line level, even though HEPs vary across living humans, cell lines and animal models. Here we develop a sequence-based deep learning model, Protein Importance Calculator (PIC), by fine-tuning a pretrained protein language model. PIC not only substantially outperforms existing methods for predicting HEPs but also provides comprehensive prediction results at three levels: human, cell line and mouse. Furthermore, we define the protein essential score, derived from PIC, to quantify human protein essentiality, and we validate its effectiveness through a series of biological analyses. We also demonstrate the biomedical value of the protein essential score by identifying potential prognostic biomarkers for breast cancer and quantifying the essentiality of 617,462 human microproteins.