NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations

IF 11.5 · Zone 2, Biology · JCR Q1, Genetics & Heredity
Shaojun Wang, Ronghui You, Yunjia Liu, Yi Xiong, Shanfeng Zhu
Journal: Genomics, Proteomics & Bioinformatics
Publication date: 2023-04-01
DOI: 10.1016/j.gpb.2023.04.001
Publisher page: https://www.sciencedirect.com/science/article/pii/S1672022923000669
Citations: 6

Abstract

NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations

As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve the performance. However, it mainly utilizes the proteins with experimentally supported functional annotations without leveraging valuable information from a vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embedding] from protein sequences based on self-supervision. Here, we represented each protein by ESM-1b and used logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results showed that LR-ESM achieved comparable performance with the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0 to improve the performance of AFP extensively. NetGO 3.0 is freely accessible at https://dmiip.sjtu.edu.cn/ng3.0.
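
The LR-ESM idea described in the abstract (a fixed embedding per protein, then logistic regression for function prediction) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the 4-dimensional vectors are synthetic stand-ins for real ESM-1b embeddings, the term names `TERM_A` and `TERM_B` are hypothetical placeholders for GO terms, and the one-classifier-per-term setup, learning rate, and regularization are assumptions chosen only to keep the example small.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X, y, lr=0.5, epochs=300, l2=1e-3):
    """Fit one binary logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # Averaged gradient step with a small L2 penalty on the weights.
        w = [wj - lr * (gwj / n + l2 * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    """Return the predicted probability that a protein has the term."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_one_vs_rest(X, Y, terms):
    # Assumption: one independent binary classifier per term (multi-label).
    return {t: train_lr(X, [row[k] for row in Y]) for k, t in enumerate(terms)}


# Synthetic "embeddings": feature 0 tracks TERM_A, feature 1 tracks TERM_B,
# and the remaining two features are pure noise.
rng = random.Random(0)
terms = ["TERM_A", "TERM_B"]  # hypothetical labels, not real GO identifiers
X, Y = [], []
for i in range(40):
    a, b = i % 2, (i // 2) % 2
    X.append([
        2 * a - 1 + rng.gauss(0, 0.3),
        2 * b - 1 + rng.gauss(0, 0.3),
        rng.gauss(0, 0.3),
        rng.gauss(0, 0.3),
    ])
    Y.append([a, b])

models = train_one_vs_rest(X, Y, terms)

# Score an unseen "protein" whose embedding looks like TERM_A but not TERM_B.
query = [1.0, -1.0, 0.0, 0.0]
scores = {t: predict(models[t], query) for t in terms}
print(scores)
```

In practice the inputs would be embeddings produced by a pretrained protein language model, and a system like the one described would combine these per-term scores with the scores of its other components; how that combination is done is not specified in the abstract and is not shown here.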

Source journal
Genomics, Proteomics & Bioinformatics
Subject area: Biochemistry, Genetics and Molecular Biology (Biochemistry)
CiteScore: 14.30
Self-citation rate: 4.20%
Articles published: 844
Review time: 61 days
About the journal: Genomics, Proteomics and Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China. It aims to disseminate new developments in the field of omics and bioinformatics, publish high-quality discoveries quickly, and promote open access and online publication. GPB welcomes submissions in all areas of life science, biology, and biomedicine, with a focus on large data acquisition, analysis, and curation. Manuscripts covering omics and related bioinformatics topics are particularly encouraged. GPB is indexed or abstracted by PubMed/MEDLINE, PubMed Central, Scopus, BIOSIS Previews, Chemical Abstracts, and CSCD, among others.