PANDA-3D: protein function prediction based on AlphaFold models.

IF 4 Q1 GENETICS & HEREDITY
NAR Genomics and Bioinformatics Pub Date : 2024-08-06 eCollection Date: 2024-09-01 DOI:10.1093/nargab/lqae094
Chenguang Zhao, Tong Liu, Zheng Wang
{"title":"PANDA-3D: protein function prediction based on AlphaFold models.","authors":"Chenguang Zhao, Tong Liu, Zheng Wang","doi":"10.1093/nargab/lqae094","DOIUrl":null,"url":null,"abstract":"<p><p>Previous protein function predictors primarily make predictions from amino acid sequences instead of tertiary structures because of the limited number of experimentally determined structures and the unsatisfying qualities of predicted structures. AlphaFold recently achieved promising performances when predicting protein tertiary structures, and the AlphaFold protein structure database (AlphaFold DB) is fast-expanding. Therefore, we aimed to develop a deep-learning tool that is specifically trained with AlphaFold models and predict GO terms from AlphaFold models. We developed an advanced learning architecture by combining geometric vector perceptron graph neural networks and variant transformer decoder layers for multi-label classification. PANDA-3D predicts gene ontology (GO) terms from the predicted structures of AlphaFold and the embeddings of amino acid sequences based on a large language model. Our method significantly outperformed a state-of-the-art deep-learning method that was trained with experimentally determined tertiary structures, and either outperformed or was comparable with several other language-model-based state-of-the-art methods with amino acid sequences as input. PANDA-3D is tailored to AlphaFold models, and the AlphaFold DB currently contains over 200 million predicted protein structures (as of May 1st, 2023), making PANDA-3D a useful tool that can accurately annotate the functions of a large number of proteins. PANDA-3D can be freely accessed as a web server from http://dna.cs.miami.edu/PANDA-3D/ and as a repository from https://github.com/zwang-bioinformatics/PANDA-3D.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae094"},"PeriodicalIF":4.0000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11302463/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"NAR Genomics and Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/nargab/lqae094","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
引用次数: 0

Abstract

Previous protein function predictors primarily make predictions from amino acid sequences instead of tertiary structures because of the limited number of experimentally determined structures and the unsatisfying qualities of predicted structures. AlphaFold recently achieved promising performances when predicting protein tertiary structures, and the AlphaFold protein structure database (AlphaFold DB) is fast-expanding. Therefore, we aimed to develop a deep-learning tool that is specifically trained with AlphaFold models and predict GO terms from AlphaFold models. We developed an advanced learning architecture by combining geometric vector perceptron graph neural networks and variant transformer decoder layers for multi-label classification. PANDA-3D predicts gene ontology (GO) terms from the predicted structures of AlphaFold and the embeddings of amino acid sequences based on a large language model. Our method significantly outperformed a state-of-the-art deep-learning method that was trained with experimentally determined tertiary structures, and either outperformed or was comparable with several other language-model-based state-of-the-art methods with amino acid sequences as input. PANDA-3D is tailored to AlphaFold models, and the AlphaFold DB currently contains over 200 million predicted protein structures (as of May 1st, 2023), making PANDA-3D a useful tool that can accurately annotate the functions of a large number of proteins. PANDA-3D can be freely accessed as a web server from http://dna.cs.miami.edu/PANDA-3D/ and as a repository from https://github.com/zwang-bioinformatics/PANDA-3D.

PANDA-3D:基于 AlphaFold 模型的蛋白质功能预测。
由于实验确定的结构数量有限,而且预测结构的质量不能令人满意,以往的蛋白质功能预测工具主要根据氨基酸序列而不是三级结构进行预测。最近,AlphaFold 在预测蛋白质三级结构时取得了可喜的成绩,而且 AlphaFold 蛋白结构数据库(AlphaFold DB)正在快速扩展。因此,我们的目标是开发一种深度学习工具,专门使用 AlphaFold 模型进行训练,并根据 AlphaFold 模型预测 GO 术语。我们开发了一种先进的学习架构,将几何向量感知器图神经网络和变异变换解码器层结合起来,用于多标签分类。PANDA-3D 可根据 AlphaFold 预测的结构和基于大型语言模型的氨基酸序列嵌入预测基因本体(GO)术语。我们的方法明显优于使用实验确定的三级结构训练的最先进的深度学习方法,并且优于或可与使用氨基酸序列作为输入的其他几种基于语言模型的最先进方法相媲美。PANDA-3D 是为 AlphaFold 模型量身定制的,而 AlphaFold 数据库目前包含超过 2 亿个预测的蛋白质结构(截至 2023 年 5 月 1 日),这使得 PANDA-3D 成为可以准确注释大量蛋白质功能的有用工具。PANDA-3D 可作为网络服务器从 http://dna.cs.miami.edu/PANDA-3D/ 免费访问,也可作为资源库从 https://github.com/zwang-bioinformatics/PANDA-3D 免费访问。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
8.00
自引率
2.20%
发文量
95
审稿时长
15 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信