AngularQA: Protein Model Quality Assessment with LSTM Networks

Q2 Mathematics
Matthew Conover, Max Staples, Dong Si, Miao Sun, Renzhi Cao
Computational and Mathematical Biophysics, 7(1), pp. 1-9. Published 2019-01-01. DOI: 10.1101/560995 (https://doi.org/10.1101/560995)
Citations: 33

Abstract

Quality Assessment (QA) plays an important role in protein structure prediction. Traditional multi-model QA methods usually rely on searching databases or comparing against other models to make predictions, and they often fail when poor-quality models dominate the model pool. We propose a novel single-model protein QA method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network predicts the quality by treating each amino acid as a time-step and taking the final value returned by the LSTM cells. To the best of our knowledge, this is the first attempt to use an LSTM model on the QA problem; furthermore, we use a new representation that has not been studied for QA. In addition to angles, we make use of sequence properties such as secondary structure, parsed from the protein structure at each time-step without using any database, which differs from all existing QA methods. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points out new directions for the QA problem, and our method could be widely used for protein structure prediction problems. The software is freely available on GitHub: https://github.com/caorenzhi/AngularQA
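The pipeline described in the abstract (an angular per-residue representation fed to an LSTM whose final output is read as the quality score) can be illustrated with a minimal sketch. This is not the AngularQA implementation. The feature set here (distance to the prior Cα plus the sine and cosine of the pseudo-dihedral over four consecutive Cα atoms), the `TinyLSTM` class, its random untrained weights, and the toy helix coordinates are all assumptions made for illustration; side-chain and secondary-structure features are omitted.

```python
import math
import random

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four points."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    return math.atan2(math.sqrt(dot(b1, b1)) * dot(b0, n2), dot(n1, n2))

def featurize(ca_coords):
    """One feature vector per residue: [distance to prior Ca, sin, cos].

    Assumed encoding: positions without enough predecessors are zero-padded.
    """
    feats = []
    for i in range(len(ca_coords)):
        dist = math.dist(ca_coords[i - 1], ca_coords[i]) if i >= 1 else 0.0
        if i >= 3:
            theta = dihedral(*ca_coords[i - 3:i + 1])
            s, c = math.sin(theta), math.cos(theta)
        else:
            s, c = 0.0, 0.0
        feats.append([dist, s, c])
    return feats

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Hypothetical single-layer LSTM with random (untrained) weights."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {g: mat(n_hidden, n_in + n_hidden) for g in "ifoc"}
        self.b = {g: [0.0] * n_hidden for g in "ifoc"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def score(self, sequence):
        """Treat each residue as a time-step; read out only the final state."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in sequence:
            z = list(x) + h
            pre = {g: [dot(row, z) + bias
                       for row, bias in zip(self.W[g], self.b[g])]
                   for g in "ifoc"}
            i_gate = [sigmoid(v) for v in pre["i"]]
            f_gate = [sigmoid(v) for v in pre["f"]]
            o_gate = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [f * cj + i * g for f, cj, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cj) for o, cj in zip(o_gate, c)]
        # Squash the final hidden state to a score in (0, 1).
        return sigmoid(dot(self.w_out, h))

# Toy helix-like Ca trace (made-up geometry, not taken from a real structure).
helix = [(2.3 * math.cos(math.radians(100 * k)),
          2.3 * math.sin(math.radians(100 * k)),
          1.5 * k) for k in range(12)]
features = featurize(helix)
quality = TinyLSTM(n_in=3, n_hidden=8).score(features)
print(len(features), quality)
```

Encoding each angle as a sine and cosine pair avoids the discontinuity at ±180°, which is a common choice for angular inputs but is an assumption here, not something the abstract states. In a trained model the weights would be fitted against a reference quality score per decoy.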
Source journal
Computational and Mathematical Biophysics (Mathematics, Mathematical Physics)
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 8
Review time: 30 weeks