Short-length peptides contact map prediction using Convolution Neural Networks

Artem Maminov
Published in: Proceedings of The 6th International Workshop on Deep Learning in Computational Physics, PoS(DLCP2022), 2022-11-14
DOI: 10.22323/1.429.0016 (https://doi.org/10.22323/1.429.0016)

Abstract

This article presents an approach to predicting the contact matrix (contact map) of short peptides. A contact matrix is a two-dimensional representation of a protein's structure. It can be used to reconstruct the tertiary structure or as a starting approximation in energy minimization models. To test the model and simplify the calculations, peptides with chain lengths from 15 to 30 residues were chosen. Convolutional neural networks (CNNs) were used as the prediction tool because the feature space of each peptide is represented as a two-dimensional matrix. The SCRATCH tool was used to generate the secondary structure, solvent accessibility, and profile matrix (PSSM) for each peptide. The CNN was implemented in Python using the Keras library. The BioPython module was used to work with the common PDB format, which stores protein structure information. Training, validation, and test samples were generated, and a multilayer multi-output convolutional neural network was constructed, trained, and validated. Experiments were conducted on the test sample to predict the contact matrix and compare it with the native one. To assess prediction quality, conjunction matrices were formed for thresholds of 8 and 12 Å, and the F1-score, recall, and precision metrics were calculated. The F1-scores show that even a small neural network can achieve quite good results. In the final step, the FT-COMAR tool was used to reconstruct the tertiary structure of each protein from its contact matrix. The results show that structures reconstructed from the 12 Å threshold contact matrix have a better RMSD.
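As an illustration of the evaluation step described above, the following minimal sketch thresholds pairwise distances at 8 and 12 Å to obtain binary contact matrices and scores a predicted matrix against the native one with precision, recall, and F1. It is not the author's implementation. The abstract does not say which atoms define a contact, so the sketch assumes one representative point per residue (for example the C-alpha atom); the function names `contact_map` and `contact_scores`, the zero diagonal, and the toy coordinates are all hypothetical. Parsing coordinates from a PDB file with BioPython is left out.

```python
import math


def contact_map(coords, threshold):
    """Binary contact matrix from one 3D point per residue.

    Assumption: two residues are in contact when their representative
    points are closer than `threshold` angstroms. The diagonal is set to 0.
    """
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) < threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def contact_scores(native, predicted):
    """Precision, recall and F1 over residue pairs i < j.

    Only the upper triangle is counted because the matrices are symmetric.
    """
    tp = fp = fn = 0
    n = len(native)
    for i in range(n):
        for j in range(i + 1, n):
            if predicted[i][j] and native[i][j]:
                tp += 1
            elif predicted[i][j]:
                fp += 1
            elif native[i][j]:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy data (hypothetical): five residues on a straight line, 3.8 angstroms apart.
native_coords = [(3.8 * k, 0.0, 0.0) for k in range(5)]
native_8 = contact_map(native_coords, 8.0)
native_12 = contact_map(native_coords, 12.0)

# A made-up "prediction" at the 8 angstrom threshold:
# one native contact is missed and one false contact is added.
predicted_8 = [row[:] for row in native_8]
predicted_8[0][2] = predicted_8[2][0] = 0
predicted_8[0][3] = predicted_8[3][0] = 1

precision, recall, f1 = contact_scores(native_8, predicted_8)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The larger threshold marks more residue pairs as contacts for the same coordinates, so the 12 Å matrix is denser than the 8 Å one. In practice a library routine such as scikit-learn's `precision_recall_fscore_support` could replace the hand-written counting.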