A Novel Prediction Method for Metal-Ion Binding Sites in Protein Sequence Based on Ensemble Learning

Chuyi Song, Jing-qing Jiang
DOI: 10.1145/3579654.3579694 (https://doi.org/10.1145/3579654.3579694)
Published in: Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence
Publication date: 2022-12-23

Abstract

Identifying metal ion-binding sites is important for determining protein structures and understanding their biological functions. However, fewer than one percent of the structures in the Protein Data Bank (PDB), which collects the known crystal structures of proteins, belong to membrane proteins, even though membrane proteins play a significant role in material exchange for cells and are closely tied to drug target design. In this work, we develop an efficient prediction method for six different types of metal ion-binding sites in membrane proteins. To address the class imbalance in the dataset, a multiple random down-sampling technique is used to obtain multiple training subsets, each with equal numbers of binding and non-binding residues. Support vector machine (SVM) and random forest (RF) classification models are built on these subsets, and their results are combined by an ensemble learning algorithm, which efficiently reduces the number of false positives in the final prediction. On an independent testing set, the proposed method achieves an average accuracy of 0.991 and an average Matthews correlation coefficient (MCC) of 0.681, outperforming a recently proposed prediction method. These results suggest that the proposed method can serve as an accurate tool for predicting metal ion-binding sites in membrane proteins and should assist in the design of new drug targets.
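
The down-sampling and combination steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`balanced_subsets`, `vote`, `mcc`) are hypothetical, the abstract does not state how the SVM and RF outputs are combined, so a simple majority vote is assumed here, and the classifiers themselves are left out so that only the sampling, voting, and scoring logic is shown.

```python
import math
import random
from collections import Counter


def balanced_subsets(labels, n_subsets, seed=0):
    """Build index lists for balanced training subsets.

    Each subset keeps every binding residue (label 1) and adds an equally
    sized random draw of non-binding residues (label 0).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_subsets)]


def vote(predictions, threshold=0.5):
    """Combine per-model 0/1 predictions (assumed rule: majority vote).

    A residue is called binding only if more than `threshold` of the
    models predict 1 for it.
    """
    n_models = len(predictions)
    return [int(sum(col) > threshold * n_models) for col in zip(*predictions)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels."""
    counts = Counter(zip(y_true, y_pred))
    tp, tn = counts[(1, 1)], counts[(0, 0)]
    fp, fn = counts[(0, 1)], counts[(1, 0)]
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: 3 binding residues among 100, mimicking the class imbalance.
labels = [1] * 3 + [0] * 97
subsets = balanced_subsets(labels, n_subsets=5)
```

In this sketch, one model would be trained per subset (for instance an SVM and an RF on each), and `vote` would merge their per-residue predictions. Requiring agreement among several models is one plausible way an ensemble can suppress false positives. Reporting MCC alongside accuracy matters on data this imbalanced: a predictor that labels every residue as non-binding scores high accuracy but an MCC of 0.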