Intuitionistic fuzzy rough set model based on k-means and its application to enhance prediction of aptamer–protein interacting pairs

Zone 3, Computer Science; Q1, Computer Science

Journal of Ambient Intelligence and Humanized Computing. Pub Date: 2024-07-23. DOI: 10.1007/s12652-024-04837-4

Pankhuri Jain, Anoop Tiwari, Tanmoy Som


Citations: 0

Abstract

Aptamers are oligonucleic acid or peptide molecules that bind particular target molecules. They play vital roles in various practical applications and physiological functions. Consequently, several diseases can be treated using aptamer-based therapies, and designing the binding of aptamers to specific proteins is essential to advancing the understanding of aptamer-protein interaction processes. Despite the wide applications of aptamers, identification of aptamer-protein interactions remains inadequate and challenging. A computational approach that predicts these interactions well is therefore needed. In the present study, we propose a novel method for enhancing the prediction of interacting aptamer-target pairs. It uses sequence features obtained from both aptamers and their target proteins and employs a novel k-means-based intuitionistic fuzzy rough feature selection method. Firstly, an intuitionistic fuzzy rough set model based on the k nearest neighbour concept is proposed, and a novel feature selection technique is built on this model. The proposed technique is then used to select non-redundant, relevant features from the training and testing datasets. Secondly, SMOTE (Synthetic Minority Oversampling Technique) is applied to obtain optimally balanced training and testing datasets. Thirdly, various machine learning algorithms are applied to the optimally balanced, reduced training and testing datasets to evaluate their performance. Experimental results show that the boosted random forest learning algorithm gives the best prediction performance. Under 10-fold cross-validation, the proposed method achieves sensitivity of 91.3 and 86.4, specificity of 91.9 and 84.8, overall accuracy of 91.60% and 85.60%, Matthews correlation coefficient of 0.832 and 0.713, AUC (area under the curve) of 0.969 and 0.908, and g-means of 91.5 and 85.5 on the optimally balanced, reduced training and testing datasets of aptamer-protein interacting pairs, respectively. Finally, a comparison of the best obtained results with the best existing results indicates that the proposed approach is the best-performing approach to date.
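
The evaluation measures named in the abstract have standard definitions, and a small sketch may help readers interpret the reported figures. The Python below is illustrative only and is not the authors' code: the function names `binary_metrics` and `rank_auc`, and the counts and scores in the usage lines, are hypothetical. It assumes a binary task in which interacting pairs are the positive class and each class has at least one sample.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard measures computed from confusion-matrix counts.

    Hypothetical helper for illustration; assumes both classes are non-empty.
    """
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # g-mean is the geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "g_mean": g_mean,
    }


def rank_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up counts and scores, not data from the paper.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
auc = rank_auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

Sensitivity, specificity, accuracy, MCC and g-mean depend only on the thresholded predictions, whereas AUC needs the classifier's continuous scores, which is why it is computed by a separate function here.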





Source journal

Journal of Ambient Intelligence and Humanized Computing (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore

9.60

Self-citation rate

0.00%

Articles published

854

Journal description: The purpose of JAIHC is to provide a high-profile, leading-edge forum for academics, industrial professionals, educators and policy makers in the field to contribute and disseminate the most innovative research and developments in all aspects of ambient intelligence and humanized computing, such as intelligent/smart objects, environments/spaces, and systems. The journal discusses various technical, safety, personal, social, physical, political, artistic and economic issues. The research topics covered by the journal include (but are not limited to): Pervasive/Ubiquitous Computing and Applications; Cognitive wireless sensor network; Embedded Systems and Software; Mobile Computing and Wireless Communications; Next Generation Multimedia Systems; Security, Privacy and Trust; Service and Semantic Computing; Advanced Networking Architectures; Dependable, Reliable and Autonomic Computing; Embedded Smart Agents; Context awareness, social sensing and inference; Multi modal interaction design; Ergonomics and product prototyping; Intelligent and self-organizing transportation networks & services; Healthcare Systems; Virtual Humans & Virtual Worlds; Wearables sensors and actuators.