Intuitionistic fuzzy rough set model based on k-means and its application to enhance prediction of aptamer–protein interacting pairs
Authors: Pankhuri Jain, Anoop Tiwari, Tanmoy Som
Journal: Journal of Ambient Intelligence and Humanized Computing
Published: 2024-07-23
DOI: 10.1007/s12652-024-04837-4 (https://doi.org/10.1007/s12652-024-04837-4)
Aptamers are peptide molecules or oligonucleic acids that bind particular target molecules. They play vital roles in various practical applications and physiological functions. Consequently, several diseases can be treated with aptamer-based therapies, and designing the binding of aptamers to specific proteins is essential to advancing the understanding of aptamer-protein interaction. Despite the wide application of aptamers, the identification of aptamer-protein interactions remains inadequate and challenging. It is therefore necessary to develop a computational approach that predicts aptamer-protein interactions well. This study proposes a method for enhancing the prediction of interacting aptamer-target pairs from sequence features of both the aptamers and their target proteins, using a novel k-means-based intuitionistic fuzzy rough feature selection method. Firstly, an intuitionistic fuzzy rough set model based on the k nearest neighbour concept is proposed. A feature selection technique is then built on this model and used to select non-redundant and relevant features from both the training and the testing datasets. Secondly, SMOTE (Synthetic Minority Oversampling Technique) is applied to obtain optimally balanced training and testing datasets. Thirdly, various machine learning algorithms are applied to the optimally balanced, reduced training and testing datasets to evaluate their performance. The experimental results show that the boosted random forest learning algorithm gives the best prediction performance. Under 10-fold cross-validation, the proposed method achieves sensitivity of 91.3 and 86.4, specificity of 91.9 and 84.8, overall accuracy of 91.60% and 85.60%, Matthews correlation coefficient of 0.832 and 0.713, AUC (area under the curve) of 0.969 and 0.908, and g-means of 91.5 and 85.5 on the optimally balanced, reduced training and testing datasets of aptamer-protein interacting pairs, respectively. Finally, the best results obtained are compared with the best existing results; according to the authors, the comparison indicates that the proposed approach is the best-performing approach to date.
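The evaluation measures named in the abstract (sensitivity, specificity, overall accuracy, Matthews correlation coefficient, and g-means) can all be derived from the four confusion-matrix counts of a binary classifier. The following is a minimal Python sketch using the standard textbook definitions; it is not the authors' code, and it assumes that g-means denotes the geometric mean of sensitivity and specificity. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fn: interacting pairs predicted correctly / incorrectly.
    tn/fp: non-interacting pairs predicted correctly / incorrectly.
    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # g-mean: geometric mean of sensitivity and specificity (assumed definition).
    g_mean = math.sqrt(sensitivity * specificity)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "g_mean": g_mean,
    }


# Hypothetical counts, for illustration only (not taken from the paper).
example = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data, accuracy alone can look high while one class is predicted poorly, which is why measures such as the Matthews correlation coefficient and g-means are commonly reported alongside it.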
About the journal:
The purpose of JAIHC is to provide a high-profile, leading-edge forum for academics, industrial professionals, educators, and policy makers in the field to contribute and disseminate the most innovative research and developments in all aspects of ambient intelligence and humanized computing, such as intelligent/smart objects, environments/spaces, and systems. The journal discusses various technical, safety, personal, social, physical, political, artistic, and economic issues. The research topics covered by the journal include, but are not limited to:
Pervasive/Ubiquitous Computing and Applications
Cognitive wireless sensor networks
Embedded Systems and Software
Mobile Computing and Wireless Communications
Next Generation Multimedia Systems
Security, Privacy and Trust
Service and Semantic Computing
Advanced Networking Architectures
Dependable, Reliable and Autonomic Computing
Embedded Smart Agents
Context awareness, social sensing and inference
Multimodal interaction design
Ergonomics and product prototyping
Intelligent and self-organizing transportation networks & services
Healthcare Systems
Virtual Humans & Virtual Worlds
Wearable sensors and actuators