Predicting Drug-Target Interaction (DTI) with Machine Learning Based on Lasso Dimensionality Reduction and SMOTE

Maria Theresa F. Calangian, V. P. Magboo
DOI: 10.1109/ICECET55527.2022.9873060
Published in: 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET)
Publication date: 2022-07-20
URL: https://doi.org/10.1109/ICECET55527.2022.9873060
Citations: 0

Abstract

Predicting Drug-Target Interaction (DTI) based on Machine Learning with Lasso Dimensionality Reduction and SMOTE from Protein Sequence and Drug Fingerprint
Identifying drug-target interactions (DTIs) is an important step in pharmaceutical research for developing new therapeutic agents. However, experimental methods for identifying DTIs are time-consuming, expensive, and challenging. Computational methods that can accurately predict DTI pairs are therefore of great interest, because they can significantly reduce the time and resources required for drug discovery and research. This study presents a machine-learning-based model for DTI prediction, named kNN-DTIPred, that addresses two common dataset problems: high dimensionality and class imbalance. First, target protein feature vectors are extracted using the Pseudo-Position Specific Scoring Matrix (PsePSSM). Drug compounds are represented by FP2 molecular fingerprints generated with the OpenBabel software. Lasso dimensionality reduction is then used to retain only the most discriminating features, while SMOTE is applied for class balancing. Five machine learning models were compared on four datasets. The k-Nearest Neighbors classifier performed best, with overall prediction accuracies of 98.23%, 94.77%, 95.07%, and 93.09% on the enzyme, ion channel, G protein-coupled receptor, and nuclear receptor datasets, respectively. The area under the curve reached 97.05%, 95.95%, 94.89%, and 94.29%, respectively, on the same datasets. Additionally, our results showed that Lasso dimensionality reduction and SMOTE significantly improved predictive performance. This study demonstrates that the proposed kNN-DTIPred model is highly accurate and effective in predicting drug-target pairs, which can accelerate the DTI identification process by limiting the search space to be investigated in laboratory experiments.
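The class-balancing and classification steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python toy version of SMOTE (interpolating between a minority sample and one of its nearest minority neighbors) followed by a k-nearest-neighbors majority vote. The function names `smote` and `knn_predict`, the parameter values, and the two-dimensional data are all hypothetical, and the PsePSSM, FP2 fingerprint, and Lasso feature-selection steps are omitted. In practice, libraries such as scikit-learn and imbalanced-learn provide tested versions of these components.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base sample, excluding itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: euclidean(base, m))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy, made-up data: label 1 = interacting pair (minority), 0 = non-interacting.
negatives = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0], [0.2, 0.2], [0.1, 0.0]]
positives = [[0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]

# Oversample the minority class until both classes have the same size.
new_positives = smote(positives, n_new=len(negatives) - len(positives), k=2)
X = negatives + positives + new_positives
y = [0] * len(negatives) + [1] * (len(positives) + len(new_positives))

print(sum(y), len(y) - sum(y))         # class counts after balancing: 6 6
print(knn_predict(X, y, [0.85, 0.9]))  # query near the positive cluster: 1
```

One design point worth noting: oversampling should be applied only to the training portion of the data, because synthetic samples that leak into a test split make accuracy estimates look better than they are.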