Authors: Maria Theresa F. Calangian, V. P. Magboo
Published in: 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET)
Publication date: 2022-07-20
DOI: 10.1109/ICECET55527.2022.9873060
URL: https://doi.org/10.1109/ICECET55527.2022.9873060
Source: Semantic Scholar
Citations: 0
Predicting Drug-Target Interaction (DTI) based on Machine Learning with Lasso Dimensionality Reduction and SMOTE from Protein Sequence and Drug Fingerprint
Abstract
Identifying drug-target interactions (DTIs) is an important step in pharmaceutical research aimed at developing new therapeutic agents. However, experimental methods for identifying DTIs are time-consuming, expensive, and challenging. Computational methods that can accurately predict DTI pairs are therefore of great interest, because they can significantly reduce the time and resources required for drug discovery and research. This study presents a machine-learning-based model named kNN-DTIPred for DTI prediction that addresses two common dataset problems: high dimensionality and class imbalance. First, target protein feature vectors are extracted using the Pseudo-Position Specific Scoring Matrix (PsePSSM). Drug compounds are represented by FP2 molecular fingerprints generated with the OpenBabel software. Lasso dimensionality reduction is then used to retain only the most discriminating features, while the Synthetic Minority Over-sampling Technique (SMOTE) is applied for class balancing. Five machine learning models were compared on four datasets. The best model was the k-Nearest Neighbors classifier, with overall prediction accuracies of 98.23%, 94.77%, 95.07%, and 93.09% on the enzyme, ion channel, G protein-coupled receptor, and nuclear receptor datasets, respectively. The corresponding areas under the curve were 97.05%, 95.95%, 94.89%, and 94.29%. Additionally, the results showed that Lasso dimensionality reduction and SMOTE significantly improved predictive performance. The study demonstrates that the proposed kNN-DTIPred model is highly accurate and effective in predicting drug-target pairs, which can accelerate DTI identification by limiting the search space to be investigated in laboratory experiments.
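The three modelling steps named in the abstract (Lasso feature selection, SMOTE balancing, k-Nearest Neighbors classification) can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the function names, the toy data, and the values of `alpha` and `k` are all assumptions chosen for the example, and the PsePSSM and FP2 feature extraction is not reproduced (random numbers stand in for the concatenated feature vectors).

```python
import math
import random
from collections import Counter


def soft_threshold(value, threshold):
    """Shrink value toward zero; this is what sets weak Lasso weights to exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_weights(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    No intercept is fitted, so X and y are assumed to be roughly centred.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never be selected
            # Correlation of column j with the residual that ignores column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            for i in range(n):
                residual[i] -= X[i][j] * delta
            w[j] = new_w
    return w


def select_features(X, w):
    """Keep only the columns whose Lasso weight is non-zero."""
    keep = [j for j, wj in enumerate(w) if wj != 0.0]
    return [[row[j] for j in keep] for row in X], keep


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority samples (needs at least 2 minority points)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for feature vectors: column 0 carries the class signal,
    # column 1 is noise, column 2 is a fingerprint bit that never fires.
    negatives = [[-1 + rng.uniform(-0.3, 0.3), rng.uniform(-1, 1), 0.0] for _ in range(20)]
    positives = [[1 + rng.uniform(-0.3, 0.3), rng.uniform(-1, 1), 0.0] for _ in range(4)]
    X = negatives + positives
    y = [-1.0] * 20 + [1.0] * 4  # +1 = interacting pair, -1 = non-interacting

    X_sel, kept = select_features(X, lasso_weights(X, y, alpha=0.3))
    neg_sel, pos_sel = X_sel[:20], X_sel[20:]
    pos_sel = pos_sel + smote(pos_sel, n_new=len(neg_sel) - len(pos_sel), k=3, rng=rng)
    train_X = neg_sel + pos_sel
    train_y = [0] * len(neg_sel) + [1] * len(pos_sel)
    print("kept columns:", kept)
    print("class counts after SMOTE:", Counter(train_y))
    print("prediction for last positive sample:", knn_predict(train_X, train_y, X_sel[-1], k=3))
```

In a real evaluation, oversampling would normally be applied only to the training portion of each split so that synthetic samples do not leak into the test data. Tested implementations of these steps are available in libraries such as scikit-learn (`Lasso`, `KNeighborsClassifier`) and imbalanced-learn (`SMOTE`); the abstract does not state which tools the authors used for them.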