Algorithmic Generation of Positive Samples for Compound-Target Interaction Prediction

2021 13th International Conference on Machine Learning and Computing. Pub Date: 2021-02-26. DOI: 10.1145/3457682.3457689

Ebenezer Nanor, Wei-Ping Wu, S. Bayitaa, V. K. Agbesi, Brighter Agyemang



Abstract

Machine Learning (ML) methods have become the preferred computational approach to Compound-Target Interaction (CTI) prediction in small-molecule drug development within Bioinformatics because they have proven very efficient. However, the extreme class imbalance of CTI datasets presents a major challenge when ML methods are used to predict CTIs. To a large extent, these methods misclassify the minority samples, i.e. the positive samples, which are precisely the ones of most interest to drug developers. In this study, we aim to improve the performance of ML-based methods for CTI prediction, particularly on positive samples, by addressing class imbalance. We applied deep generative modeling to oversample selected positive samples from the original dataset in order to construct balanced datasets. Oversampling followed two approaches: a General-based approach and a novel Domain Specific-based approach. In the experiments, 3 Deep Learning (DL) methods and 6 classical ML methods were trained on the original imbalanced dataset and on the two constructed balanced datasets to investigate their performance in predicting CTIs. To ensure the robustness of the ML-based predictive methods, a Grid Search with 5-fold Cross Validation (CV) was performed to estimate the best hyperparameters for training. The Convolutional Neural Network (CNN) produced the most competitive results in predicting positive samples when evaluated with the Recall metric.
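The abstract's emphasis on Recall can be illustrated with a small worked example. The sketch below is not the authors' code or data: the label vectors are hypothetical toy values, chosen only to show why accuracy can look strong on an imbalanced dataset while Recall exposes missed positive samples.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy labels (assumption, not from the paper):
# 5 interacting (positive) pairs among 100 compound-target pairs.
y_true = [1] * 5 + [0] * 95

# A degenerate model that always predicts "no interaction".
always_negative = [0] * 100

# A model that finds 3 of the 5 positives at the cost of 10 false alarms.
partial = [1, 1, 1, 0, 0] + [1] * 10 + [0] * 85

for name, y_pred in [("always_negative", always_negative), ("partial", partial)]:
    print(name, "accuracy:", accuracy(y_true, y_pred), "recall:", recall(y_true, y_pred))
```

On these toy labels the always-negative model reaches 0.95 accuracy with a Recall of 0.0, whereas the second model has lower accuracy (0.88) but a Recall of 0.6, which is the behaviour a Recall-based evaluation is meant to reward.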

