Algorithmic Generation of Positive Samples for Compound-Target Interaction Prediction

Ebenezer Nanor, Wei-Ping Wu, S. Bayitaa, V. K. Agbesi, Brighter Agyemang
2021 13th International Conference on Machine Learning and Computing. Published 2021-02-26. DOI: 10.1145/3457682.3457689 (https://doi.org/10.1145/3457682.3457689)

Abstract

Machine Learning (ML) methods have become the preferred computational approach to Compound-Target Interaction (CTI) prediction in small-molecule drug development in bioinformatics, because they have proven to be very efficient. However, the extremely imbalanced nature of CTI datasets presents a major challenge when ML methods are used to predict CTIs. To a large extent, these methods misclassify the minority samples, i.e. the positive samples, which are precisely the ones of most interest to drug developers. In this study, we aim to improve the performance of ML-based methods for CTI prediction, particularly on positive samples, by addressing class imbalance. We applied deep generative modeling to oversample selected positive samples from the original dataset in order to construct balanced datasets. Oversampling followed two approaches: a General-based approach and a novel Domain Specific-based approach. In the experiments, 3 Deep Learning (DL) methods and 6 classical ML methods were trained on the original imbalanced dataset and on the two constructed balanced datasets to investigate their performance in predicting CTIs. To ensure robustness of the ML-based predictive methods, a Grid Search with 5-fold Cross Validation (CV) was performed to estimate the best hyperparameters for training. The Convolutional Neural Network (CNN) produced the most competitive results in predicting positive samples when evaluated with the Recall metric.
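
To make the evaluation setup in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary labels (1 = interacting pair, 0 = non-interacting) and uses plain random duplication of positives as a stand-in for the deep generative model, which the abstract does not specify. The function names (`recall`, `oversample_positives`, `stratified_folds`) and the toy data are hypothetical and chosen only for illustration.

```python
import random


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def oversample_positives(X, y, seed=0):
    """Balance the classes by duplicating randomly chosen positives.

    Assumption: this is a simple stand-in. The paper generates new
    positive samples with a deep generative model instead of copying.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos:
        raise ValueError("need at least one positive sample")
    extra = [rng.choice(pos) for _ in range(max(0, len(neg) - len(pos)))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def stratified_folds(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for rank, i in enumerate(members):
            folds[rank % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Toy data (hypothetical): 10 positives among 100 samples.
X = [[float(i)] for i in range(100)]
y = [1 if i % 10 == 0 else 0 for i in range(100)]

X_bal, y_bal = oversample_positives(X, y)
splits = list(stratified_folds(y, k=5))
```

In a setup like this, one would typically oversample only the training indices of each fold, so that copies of a held-out positive never appear in training; whether the paper does so is not stated in the abstract. Recall is the natural metric here because it measures only how many true interactions are recovered, which is the quantity the abstract emphasizes.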