Drug-target affinity prediction using applicability domain based on data density

2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). Publication date: 2021-08-06. DOI: 10.26434/chemrxiv.14498688.v1

Shunya Sugita, M. Ohue



Abstract

Computational prediction of a drug candidate's target affinity is useful in drug discovery research and development, both for screening compounds at an early stage and for verifying binding potential to an unknown target. Chemogenomics-based methods have attracted increasing attention because they integrate information on both the drug and the target to predict drug-target affinity (DTA). However, the compound and target spaces are vast, and without sufficient training data, reliable DTA prediction is not possible; predictions made under these conditions may well be false. In this study, we propose a DTA prediction method that, based on the concept of the applicability domain (AD) and the data density of the training dataset, can indicate whether and when the compound and target spaces contain too few training samples. The AD is the region of the data space in which a machine learning model can make reliable predictions. By using the constructed AD to preclassify the samples to be predicted into those inside the AD (In-AD) and those outside it (Out-AD), we can determine whether a reasonable prediction can be made for each sample. Evaluation experiments on three public datasets showed that the AD constructed with the k-nearest neighbor (k-NN) method worked well: samples classified as Out-AD had low prediction accuracy, while samples classified as In-AD had high prediction accuracy.
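The abstract does not spell out how the k-NN AD is defined. The sketch below shows one common way to turn training-data density into an AD, and it is an illustration under stated assumptions rather than the authors' method: Euclidean distance, a mean distance to the k = 5 nearest training samples as the (inverse) density measure, and a threshold set at the 95th percentile of the leave-one-out k-NN distances of the training samples. The class `KNNApplicabilityDomain`, the helper `mean_knn_distance`, and the `percentile` parameter are hypothetical names introduced for this example.

```python
import numpy as np


def mean_knn_distance(X_ref, X_query, k, exclude_self=False):
    """Mean Euclidean distance from each row of X_query to its k nearest rows of X_ref."""
    # Pairwise Euclidean distances via broadcasting: shape (n_query, n_ref)
    diff = X_query[:, None, :] - X_ref[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    if exclude_self:
        # X_query is X_ref: ignore each sample's zero distance to itself
        np.fill_diagonal(dists, np.inf)
    # Average the k smallest distances per row (small value = dense region)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)


class KNNApplicabilityDomain:
    """Hypothetical density-based AD: a query is In-AD if its mean k-NN distance
    to the training set does not exceed a threshold learned from the training set."""

    def __init__(self, k=5, percentile=95.0):
        self.k = k                    # number of neighbors (assumed value)
        self.percentile = percentile  # threshold rule (assumption, not from the paper)

    def fit(self, X_train):
        self.X_train = np.asarray(X_train, dtype=float)
        if self.k >= len(self.X_train):
            raise ValueError("k must be smaller than the number of training samples")
        # Leave-one-out mean k-NN distance of every training sample
        train_d = mean_knn_distance(self.X_train, self.X_train, self.k, exclude_self=True)
        self.threshold_ = np.percentile(train_d, self.percentile)
        return self

    def predict(self, X):
        d = mean_knn_distance(self.X_train, np.asarray(X, dtype=float), self.k)
        # True = In-AD (enough nearby training data), False = Out-AD
        return d <= self.threshold_


# Usage: training samples drawn around the origin of an 8-dimensional feature space
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 8))
ad = KNNApplicabilityDomain(k=5).fit(X_train)

X_new = np.vstack([np.zeros(8),        # centre of the training cloud
                   np.full(8, 10.0)])  # far from every training sample
in_ad = ad.predict(X_new)
print(in_ad)
```

Here the origin lies in the densest part of the training data and should be flagged In-AD, while the distant point should be flagged Out-AD, so a DTA prediction for it would be treated as unreliable. The percentile threshold is only one possible rule. In a chemogenomics setting, each row might be a concatenation of compound and target descriptors, or separate ADs might be built for the compound and target spaces; both are assumptions here.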


