De Novo Drug Design by Multi-Objective Path Consistency Learning with Beam A∗ Search.

IF 3.6; Zone 3, Biology; JCR Q2, Biochemical Research Methods
Dengwei Zhao, Jingyuan Zhou, Shikui Tu, Lei Xu
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Published: 2024-10-09
DOI: 10.1109/TCBB.2024.3477592 (https://doi.org/10.1109/TCBB.2024.3477592)
Citations: 0

Abstract

Generating high-quality, drug-like molecules from scratch within the expansive chemical space is a significant challenge in drug discovery. In prior research, value-based reinforcement learning algorithms have been employed to iteratively generate molecules with multiple desired properties. The immediate reward was defined as the evaluation of the intermediate-state molecule at each step, so the learning objective was to maximize the expected cumulative evaluation score of all molecules along the generative path. This definition of the reward is misleading, because the actual optimization target is the evaluation score of the final generated molecule only. Furthermore, previous works introduced randomness into the decision-making process, which enables the generation of diverse molecules but no longer pursues the maximum future reward. In this paper, the immediate reward is defined as the improvement achieved by each modification of the molecule, so that only the evaluation score of the final generated molecule is maximized. Path consistency (PC), which originates from A∗ search and states that f values on one optimal path should be identical, is employed as the objective function for updating the f value estimator to train a multi-objective de novo drug designer. By incorporating the f value into the decision-making process of beam search, the DrugBA∗ algorithm is proposed to enable the large-scale generation of molecules that exhibit both high quality and diversity. Experimental results demonstrate a substantial improvement over the state-of-the-art algorithm QADD in multiple molecular properties of the generated molecules.
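
The three ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the exact loss nor the network, so the decomposition f = g + h (g as the improvement accumulated so far, h as an estimate of the remaining improvement), the variance-style consistency penalty, and all function names below are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]  # hypothetical stand-in for a partially built molecule


def improvement_rewards(scores: Sequence[float]) -> List[float]:
    """Immediate reward = change in evaluation score caused by one modification.

    Summed along a path, these rewards telescope to
    scores[-1] - scores[0], so only the final molecule's score matters.
    """
    return [after - before for before, after in zip(scores, scores[1:])]


def path_consistency_loss(f_values: Sequence[float]) -> float:
    """Assumed penalty: mean squared deviation of f values from their path mean.

    It is zero exactly when all f values on the path are identical,
    which is the path-consistency condition stated in the abstract.
    """
    mean_f = sum(f_values) / len(f_values)
    return sum((f - mean_f) ** 2 for f in f_values) / len(f_values)


def beam_step(
    beam: List[Tuple[State, float]],
    expand: Callable[[State], List[State]],
    score: Callable[[State], float],
    h: Callable[[State], float],
    width: int,
) -> List[Tuple[State, float]]:
    """One beam-search step that ranks successors by f = g + h (assumed form)."""
    candidates = []
    for state, g in beam:
        for nxt in expand(state):
            g_next = g + (score(nxt) - score(state))  # accumulate improvement
            candidates.append((g_next + h(nxt), nxt, g_next))
    # Deterministically keep the `width` successors with the largest f value.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(s, g) for _, s, g in candidates[:width]]
```

Keeping several high-f candidates per step, rather than sampling actions at random, is how a beam can yield multiple distinct molecules while still following the value estimate; whether DrugBA∗ ranks candidates in exactly this way is not stated in the abstract.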

Source journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
CiteScore: 7.50
Self-citation rate: 6.70%
Articles published: 479
Review time: 3 months
About the journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results that are obtained from the use of these methods, programs and databases; and the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system.