Integrating Generative Pretrained Transformer and Genetic Algorithms for Efficient and Diverse Molecular Generation.

IF 3.1 | Region 4, Medicine | JCR Q3, CHEMISTRY, MEDICINAL
Chengcheng Xu, Chen Zeng, Xi Yang, Yingxu Liu, Xiangzhen Ning, Lidan Zheng, Yang Liu, Qing Fan, Chao Xu, Haichun Liu, Xian Wei, Yadong Chen, Yanmin Zhang, Rui Gu
Molecular Informatics, vol. 44, no. 8, e202500094. Published 2025-08-01. Journal Article.
DOI: 10.1002/minf.70005 (https://doi.org/10.1002/minf.70005)

Abstract

In computer-aided drug design, molecular generation models play a crucial role in accelerating drug development. Current models fall mainly into two categories: deep learning models, which perform well but are hard to interpret, and heuristic algorithms, which are more interpretable but perform less well. In this study, we introduce a molecular generation model, the compound construction model (CCMol), which combines the generative capabilities of the generative pretrained transformer (GPT) with the optimization mechanisms of genetic algorithms (GA) to produce effective and novel molecular structures. Specifically, our approach uses structure-based drug design that comprises both ligand-based and protein primary-structure-based aspects. CCMol uses GPT for initial molecular generation and GA for iterative optimization of physicochemical and biological properties. The model's reliability was validated by generating molecules targeting three critical disease-related proteins (GLP1, WRN, and JAK2). The results indicate that CCMol is on par with current advanced models on multiple indicators and outperforms the baseline model on structural diversity and drug-related property indicators. This suggests that CCMol performs well in developing novel and effective candidate drug molecules and is particularly suited to expanding the chemical validity of candidate structures at the early stages of drug discovery.
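The generate-then-optimize idea in the abstract can be illustrated with a minimal sketch. This is not the CCMol implementation: the abstract gives no algorithmic details, so everything below is an assumption made for illustration. The pretrained generator is replaced by a random sampler over a toy fragment vocabulary (`FRAGMENTS`), the property objective is a made-up score (`toy_fitness`), and the genetic operators (tournament selection, one-point crossover, per-position mutation, elitism) are generic textbook choices.

```python
import random

# Toy fragment vocabulary (hypothetical); a real system would operate on
# SMILES strings or molecular graphs and check chemical validity.
FRAGMENTS = ["C", "N", "O", "c1ccccc1", "C(=O)", "F", "S", "Cl"]


def sample_initial_population(rng, size, length=6):
    """Stand-in for the pretrained generator: random fragment sequences."""
    return [[rng.choice(FRAGMENTS) for _ in range(length)] for _ in range(size)]


def toy_fitness(candidate):
    """Made-up score: reward fragment variety and N/O content, penalize halogens."""
    variety = len(set(candidate))
    hetero = sum(frag in ("N", "O") for frag in candidate)
    halogens = sum(frag in ("F", "Cl") for frag in candidate)
    return variety + 0.5 * hetero - 1.0 * halogens


def tournament(rng, population, k=3):
    """Pick the fittest of k randomly drawn candidates."""
    return max(rng.sample(population, k), key=toy_fitness)


def crossover(rng, a, b):
    """One-point crossover of two equal-length fragment sequences."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(rng, candidate, rate=0.1):
    """Replace each fragment with a random one with probability `rate`."""
    return [rng.choice(FRAGMENTS) if rng.random() < rate else frag
            for frag in candidate]


def optimize(seed=0, pop_size=30, generations=20, elite=2):
    """Run the GA; return the best candidate and the best score per generation."""
    rng = random.Random(seed)
    population = sample_initial_population(rng, pop_size)
    history = [max(map(toy_fitness, population))]
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness, reverse=True)
        next_gen = ranked[:elite]  # elitism: carry the top candidates over unchanged
        while len(next_gen) < pop_size:
            child = crossover(rng, tournament(rng, population),
                              tournament(rng, population))
            next_gen.append(mutate(rng, child))
        population = next_gen
        history.append(max(map(toy_fitness, population)))
    return max(population, key=toy_fitness), history


if __name__ == "__main__":
    best, history = optimize()
    print("best candidate:", best)
    print("best fitness per generation:", history)
```

Because the top candidates are copied unchanged into each new generation, the best score in `history` never decreases. In a realistic setting the fitness function would be replaced by computed physicochemical properties or a predicted binding score, and the operators would need to preserve chemical validity.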

Source journal: Molecular Informatics (CHEMISTRY, MEDICINAL; MATHEMATICAL & COMPUTATIONAL BIOLOGY)
CiteScore: 7.30 | Self-citation rate: 2.80% | Articles published: 70 | Review time: 3 months
Journal description: Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010. Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation. The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature, Molecular Informatics publishes so-called "Methods Corner" review-type articles, which feature important technological concepts and advances within the scope of the journal.