Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms
Debsindhu Bhowmik, Pei Zhang, Zachary Fox, Stephan Irle, John Gounley
Patterns, published 2024-03-14
DOI: 10.1016/j.patter.2024.100947 (https://doi.org/10.1016/j.patter.2024.100947)
Abstract
This study examines the effectiveness of generative models in drug discovery, materials science, and polymer science, aiming to overcome the constraints of traditional inverse design methods that rely on heuristic rules. Generative models produce synthetic data resembling real data, enabling deep learning models to be trained without extensive labeled datasets. They prove valuable for creating virtual libraries of molecules for materials science and for facilitating drug discovery by generating molecules with specific properties. Generative adversarial networks (GANs) have been explored for these purposes, but mode collapse restricts their efficacy by limiting the variability of novel structures. To address this, we introduce a masked language model (LM) inspired by natural language processing. Because LMs alone can also have inherent limitations, we propose a hybrid architecture combining LMs and GANs to generate new molecules efficiently; it demonstrates superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.