Leveraging language model for advanced multiproperty molecular optimization via prompt engineering

IF 18.8 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhenxing Wu, Odin Zhang, Xiaorui Wang, Li Fu, Huifeng Zhao, Jike Wang, Hongyan Du, Dejun Jiang, Yafeng Deng, Dongsheng Cao, Chang-Yu Hsieh, Tingjun Hou
{"title":"Leveraging language model for advanced multiproperty molecular optimization via prompt engineering","authors":"Zhenxing Wu, Odin Zhang, Xiaorui Wang, Li Fu, Huifeng Zhao, Jike Wang, Hongyan Du, Dejun Jiang, Yafeng Deng, Dongsheng Cao, Chang-Yu Hsieh, Tingjun Hou","doi":"10.1038/s42256-024-00916-5","DOIUrl":null,"url":null,"abstract":"<p>Optimizing a candidate molecule’s physiochemical and functional properties has been a critical task in drug and material design. Although the non-trivial task of balancing multiple (potentially conflicting) optimization objectives is considered ideal for artificial intelligence, several technical challenges such as the scarcity of multiproperty-labelled training data have hindered the development of a satisfactory AI solution for a long time. Prompt-MolOpt is a tool for molecular optimization; it makes use of prompt-based embeddings, as used in large language models, to improve the transformer’s ability to optimize molecules for specific property adjustments. Notably, Prompt-MolOpt excels in working with limited multiproperty data (even under the zero-shot setting) by effectively generalizing causal relationships learned from single-property datasets. In comparative evaluations against established models such as JTNN, hierG2G and Modof, Prompt-MolOpt achieves over a 15% relative improvement in multiproperty optimization success rates compared with the leading Modof model. Furthermore, a variant of Prompt-MolOpt, named Prompt-MolOpt<sup>P</sup>, can preserve the pharmacophores or any user-specified fragments under the structural transformation, further broadening its application scope. By constructing tailored optimization datasets, with the protocol introduced in this work, Prompt-MolOpt steers molecular optimization towards domain-relevant chemical spaces, enhancing the quality of the optimized molecules. Real-world tests, such as those involving blood–brain barrier permeability optimization, underscore its practical relevance. Prompt-MolOpt offers a versatile approach for multiproperty and multi-site molecular optimizations, suggesting its potential utility in chemistry research and drug and material discovery.</p>","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":null,"pages":null},"PeriodicalIF":18.8000,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1038/s42256-024-00916-5","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Optimizing a candidate molecule’s physiochemical and functional properties has been a critical task in drug and material design. Although the non-trivial task of balancing multiple (potentially conflicting) optimization objectives is considered ideal for artificial intelligence, several technical challenges such as the scarcity of multiproperty-labelled training data have hindered the development of a satisfactory AI solution for a long time. Prompt-MolOpt is a tool for molecular optimization; it makes use of prompt-based embeddings, as used in large language models, to improve the transformer’s ability to optimize molecules for specific property adjustments. Notably, Prompt-MolOpt excels in working with limited multiproperty data (even under the zero-shot setting) by effectively generalizing causal relationships learned from single-property datasets. In comparative evaluations against established models such as JTNN, hierG2G and Modof, Prompt-MolOpt achieves over a 15% relative improvement in multiproperty optimization success rates compared with the leading Modof model. Furthermore, a variant of Prompt-MolOpt, named Prompt-MolOptP, can preserve the pharmacophores or any user-specified fragments under the structural transformation, further broadening its application scope. By constructing tailored optimization datasets, with the protocol introduced in this work, Prompt-MolOpt steers molecular optimization towards domain-relevant chemical spaces, enhancing the quality of the optimized molecules. Real-world tests, such as those involving blood–brain barrier permeability optimization, underscore its practical relevance. Prompt-MolOpt offers a versatile approach for multiproperty and multi-site molecular optimizations, suggesting its potential utility in chemistry research and drug and material discovery.

Abstract Image

利用语言模型,通过及时工程实现先进的多性能分子优化
优化候选分子的物理化学和功能特性一直是药物和材料设计的关键任务。尽管平衡多个(可能相互冲突的)优化目标这一非同小可的任务被认为是人工智能的理想选择,但长期以来,多属性标记训练数据的稀缺等一些技术挑战阻碍了令人满意的人工智能解决方案的开发。Prompt-MolOpt 是一种用于分子优化的工具;它利用基于提示的嵌入(用于大型语言模型)来提高转换器针对特定属性调整优化分子的能力。值得注意的是,Prompt-MolOpt 在处理有限的多属性数据(即使是在零镜头设置下)时表现出色,它有效地概括了从单属性数据集中学到的因果关系。在与 JTNN、hierG2G 和 Modof 等成熟模型的比较评估中,与领先的 Modof 模型相比,Prompt-MolOpt 的多属性优化成功率相对提高了 15%。此外,被命名为 Prompt-MolOptP 的 Prompt-MolOpt 变体可以在结构转换中保留药效团或任何用户指定的片段,从而进一步拓宽了其应用范围。通过构建量身定制的优化数据集,Prompt-MolOpt 利用本工作中介绍的协议,将分子优化引向与领域相关的化学空间,提高了优化分子的质量。血脑屏障通透性优化等实际测试凸显了它的实用性。Prompt-MolOpt 为多性能和多位点分子优化提供了一种多功能方法,表明它在化学研究、药物和材料发现方面具有潜在的实用性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
36.90
自引率
2.10%
发文量
127
期刊介绍: Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements. To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects. Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信