Leveraging tree-transformer VAE with fragment tokenization for high-performance large chemical model generation.

IF 6.2 | CAS Tier 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY
Tensei Inukai, Aoi Yamato, Manato Akiyama, Yasubumi Sakakibara
DOI: 10.1038/s42004-025-01640-w
Journal: Communications Chemistry, 8(1), 228
Published: 2025-08-05 (Journal Article)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12325745/pdf/
Citations: 0

Abstract


Molecular generation models, especially chemical language models (CLMs) that utilize SMILES, a string representation of compounds, face limitations in handling large and complex compounds while maintaining structural accuracy. To address these challenges, we propose the Fragment Tree-Transformer based VAE (FRATTVAE), which treats molecules as tree structures with fragments as nodes. FRATTVAE incorporates several innovative techniques to enhance molecular generation. Molecules are decomposed into fragments and organized into tree structures, allowing for efficient handling of large and complex compounds. Tree positional encoding assigns unique positional information to each fragment, preserving hierarchical relationships. The Transformer's self-attention mechanism models complex dependencies among fragments. This architecture allows FRATTVAE to surpass existing methods, making it a robust solution that is scalable to unprecedented dataset sizes and molecular complexities. Distribution learning across various benchmark datasets, from small molecules to natural compounds, showed that FRATTVAE consistently achieved high accuracy in all metrics while balancing reconstruction accuracy and generation quality. In molecular optimization tasks, FRATTVAE generated high-quality, stable molecules with desired properties, avoiding structural alerts. These results highlight FRATTVAE as a robust and versatile solution for molecular generation and optimization, making it well-suited for a variety of applications in cheminformatics and drug discovery.

Source journal
Communications Chemistry (General Chemistry)
CiteScore: 7.70
Self-citation rate: 1.70%
Annual articles: 146
Review time: 13 weeks
About the journal: Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.