Leveraging tree-transformer VAE with fragment tokenization for high-performance large chemical model generation.
Authors: Tensei Inukai, Aoi Yamato, Manato Akiyama, Yasubumi Sakakibara
DOI: 10.1038/s42004-025-01640-w
Journal: Communications Chemistry, vol. 8, no. 1, p. 228 (Q1, Chemistry, Multidisciplinary; Impact Factor 6.2)
Publication date: 2025-08-05; Type: Journal Article
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12325745/pdf/
Leveraging tree-transformer VAE with fragment tokenization for high-performance large chemical model generation.
Molecular generation models, especially chemical language models (CLMs) that utilize SMILES, a string representation of compounds, face limitations in handling large and complex compounds while maintaining structural accuracy. To address these challenges, we propose the Fragment Tree-Transformer based VAE (FRATTVAE), which treats molecules as tree structures with fragments as nodes. FRATTVAE incorporates several innovative techniques to enhance molecular generation. Molecules are decomposed into fragments and organized into tree structures, allowing for efficient handling of large and complex compounds. Tree positional encoding assigns unique positional information to each fragment, preserving hierarchical relationships. The Transformer's self-attention mechanism models complex dependencies among fragments. This architecture allows FRATTVAE to surpass existing methods, making it a robust solution that scales to unprecedented dataset sizes and molecular complexities. Distribution learning across various benchmark datasets, from small molecules to natural compounds, showed that FRATTVAE consistently achieved high accuracy on all metrics while balancing reconstruction accuracy and generation quality. In molecular optimization tasks, FRATTVAE generated high-quality, stable molecules with desired properties while avoiding structural alerts. These results highlight FRATTVAE as a robust and versatile solution for molecular generation and optimization, making it well-suited for a variety of applications in cheminformatics and drug discovery.
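The abstract's core idea — fragments as tree nodes, with a tree positional encoding that gives each fragment a unique position preserving hierarchy — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the fragment SMILES labels are made up, and the path-from-root encoding shown here is one plausible scheme used in tree-transformer work; FRATTVAE's exact encoding may differ.

```python
# Sketch: a fragment tree plus a simple tree positional encoding.
# Each node's code is the sequence of child indices on the path from the
# root, padded with -1 to a fixed length. Distinct tree positions yield
# distinct codes, so hierarchical (parent/child, sibling) relationships
# are recoverable from the codes alone.
from dataclasses import dataclass, field


@dataclass
class FragmentNode:
    smiles: str                      # fragment label (illustrative SMILES)
    children: list = field(default_factory=list)


def tree_positional_encodings(root, max_depth=8, max_branch=4):
    """Map each node (by id) to a fixed-length positional code."""
    codes = {}

    def walk(node, path):
        code = tuple(path) + (-1,) * (max_depth - len(path))
        codes[id(node)] = code
        for i, child in enumerate(node.children):
            assert i < max_branch, "branching factor exceeded"
            walk(child, path + [i])

    walk(root, [])
    return codes


# Toy fragment tree: an aromatic core with two substituent fragments.
core = FragmentNode("c1ccccc1",
                    [FragmentNode("C(=O)O"), FragmentNode("N")])
codes = tree_positional_encodings(core)
```

In a transformer over fragment tokens, such codes would be embedded and added to (or concatenated with) the fragment embeddings before self-attention, so attention can distinguish, say, a fragment's parent from its sibling.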
Journal introduction:
Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.