TCM-navigator, a deep learning-based workflow for generation and evaluation of traditional Chinese medicine-like compounds for drug development.

Impact factor 7.7; Zone 2 (Biology); JCR Q1 (Biochemical Research Methods)
Feiying Chen, Victor Jun Yu Lim, Mingyu Li, Hao Fan
Briefings in Bioinformatics, 26(5). Published 2025-08-31. Journal Article.
DOI: 10.1093/bib/bbaf498 (https://doi.org/10.1093/bib/bbaf498)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12466116/pdf/
Citations: 0

Abstract

Traditional Chinese Medicine (TCM) has long been regarded as a valuable resource for modern drug discovery. However, the limited availability of recorded entities and information, the complexity and sparsity of the herb-ingredient-target-disease network, and inconsistencies in data representation hinder the effectiveness of high-throughput screening approaches. While some therapeutically valuable compounds from TCM have been discovered through manual experimental screening, such methods are time-consuming and require substantial human resources. To address these challenges, we developed a data-driven, deep learning-based workflow, TCM-navigator, which enables the in silico generation, quality control, and physics-based evaluation of TCM-like molecules. Generation is handled by TCM-Generator, a chemical language model based on transfer learning and Long Short-Term Memory (LSTM) networks that produces standardized, hierarchically structured, and high-throughput-friendly datasets of TCM-like molecules. In this study, we generated a target-nonspecific dataset comprising 3.7 million TCM-like molecules, expanding the number of entities in existing TCM datasets by more than 100-fold. The workflow also enables flexible, goal-driven molecule generation customized for specific targets, yielding three target-specific datasets and multiple high-potential target-ligand pairs. Quality control is handled by TCM-Identifier, the first quantitative model specifically designed to capture the unique characteristics of TCM, built on an AttentiveFP framework with message-passing neural networks. TCM-Identifier is expected to serve as an essential evaluation and guidance tool for TCM-related drug development. Our workflow bridges cutting-edge data science, including deep learning, with biomedical research to tackle longstanding challenges in target identification and molecular design. Its adaptable framework is also transferable to interdisciplinary innovation beyond drug development.
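
As a rough illustration of what a chemical language model consumes, the sketch below tokenizes molecule strings and builds next-token training pairs, the kind of input an LSTM is typically trained on. It is not the authors' implementation: the abstract does not state the molecular representation, so the use of SMILES strings is an assumption, and the regular expression, the boundary tokens, and the function names `tokenize`, `build_vocab`, and `next_token_pairs` are all hypothetical.

```python
import re

# Assumption: molecules are written as SMILES strings; the abstract does not
# state the input representation used by TCM-Generator.
# Hypothetical pattern: bracket atoms, two-letter halogens, two-digit ring
# closures (%NN), then any single remaining character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

START, END = "<s>", "</s>"  # hypothetical sequence-boundary tokens


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, wrapped in boundary markers."""
    return [START, *SMILES_TOKEN.findall(smiles), END]


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Map every token seen in the corpus to an integer id (sorted for stability)."""
    tokens = sorted({tok for smi in corpus for tok in tokenize(smi)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def next_token_pairs(smiles: str, vocab: dict[str, int]) -> list[tuple[int, int]]:
    """Return (input id, target id) pairs: each token predicts its successor."""
    ids = [vocab[tok] for tok in tokenize(smiles)]
    return list(zip(ids[:-1], ids[1:]))
```

In a transfer-learning setup of the kind the abstract describes, pairs like these would typically be drawn first from a large general corpus and then from a smaller domain-specific one; which corpora the authors actually used is not stated here.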

Source journal: Briefings in Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.