TCM-navigator, a deep learning-based workflow for generation and evaluation of traditional Chinese medicine-like compounds for drug development.

IF 7.7 · Zone 2 (Biology) · Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics · Pub Date: 2025-08-31 · DOI: 10.1093/bib/bbaf498

Feiying Chen, Victor Jun Yu Lim, Mingyu Li, Hao Fan

Volume 26, Issue 5 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12466116/pdf/

Citations: 0

Abstract

Traditional Chinese Medicine (TCM) has long been regarded as a valuable resource for modern drug discovery. However, the limited number of recorded entities and the scarcity of associated information, the complexity and sparsity of the herb-ingredient-target-disease network, and inconsistent data representation all limit the effectiveness of high-throughput screening approaches. Some therapeutically valuable TCM compounds have been discovered through manual experimental screening, but such methods are time-consuming and labor-intensive. To address these challenges, we developed TCM-navigator, a data-driven, deep learning-based workflow for the in silico generation, quality control, and physics-based evaluation of TCM-like molecules. Generation is performed by TCM-Generator, a chemical language model based on transfer learning and Long Short-Term Memory (LSTM) networks, which produces standardized, hierarchically structured, high-throughput-friendly datasets of TCM-like molecules. In this study, we generated a target-nonspecific dataset of 3.7 million TCM-like molecules, expanding the number of entities in existing TCM datasets more than 100-fold. The workflow also supports flexible, goal-driven generation customized for specific targets, which yielded three target-specific datasets and multiple high-potential target-ligand pairs. Quality control is performed by TCM-Identifier, the first quantitative model designed specifically to capture the unique characteristics of TCM, built on the AttentiveFP framework with message passing neural networks. TCM-Identifier is expected to serve as an essential evaluation and guidance tool for TCM-related drug development. Our workflow bridges cutting-edge data science, including deep learning, with biomedical research to tackle longstanding challenges in target identification and molecular design. Its adaptable framework is also transferable to interdisciplinary innovation beyond drug development.
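The abstract describes TCM-Generator only at a high level, as a transfer learning- and LSTM-based chemical language model. As a rough illustration of how such a model emits molecules, the Python sketch below samples SMILES-like strings one token at a time from a single-layer LSTM. It is not the authors' implementation: the toy character vocabulary, the `^`/`$` start and end tokens, the hidden size, temperature sampling, and the class name `ToyLSTMLanguageModel` are all hypothetical. The weights are random rather than trained, so the outputs are not meaningful molecules. In a transfer-learning setup, the weights would typically be pretrained on a large general compound corpus and then fine-tuned on TCM compounds, which this sketch does not attempt.

```python
import math
import random

# Hypothetical toy vocabulary of single-character SMILES tokens.
# "^" marks the start of a sequence and "$" marks its end.
VOCAB = ["^", "$", "C", "c", "O", "N", "(", ")", "=", "1"]
IDX = {tok: i for i, tok in enumerate(VOCAB)}


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(matrix, vec):
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyLSTMLanguageModel:
    """Single-layer LSTM over one-hot tokens with a linear output head (untrained)."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        n = len(VOCAB)
        self.hidden = hidden
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), candidate (g).
        self.gates = {
            name: (rand_matrix(hidden, n, rng), rand_matrix(hidden, hidden, rng), [0.0] * hidden)
            for name in ("i", "f", "o", "g")
        }
        self.w_out = rand_matrix(n, hidden, rng)
        self.b_out = [0.0] * n

    def step(self, token, h, c):
        """One LSTM time step: returns next-token logits and the new (h, c) state."""
        x = [0.0] * len(VOCAB)
        x[IDX[token]] = 1.0
        pre = {}
        for name, (w, u, b) in self.gates.items():
            pre[name] = [wx + uh + bias for wx, uh, bias in zip(matvec(w, x), matvec(u, h), b)]
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # Cell update c_t = f * c_{t-1} + i * g; hidden state h_t = o * tanh(c_t).
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        logits = [z + bias for z, bias in zip(matvec(self.w_out, h_new), self.b_out)]
        return logits, h_new, c_new

    def sample(self, rng, max_len=40, temperature=1.0):
        """Autoregressively sample tokens until "$" or max_len tokens."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        token, out = "^", []
        for _ in range(max_len):
            logits, h, c = self.step(token, h, c)
            top = max(logits)  # subtract the max for a numerically stable softmax
            weights = [math.exp((z - top) / temperature) for z in logits]
            weights[IDX["^"]] = 0.0  # never re-emit the start token
            token = rng.choices(VOCAB, weights=weights, k=1)[0]
            if token == "$":
                break
            out.append(token)
        return "".join(out)


if __name__ == "__main__":
    model = ToyLSTMLanguageModel(hidden=16, seed=0)
    rng = random.Random(42)
    for _ in range(3):
        print(model.sample(rng, max_len=40, temperature=1.0))
```

With trained weights, each sampled string would still need a validity check before downstream filtering such as the TCM-Identifier step; for example, RDKit's `Chem.MolFromSmiles` returns `None` for SMILES it cannot parse. Lowering `temperature` below 1.0 sharpens the distribution toward high-probability tokens, trading diversity for conservatism.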

Source journal: Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.