Feiying Chen, Victor Jun Yu Lim, Mingyu Li, Hao Fan
Journal: Briefings in Bioinformatics, 26(5). Published 2025-08-31.
DOI: 10.1093/bib/bbaf498 (https://doi.org/10.1093/bib/bbaf498)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12466116/pdf/
Impact factor: 7.7. JCR: Q1 (Biochemical Research Methods).
TCM-navigator, a deep learning-based workflow for generation and evaluation of traditional Chinese medicine-like compounds for drug development.
Traditional Chinese Medicine (TCM) has long been regarded as a valuable resource for modern drug discovery. However, the limited availability of recorded entities and information, the complexity and sparsity of the herb-ingredient-target-disease network, and inconsistencies in data representation hinder the effectiveness of high-throughput screening approaches. Although some therapeutically valuable compounds from TCM have been discovered through manual experimental screening, such methods are time-consuming and labor-intensive. To address these challenges, we developed a data-driven, deep learning-based workflow, TCM-navigator, which enables the in silico generation, quality control, and physics-based evaluation of TCM-like molecules. Generation is handled by TCM-Generator, a chemical language model based on transfer learning and Long Short-Term Memory (LSTM) networks that produces standardized, hierarchically structured, and high-throughput-friendly datasets of TCM-like molecules. In this study, we generated a target-nonspecific dataset comprising 3.7 million TCM-like molecules, expanding the number of entities in existing TCM datasets by more than 100-fold. The workflow also enables flexible, goal-driven molecule generation customized for specific targets, yielding three target-specific datasets and multiple high-potential target-ligand pairs. Quality control is handled by TCM-Identifier, the first quantitative model specifically designed to capture the unique characteristics of TCM, built on the AttentiveFP framework with message-passing neural networks. TCM-Identifier is expected to serve as an essential evaluation and guidance tool for TCM-related drug development. Our workflow bridges cutting-edge data science, including deep learning, with biomedical research to tackle longstanding challenges in target identification and molecular design. Its adaptable framework is also transferable to interdisciplinary innovation beyond drug development.
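As a rough illustration of the input side of a chemical language model such as the one described above, the sketch below tokenizes molecular strings and maps them to integer IDs, which is the form a sequence model such as an LSTM would consume. The abstract does not state which string representation TCM-Generator uses; SMILES is assumed here because it is common for such models. The regular expression, the special tokens, and the function names (tokenize, build_vocab, encode) are hypothetical and are not taken from the paper.

```python
import re

# Assumed SMILES token pattern: bracket atoms, two-letter halogens,
# two-digit ring closures, single-letter atoms, digits, then bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-\+\(\)/\\\.@:~\*\$])"
)

# Hypothetical special tokens for padding and sequence boundaries.
SPECIALS = ["<pad>", "<bos>", "<eos>"]


def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize SMILES string: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Map each special token and each observed token to an integer ID."""
    observed = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(SPECIALS + observed)}


def encode(smiles, vocab):
    """Encode one SMILES string as IDs wrapped in <bos> ... <eos>."""
    body = [vocab[tok] for tok in tokenize(smiles)]
    return [vocab["<bos>"]] + body + [vocab["<eos>"]]


if __name__ == "__main__":
    corpus = ["CCO", "c1ccccc1Cl"]  # ethanol and chlorobenzene, for illustration
    vocab = build_vocab(corpus)
    for s in corpus:
        print(s, encode(s, vocab))
```

In a transfer-learning setup, a vocabulary like this would typically be built once over the pretraining corpus and reused during fine-tuning so that token IDs stay consistent; whether TCM-Generator does this is not stated in the abstract.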
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.