TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine.
Zhe Wang, Meng Hao, Suyuan Peng, Yuyan Huang, Yiwei Lu, Keyu Yao, Xiaolin Yang, Yan Zhu
Scientific Data 12(1): 437. Published 2025-03-13. DOI: 10.1038/s41597-025-04772-9
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11906624/pdf/
Citations: 0
Abstract
This paper presents TCMEval-SDT, a large, publicly available benchmark dataset that captures the thought process of syndrome differentiation in traditional Chinese medicine (TCM). The dataset consists of 300 TCM syndrome diagnosis cases sourced from the internet, classical Chinese medical texts, and hospital medical records, with metadata that adheres to the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Each case has been annotated and curated by TCM experts and includes a medical record ID, clinical data, an explanatory summary, the TCM syndrome, clinical information, and the TCM pathogenesis, to support algorithms and models in emulating the diagnostic process of TCM clinicians. To describe the TCM syndrome diagnosis process comprehensively, we divide it into four steps: (1) clinical information extraction, (2) TCM pathogenesis reasoning, (3) TCM syndrome reasoning, and (4) explanatory summary. We have also established validation criteria for evaluating the TCM clinical diagnostic ability of such algorithms and models on this dataset. To facilitate research and evaluation in TCM syndrome diagnosis, the TCMEval-SDT dataset is publicly available under the CC-BY 4.0 license.
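The per-case annotations and the four diagnostic steps can be pictured together as a single record. The sketch below is a minimal illustration under stated assumptions: the class name `TCMCase`, the snake_case field names, the use of plain strings for every field, and the mapping of each step to one annotated field are all hypothetical readings of the abstract, not the dataset's actual file format or keys.

```python
from dataclasses import dataclass

# Hypothetical record layout: field names paraphrase the six items listed
# in the abstract; they are NOT the dataset's real keys, and all types are
# assumed to be plain strings for simplicity.
@dataclass
class TCMCase:
    medical_record_id: str
    clinical_data: str          # raw case description (model input)
    clinical_information: str   # assumed gold output of step 1
    tcm_pathogenesis: str       # assumed gold output of step 2
    tcm_syndrome: str           # assumed gold output of step 3
    explanatory_summary: str    # assumed gold output of step 4

# Assumed mapping from each diagnostic step to the field holding its annotation.
STEPS = [
    ("clinical information extraction", "clinical_information"),
    ("TCM pathogenesis reasoning", "tcm_pathogenesis"),
    ("TCM syndrome reasoning", "tcm_syndrome"),
    ("explanatory summary", "explanatory_summary"),
]

def step_targets(case: TCMCase) -> list:
    """Return (step name, gold annotation) pairs in diagnostic order."""
    return [(step, getattr(case, attr)) for step, attr in STEPS]
```

Keeping the steps in an ordered list makes the chained nature of the process explicit: an evaluator could feed a model's step-1 output into step 2, and so on, and compare each intermediate result with the corresponding annotation rather than scoring only the final syndrome.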
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.