Research on Traditional Chinese Medicine: Domain Knowledge Graph Completion and Quality Evaluation.

IF 3.1; Zone 3, Medicine; JCR Q2, Medical Informatics

JMIR Medical Informatics Pub Date : 2024-08-02 DOI:10.2196/55090

Chang Liu, Zhan Li, Jianmin Li, Yiqian Qu, Ying Chang, Qing Han, Lingyong Cao, Shuyuan Lin



Abstract

Background: Knowledge graphs (KGs) can integrate domain knowledge into a traditional Chinese medicine (TCM) intelligent syndrome differentiation model. However, the quality of current KGs in the TCM domain varies greatly, which is related to the lack of knowledge graph completion (KGC) and evaluation methods.

Objective: This study aims to investigate KGC and evaluation methods tailored for TCM domain knowledge.

Methods: In the KGC phase, based on the characteristics of TCM domain knowledge, we proposed a 3-step "entity-ontology-path" completion approach. This approach uses path reasoning, ontology rule reasoning, and association rules. In the KGC quality evaluation phase, we proposed a 3-dimensional evaluation framework that encompasses completeness, accuracy, and usability, using quantitative methods such as complex network analysis, ontology reasoning, and graph representation. Furthermore, we compared the impact of different graph representation models on KG usability.
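The association-rule step named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rule_stats`, the toy records, and the idea of thresholding support and confidence to propose a candidate triple are all assumptions made for illustration.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    records: list of sets, each holding the items observed together
    (hypothetical symptom and syndrome labels here).
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(records)
    # Records that contain every antecedent item.
    n_ante = sum(1 for r in records if antecedent <= r)
    # Records that contain antecedent and consequent together.
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Toy data, invented for this example only.
records = [
    {"fatigue", "poor appetite", "spleen qi deficiency"},
    {"fatigue", "poor appetite", "spleen qi deficiency"},
    {"fatigue", "insomnia"},
    {"poor appetite", "spleen qi deficiency"},
]

support, confidence = rule_stats(
    records, {"fatigue", "poor appetite"}, {"spleen qi deficiency"}
)
print(support, confidence)
```

A rule whose support and confidence both exceed chosen thresholds could then be turned into a candidate triple for expert review; the thresholds themselves are a design choice the abstract does not specify.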

Results: In the KGC phase, 52, 107, 27, and 479 triples were added by outlier analysis, rule-based reasoning, association rules, and path-based reasoning, respectively. In addition, rule-based reasoning identified 14 contradictory triples. In the KGC quality evaluation phase, in terms of completeness, the KG had higher density and lower sparsity after completion, and there were no contradictory rules within the KG. In terms of accuracy, the KG after completion was more consistent with prior knowledge. In terms of usability, the mean reciprocal rank (MRR), mean rank (MR), and hit rate of the first N tail entities predicted by the model (Hits@N) of the TransE, RotatE, DistMult, and ComplEx graph representation models all showed improvement after KGC. Among them, the RotatE model achieved the best representation.
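The usability metrics reported in the Results follow standard link-prediction definitions. The sketch below computes them from the rank that a model assigns to the correct tail entity of each test triple; the function name `ranking_metrics` and the example ranks are hypothetical and do not come from the paper.

```python
def ranking_metrics(ranks, ns=(1, 3, 10)):
    """Compute MR, MRR, and Hits@N from 1-based ranks of the true tail entity."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                      # mean rank: lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,    # mean reciprocal rank: higher is better
    }
    for k in ns:
        # Fraction of test triples whose true tail is ranked within the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Invented ranks for four test triples.
metrics = ranking_metrics([1, 2, 4, 10])
print(metrics)
```

Note the opposite directions: an improvement after completion means a lower MR but a higher MRR and higher Hits@N.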

Conclusions: The 3-step completion approach can effectively improve the completeness, accuracy, and usability of KGs, and the 3-dimensional evaluation framework can be used for comprehensive KGC evaluation. In the TCM field, the RotatE model performed better at KG representation.
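For readers unfamiliar with the models being compared, the following sketch contrasts the TransE and RotatE scoring functions on hand-made embeddings. It uses only the Python standard library, and the vectors, phases, and the choice of a sum of absolute differences as the distance are assumptions for illustration; the abstract does not describe the training setup used in the study.

```python
import cmath
import math


def transe_score(h, r, t):
    """TransE: a relation is a translation, so a true triple has h + r close to t."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def rotate_score(h, phases, t):
    """RotatE: a relation is a rotation in the complex plane (one phase per dimension)."""
    return -sum(
        abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t)
    )


# TransE on real-valued toy vectors: h + r equals t, so the score is 0 (the maximum).
print(transe_score([1.0, 2.0], [0.5, -1.0], [1.5, 1.0]))

# RotatE with a phase of pi: applying the relation twice returns to the start,
# so the same relation scores well in both directions (a symmetric relation).
h, t, phases = [1 + 0j], [-1 + 0j], [math.pi]
forward = rotate_score(h, phases, t)
backward = rotate_score(t, phases, h)
print(forward, backward)
```

Scores closer to 0 indicate a more plausible triple. The second example shows a property TransE lacks unless the relation vector is zero, which is one commonly cited reason RotatE can represent a wider range of relation patterns; whether this explains the result reported here is not stated in the abstract.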


Source journal

JMIR Medical Informatics (Medicine: Health Informatics)

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, and industry and health informatics professionals. It is published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175). JMIR Med Inform has a slightly different scope, placing more emphasis on applications for clinicians and health professionals than on consumers and citizens, who are the focus of JMIR. It also publishes even faster and accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.