{"title":"Research on Traditional Chinese Medicine: Domain Knowledge Graph Completion and Quality Evaluation.","authors":"Chang Liu, Zhan Li, Jianmin Li, Yiqian Qu, Ying Chang, Qing Han, Lingyong Cao, Shuyuan Lin","doi":"10.2196/55090","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Knowledge graphs (KGs) can integrate domain knowledge into a traditional Chinese medicine (TCM) intelligent syndrome differentiation model. However, the quality of current KGs in the TCM domain varies greatly, related to the lack of knowledge graph completion (KGC) and evaluation methods.</p><p><strong>Objective: </strong>This study aims to investigate KGC and evaluation methods tailored for TCM domain knowledge.</p><p><strong>Methods: </strong>In the KGC phase, according to the characteristics of TCM domain knowledge, we proposed a 3-step \"entity-ontology-path\" completion approach. This approach uses path reasoning, ontology rule reasoning, and association rules. In the KGC quality evaluation phase, we proposed a 3-dimensional evaluation framework that encompasses completeness, accuracy, and usability, using quantitative metrics such as complex network analysis, ontology reasoning, and graph representation. Furthermore, we compared the impact of different graph representation models on KG usability.</p><p><strong>Results: </strong>In the KGC phase, 52, 107, 27, and 479 triples were added by outlier analysis, rule-based reasoning, association rules, and path-based reasoning, respectively. In addition, rule-based reasoning identified 14 contradictory triples. In the KGC quality evaluation phase, in terms of completeness, KG had higher density and lower sparsity after completion, and there were no contradictory rules within the KG. In terms of accuracy, KG after completion was more consistent with prior knowledge. In terms of usability, the mean reciprocal ranking, mean rank, and hit rate of the first N tail entities predicted by the model (Hits@N) of the TransE, RotatE, DistMult, and ComplEx graph representation models all showed improvement after KGC. Among them, the RotatE model achieved the best representation.</p><p><strong>Conclusions: </strong>The 3-step completion approach can effectively improve the completeness, accuracy, and availability of KGs, and the 3-dimensional evaluation framework can be used for comprehensive KGC evaluation. In the TCM field, the RotatE model performed better at KG representation.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e55090"},"PeriodicalIF":3.1000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11329848/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/55090","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Knowledge graphs (KGs) can integrate domain knowledge into a traditional Chinese medicine (TCM) intelligent syndrome differentiation model. However, the quality of current KGs in the TCM domain varies greatly, related to the lack of knowledge graph completion (KGC) and evaluation methods.
Objective: This study aims to investigate KGC and evaluation methods tailored for TCM domain knowledge.
Methods: In the KGC phase, according to the characteristics of TCM domain knowledge, we proposed a 3-step "entity-ontology-path" completion approach. This approach uses path reasoning, ontology rule reasoning, and association rules. In the KGC quality evaluation phase, we proposed a 3-dimensional evaluation framework that encompasses completeness, accuracy, and usability, using quantitative metrics such as complex network analysis, ontology reasoning, and graph representation. Furthermore, we compared the impact of different graph representation models on KG usability.
Results: In the KGC phase, 52, 107, 27, and 479 triples were added by outlier analysis, rule-based reasoning, association rules, and path-based reasoning, respectively. In addition, rule-based reasoning identified 14 contradictory triples. In the KGC quality evaluation phase, in terms of completeness, KG had higher density and lower sparsity after completion, and there were no contradictory rules within the KG. In terms of accuracy, KG after completion was more consistent with prior knowledge. In terms of usability, the mean reciprocal ranking, mean rank, and hit rate of the first N tail entities predicted by the model (Hits@N) of the TransE, RotatE, DistMult, and ComplEx graph representation models all showed improvement after KGC. Among them, the RotatE model achieved the best representation.
Conclusions: The 3-step completion approach can effectively improve the completeness, accuracy, and availability of KGs, and the 3-dimensional evaluation framework can be used for comprehensive KGC evaluation. In the TCM field, the RotatE model performed better at KG representation.
期刊介绍:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.