Construction and evaluation of a domain-specific knowledge graph for knowledge discovery

IF 2.6 Q2 INFORMATION SCIENCE & LIBRARY SCIENCE

Information Discovery and Delivery Pub Date : 2023-02-03 DOI:10.1108/idd-06-2022-0054

Huyen Nguyen, Haihua Chen, Jiangping Chen, Kate Kargozari, Junhua Ding


Citations: 1

Abstract

Purpose
This study aims to evaluate a method of building a biomedical knowledge graph (KG).

Design/methodology/approach
This research first constructs a COVID-19 KG on the COVID-19 Open Research Data Set, covering information over six categories (i.e. disease, drug, gene, species, therapy and symptom). The construction used open-source tools to extract entities, relations and triples. Then, the COVID-19 KG is evaluated on three data-quality dimensions: correctness, relatedness and comprehensiveness, using a semiautomatic approach. Finally, this study assesses the application of the KG by building a question answering (Q&A) system. Five queries regarding COVID-19 genomes, symptoms, transmissions and therapeutics were submitted to the system and the results were analyzed.

Findings
With current extraction tools, the quality of the KG is moderate and difficult to improve, unless more efforts are made to improve the tools for entity extraction, relation extraction and others. This study finds that comprehensiveness and relatedness positively correlate with the data size. Furthermore, the results indicate the performances of the Q&A systems built on the larger-scale KGs are better than the smaller ones for most queries, proving the importance of relatedness and comprehensiveness to ensure the usefulness of the KG.

Originality/value
The KG construction process, data-quality-based and application-based evaluations discussed in this paper provide valuable references for KG researchers and practitioners to build high-quality domain-specific knowledge discovery systems.
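As a rough illustration of the triple-based representation and query step the abstract describes, the following Python sketch stores (subject, relation, object) triples and answers a simple lookup. It is a minimal sketch under stated assumptions, not the authors' pipeline: the triples, the relation names and the helper functions `build_index` and `answer` are all hypothetical, and the paper's own extraction tools and Q&A system are not reproduced here.

```python
from collections import defaultdict

# Illustrative triples only (assumption): these are NOT taken from the paper's KG.
TRIPLES = [
    ("COVID-19", "has_symptom", "fever"),
    ("COVID-19", "has_symptom", "cough"),
    ("COVID-19", "caused_by", "SARS-CoV-2"),
    ("remdesivir", "studied_for", "COVID-19"),
]


def build_index(triples):
    """Index triples by (subject, relation) so lookups avoid a full scan."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[(subj, rel)].append(obj)
    return index


def answer(index, subject, relation):
    """Return every object linked to `subject` by `relation` (empty list if none)."""
    # .get() is used so that a missing key does not get added to the defaultdict.
    return list(index.get((subject, relation), []))


index = build_index(TRIPLES)
symptoms = answer(index, "COVID-19", "has_symptom")
print(symptoms)
```

A lookup such as `answer(index, "COVID-19", "has_symptom")` stands in for one Q&A query. A larger set of triples gives such lookups more to return, which is one informal way to read the reported link between data size and comprehensiveness.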


Source journal

Information Discovery and Delivery (INFORMATION SCIENCE & LIBRARY SCIENCE)

CiteScore: 5.40

Self-citation rate: 4.80%

About the journal: Information Discovery and Delivery covers information discovery and access for digital information researchers. This includes educators, knowledge professionals in education and cultural organisations, knowledge managers in media, health care and government, as well as librarians. The journal publishes research and practice that explore the digital information supply chain, i.e. transport, flows, tracking, exchange and sharing, including within and between libraries. It is also interested in digital information capture, packaging and storage by ‘collectors’ of all kinds. Information is widely defined, including but not limited to: records, documents, learning objects, visual and sound files, data and metadata, and user-generated content.