Huyen Nguyen, Haihua Chen, Jiangping Chen, Kate Kargozari, Junhua Ding
{"title":"Construction and evaluation of a domain-specific knowledge graph for knowledge discovery","authors":"Huyen Nguyen, Haihua Chen, Jiangping Chen, Kate Kargozari, Junhua Ding","doi":"10.1108/idd-06-2022-0054","DOIUrl":null,"url":null,"abstract":"\nPurpose\nThis study aims to evaluate a method of building a biomedical knowledge graph (KG).\n\n\nDesign/methodology/approach\nThis research first constructs a COVID-19 KG on the COVID-19 Open Research Data Set, covering information over six categories (i.e. disease, drug, gene, species, therapy and symptom). The construction used open-source tools to extract entities, relations and triples. Then, the COVID-19 KG is evaluated on three data-quality dimensions: correctness, relatedness and comprehensiveness, using a semiautomatic approach. Finally, this study assesses the application of the KG by building a question answering (Q&A) system. Five queries regarding COVID-19 genomes, symptoms, transmissions and therapeutics were submitted to the system and the results were analyzed.\n\n\nFindings\nWith current extraction tools, the quality of the KG is moderate and difficult to improve, unless more efforts are made to improve the tools for entity extraction, relation extraction and others. This study finds that comprehensiveness and relatedness positively correlate with the data size. Furthermore, the results indicate the performances of the Q&A systems built on the larger-scale KGs are better than the smaller ones for most queries, proving the importance of relatedness and comprehensiveness to ensure the usefulness of the KG.\n\n\nOriginality/value\nThe KG construction process, data-quality-based and application-based evaluations discussed in this paper provide valuable references for KG researchers and practitioners to build high-quality domain-specific knowledge discovery systems.\n","PeriodicalId":43488,"journal":{"name":"Information Discovery and Delivery","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2023-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Discovery and Delivery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/idd-06-2022-0054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
引用次数: 1
Abstract
Purpose
This study aims to evaluate a method of building a biomedical knowledge graph (KG).
Design/methodology/approach
This research first constructs a COVID-19 KG on the COVID-19 Open Research Data Set, covering information over six categories (i.e. disease, drug, gene, species, therapy and symptom). The construction used open-source tools to extract entities, relations and triples. Then, the COVID-19 KG is evaluated on three data-quality dimensions: correctness, relatedness and comprehensiveness, using a semiautomatic approach. Finally, this study assesses the application of the KG by building a question answering (Q&A) system. Five queries regarding COVID-19 genomes, symptoms, transmissions and therapeutics were submitted to the system and the results were analyzed.
Findings
With current extraction tools, the quality of the KG is moderate and difficult to improve, unless more efforts are made to improve the tools for entity extraction, relation extraction and others. This study finds that comprehensiveness and relatedness positively correlate with the data size. Furthermore, the results indicate the performances of the Q&A systems built on the larger-scale KGs are better than the smaller ones for most queries, proving the importance of relatedness and comprehensiveness to ensure the usefulness of the KG.
Originality/value
The KG construction process, data-quality-based and application-based evaluations discussed in this paper provide valuable references for KG researchers and practitioners to build high-quality domain-specific knowledge discovery systems.
期刊介绍:
Information Discovery and Delivery covers information discovery and access for digital information researchers. This includes educators, knowledge professionals in education and cultural organisations, knowledge managers in media, health care and government, as well as librarians. The journal publishes research and practice which explores the digital information supply chain ie transport, flows, tracking, exchange and sharing, including within and between libraries. It is also interested in digital information capture, packaging and storage by ‘collectors’ of all kinds. Information is widely defined, including but not limited to: Records, Documents, Learning objects, Visual and sound files, Data and metadata and , User-generated content.