Automatic Relationship Verification in Online Medical Knowledge Base: a Large Scale Study in SemMedDB

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2018-12-01. DOI: 10.1109/BIBM.2018.8621316

Danchen Zhang, Daqing He, Ning Zou, Xin Zhou, Fen Pei

Citations: 2

Abstract

Automatically generated public medical knowledge bases (KBs), such as SemMedDB, are commonly used in various medical informatics tasks because of their comprehensive coverage. However, because the automatic algorithms that generate these KBs are imperfect, the KBs often contain noisy statements about medical concepts and relationships. For example, the extraction precision of SemRep, the tool used to construct SemMedDB, is reported to be 74.5%. Previous work focused on improving these algorithms so that they extract relationships more accurately. In this paper, by contrast, we propose a supervised learning method that automatically verifies the extracted medical relationships. Through a study conducted on SemMedDB, we develop a method for generating a large set of training data at a relatively small human annotation cost. We further propose nine features to characterize each medical relationship instance. After testing several classifiers, our best results reach 80% in both F1 score and accuracy, which demonstrates the effectiveness of our approach. In summary, our study demonstrates that noisy relationships in large-scale medical KBs can be identified and removed without much human involvement.
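The abstract does not list the nine features or name the classifiers, so what follows is only a minimal sketch of the general setup under stated assumptions: each relationship instance (a subject-PREDICATE-object statement, the form in which SemMedDB stores extracted relationships) is reduced to a numeric feature vector, labeled 1 if the statement is correct and 0 if it is noise, and a binary classifier is trained and then scored with accuracy and F1, the two metrics the abstract reports. The `Predication` class, its fields, the two toy feature values, treating "correct" as the positive class for F1, and the use of plain logistic regression trained by gradient descent are all hypothetical choices for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Predication:
    """A subject-PREDICATE-object statement; field names are hypothetical."""
    subject: str
    predicate: str
    obj: str
    features: List[float]  # hypothetical features, NOT the paper's nine
    label: int             # 1 = correct relationship, 0 = noisy extraction


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Linear score w.x + b.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # d(log-loss)/d(score)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    # 1 = keep as a verified relationship, 0 = flag as noise.
    return int(sigmoid(score(w, b, x)) >= 0.5)


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1, treating label 1 (correct relationship) as positive (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


if __name__ == "__main__":
    # Toy, hand-made examples (NOT SemMedDB data) with two made-up feature values each.
    data = [
        Predication("aspirin", "TREATS", "headache", [0.9, 0.8], 1),
        Predication("insulin", "TREATS", "diabetes", [1.0, 0.9], 1),
        Predication("headache", "TREATS", "aspirin", [0.1, 0.2], 0),
        Predication("diabetes", "CAUSES", "insulin", [0.2, 0.1], 0),
    ]
    X = [p.features for p in data]
    y = [p.label for p in data]
    w, b = train_logreg(X, y)
    preds = [predict(w, b, x) for x in X]
    print("accuracy=%.2f f1=%.2f" % accuracy_and_f1(y, preds))
```

Logistic regression appears here only because it fits in a few lines of standard-library Python; comparing several classifiers, as the abstract describes, would more typically be done with a machine learning library, and the real feature set and labeling procedure would come from the paper itself.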
