Automatic Relationship Verification in Online Medical Knowledge Base: a Large Scale Study in SemMedDB

Danchen Zhang, Daqing He, Ning Zou, Xin Zhou, Fen Pei
Published in: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018-12-01
DOI: 10.1109/BIBM.2018.8621316 (https://doi.org/10.1109/BIBM.2018.8621316)
Citations: 2

Abstract

Automatically generated public medical knowledge bases (KBs), such as SemMedDB, are commonly used in various medical informatics tasks because of their comprehensive coverage. However, because the automatic algorithms that generate these KBs are imperfect, the KBs often contain noisy statements about medical concepts and relationships. For example, the extraction precision of SemRep, the tool used to construct SemMedDB, is reported to be 74.5%. Previous work focused on improving the extraction algorithms to make them more accurate. In this paper, we instead propose a supervised learning method that automatically verifies the extracted medical relationships. Through a study conducted on SemMedDB, we develop a method for generating a large set of training data at a relatively small human annotation cost. We further propose nine features to characterize each medical relationship instance. After testing several classifiers, our proposed method achieves a best F1 score and accuracy of 80%, which demonstrates the effectiveness of our approach. In summary, our study demonstrates that noisy relationships in large-scale medical KBs can be identified and removed without much human involvement.
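The abstract reports results as F1 score and accuracy for a binary decision: is an extracted relationship correct or not. As a minimal sketch of how those two metrics are computed, the Python below uses only the standard library. The function name `f1_and_accuracy` and the toy labels are hypothetical illustrations. They are not taken from the paper and do not reproduce its classifiers, features, or data.

```python
from typing import Sequence, Tuple


def f1_and_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (F1, accuracy) for binary labels, where 1 means 'relationship is correct'."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return f1, accuracy


# Hypothetical toy labels for ten relationship instances (NOT data from the paper).
gold = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
pred = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
f1, acc = f1_and_accuracy(gold, pred)
print(f"F1={f1:.3f} accuracy={acc:.3f}")
```

In this toy case there are 5 true positives, 1 false positive, 1 false negative, and 3 true negatives, so precision and recall are both 5/6 and accuracy is 8/10. The numbers only illustrate the definitions and say nothing about the reported 80% result.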