Danchen Zhang, Daqing He, Ning Zou, Xin Zhou, Fen Pei
Automatic Relationship Verification in Online Medical Knowledge Base: a Large Scale Study in SemMedDB
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2018-12-01
DOI: 10.1109/BIBM.2018.8621316 (https://doi.org/10.1109/BIBM.2018.8621316)
Citation count: 2
Abstract
Automatically generated public medical knowledge bases (KBs), such as SemMedDB, are commonly used in medical informatics tasks because of their comprehensive coverage. However, because the automatic algorithms that generate those KBs are imperfect, the KBs often contain noisy statements about medical concepts and relationships. For example, the extraction precision of SemRep, the tool used to construct SemMedDB, is reported to be 74.5%. Previous work focused on improving the algorithms for more accurate extraction. In this paper, we instead propose a supervised learning method to automatically verify the medical relationships. Through a study conducted on SemMedDB, we develop a method for generating a large set of training data at a relatively small human annotation cost. We further propose nine features to characterize each medical relationship instance. After testing several classifiers, our proposed method achieves a best F1 score and accuracy of 80%, which demonstrates the effectiveness of our approach. In summary, our study demonstrates that noisy relationships in large-scale medical KBs can be identified and removed without much human involvement.
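The two evaluation metrics named in the abstract, accuracy and F1, can be illustrated with a minimal Python sketch. The function name `accuracy_and_f1` and the toy labels are assumptions made for illustration only; they are not data, features, or code from the paper, and the sketch assumes a binary setting in which 1 marks a relationship judged correct and 0 marks a noisy one.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels; 1 is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correct, predicted correct
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # noisy, predicted correct
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # correct, predicted noisy
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # noisy, predicted noisy

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels, not taken from SemMedDB or the paper.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Accuracy counts every correctly classified instance, whereas F1 balances precision and recall on the positive class, so the two values can differ when the classes are imbalanced.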