Interactive online learning for clinical entity recognition

HILDA '16. Pub Date: 2016-06-26. DOI: 10.1145/2939502.2939510

L. Tari, Varish Mulwad, Anna von Reden


Citations: 0

Abstract

Named entity recognition and entity linking are core natural language processing components that are predominantly addressed with supervised machine learning. Supervised approaches require manually annotated training data, which can be expensive to compile, and the limited availability of such data can hinder the use of machine learning-based entity recognition and linking components in real-world applications. In this paper, we propose a novel approach that uses ontologies as the basis for entity recognition and linking and captures the context of the tokens neighboring each entity of interest with vectors based on syntactic and semantic features. Our approach takes user feedback so that the vector-based model can be continuously updated in an online setting. We demonstrate the approach in a healthcare context, using it to recognize body part and imaging modality entities in clinical documents and to map these entities to the right concepts in the RadLex and NCIT medical ontologies. Our current evaluation shows promising results on a small set of clinical documents, with a precision of 0.841 and a recall of 0.966. The evaluation also demonstrates that our approach is capable of continuous performance improvement as the number of examples increases. We believe that our human-in-the-loop, online learning approach to entity recognition and linking shows promise for real-world applications.
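The loop the abstract describes (ontology lookup, context vectors over neighboring tokens, online updates from user feedback) can be illustrated with a minimal sketch. This is not the authors' implementation. The lexicon, the concept IDs, the function names, and the class `OnlineContextModel` are all hypothetical. A plain perceptron update stands in for whatever update rule the paper uses, and neighboring-token features stand in for its syntactic and semantic features.

```python
# Hypothetical toy lexicon standing in for ontology labels. The concept IDs
# are placeholders, not real RadLex or NCIT identifiers.
LEXICON = {
    "head": "CONCEPT_HEAD",
    "chest": "CONCEPT_CHEST",
    "ct": "CONCEPT_CT",
}


def candidates(tokens):
    """Return (token index, concept ID) for every token found in the lexicon."""
    return [
        (i, LEXICON[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in LEXICON
    ]


def context_features(tokens, i, window=2):
    """Sparse vector over the tokens within `window` positions of index i."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            key = f"{offset}:{tokens[j].lower()}"
            feats[key] = feats.get(key, 0.0) + 1.0
    return feats


class OnlineContextModel:
    """Linear model over context features, updated one feedback item at a time."""

    def __init__(self):
        self.weights = {}

    def score(self, feats):
        return sum(self.weights.get(k, 0.0) * v for k, v in feats.items())

    def accept(self, feats):
        # Candidates are accepted by default (score 0) until feedback says otherwise.
        return self.score(feats) >= 0.0

    def feedback(self, feats, is_correct):
        """Perceptron step: adjust weights only when the model disagrees with the user."""
        if self.accept(feats) != is_correct:
            sign = 1.0 if is_correct else -1.0
            for k, v in feats.items():
                self.weights[k] = self.weights.get(k, 0.0) + sign * v


if __name__ == "__main__":
    tokens = "The head of radiology ordered a CT of the chest".split()
    model = OnlineContextModel()
    for i, concept in candidates(tokens):
        print(tokens[i], concept, model.accept(context_features(tokens, i)))
    # A user rejects "head" here because it is a job title, not a body part.
    model.feedback(context_features(tokens, 1), is_correct=False)
```

In this sketch the lexicon lookup alone is permissive, and the context model learns from corrections which matches to suppress. Because features are shared across candidates, a single rejection can also lower the score of other mentions with similar neighbors, so further feedback is needed to separate them.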
