医疗实体识别系统的广泛误差分析和基于学习的评估以接近用户体验

Workshop on Biomedical Natural Language Processing Pub Date : 2020-06-01 DOI:10.18653/v1/2020.bionlp-1.19

I. Nejadgholi, Kathleen C. Fraser, Berry de Bruijn

{"title":"医疗实体识别系统的广泛误差分析和基于学习的评估以接近用户体验","authors":"I. Nejadgholi, Kathleen C. Fraser, Berry de Bruijn","doi":"10.18653/v1/2020.bionlp-1.19","DOIUrl":null,"url":null,"abstract":"When comparing entities extracted by a medical entity recognition system with gold standard annotations over a test set, two types of mismatches might occur, label mismatch or span mismatch. Here we focus on span mismatch and show that its severity can vary from a serious error to a fully acceptable entity extraction due to the subjectivity of span annotations. For a domain-specific BERT-based NER system, we showed that 25% of the errors have the same labels and overlapping span with gold standard entities. We collected expert judgement which shows more than 90% of these mismatches are accepted or partially accepted by the user. Using the training set of the NER system, we built a fast and lightweight entity classifier to approximate the user experience of such mismatches through accepting or rejecting them. The decisions made by this classifier are used to calculate a learning-based F-score which is shown to be a better approximation of a forgiving user’s experience than the relaxed F-score. We demonstrated the results of applying the proposed evaluation metric for a variety of deep learning medical entity recognition models trained with two datasets.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"127 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Extensive Error Analysis and a Learning-Based Evaluation of Medical Entity Recognition Systems to Approximate User Experience\",\"authors\":\"I. Nejadgholi, Kathleen C. Fraser, Berry de Bruijn\",\"doi\":\"10.18653/v1/2020.bionlp-1.19\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"When comparing entities extracted by a medical entity recognition system with gold standard annotations over a test set, two types of mismatches might occur, label mismatch or span mismatch. Here we focus on span mismatch and show that its severity can vary from a serious error to a fully acceptable entity extraction due to the subjectivity of span annotations. For a domain-specific BERT-based NER system, we showed that 25% of the errors have the same labels and overlapping span with gold standard entities. We collected expert judgement which shows more than 90% of these mismatches are accepted or partially accepted by the user. Using the training set of the NER system, we built a fast and lightweight entity classifier to approximate the user experience of such mismatches through accepting or rejecting them. The decisions made by this classifier are used to calculate a learning-based F-score which is shown to be a better approximation of a forgiving user’s experience than the relaxed F-score. We demonstrated the results of applying the proposed evaluation metric for a variety of deep learning medical entity recognition models trained with two datasets.\",\"PeriodicalId\":200974,\"journal\":{\"name\":\"Workshop on Biomedical Natural Language Processing\",\"volume\":\"127 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Biomedical Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2020.bionlp-1.19\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Biomedical Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2020.bionlp-1.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

摘要

当比较医疗实体识别系统在测试集上使用金标准注释提取的实体时，可能会出现两种类型的不匹配，即标签不匹配或跨度不匹配。这里我们关注跨度不匹配，并说明由于跨度注释的主观性，其严重性可以从严重错误到完全可接受的实体提取。对于特定领域的基于bert的NER系统，我们发现25%的错误与金标准实体具有相同的标签和重叠的跨度。我们收集了专家的判断，显示超过90%的这些不匹配被用户接受或部分接受。利用NER系统的训练集，我们构建了一个快速轻量级的实体分类器，通过接受或拒绝这些不匹配来近似用户体验。这个分类器所做的决定被用来计算一个基于学习的f分，这个f分比宽松的f分更接近宽容的用户体验。我们展示了将所提出的评估指标应用于使用两个数据集训练的各种深度学习医学实体识别模型的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Extensive Error Analysis and a Learning-Based Evaluation of Medical Entity Recognition Systems to Approximate User Experience

When comparing entities extracted by a medical entity recognition system with gold standard annotations over a test set, two types of mismatches might occur, label mismatch or span mismatch. Here we focus on span mismatch and show that its severity can vary from a serious error to a fully acceptable entity extraction due to the subjectivity of span annotations. For a domain-specific BERT-based NER system, we showed that 25% of the errors have the same labels and overlapping span with gold standard entities. We collected expert judgement which shows more than 90% of these mismatches are accepted or partially accepted by the user. Using the training set of the NER system, we built a fast and lightweight entity classifier to approximate the user experience of such mismatches through accepting or rejecting them. The decisions made by this classifier are used to calculate a learning-based F-score which is shown to be a better approximation of a forgiving user’s experience than the relaxed F-score. We demonstrated the results of applying the proposed evaluation metric for a variety of deep learning medical entity recognition models trained with two datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Workshop on Biomedical Natural Language Processing

自引率

0.00%

发文量