Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016.

CEUR workshop proceedings Pub Date : 2016-09-01

Aurélie Névéol, K Bretonnel Cohen, Cyril Grouin, Thierry Hamon, Thomas Lavergne, Liadh Kelly, Lorraine Goeuriot, Grégoire Rey, Aude Robert, Xavier Tannier, Pierre Zweigenbaum

CEUR workshop proceedings, volume 1609, pages 28-42. Publication date: 2016-09-01. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5756095/pdf/nihms921614.pdf


Abstract

This paper reports on Task 2 of the 2016 CLEF eHealth evaluation lab, which extended the previous information extraction tasks of the ShARe/CLEF eHealth evaluation labs. The task continued with named entity recognition and normalization in French narratives, as offered in CLEF eHealth 2015. Named entity recognition involved ten types of entities, including disorders, that were defined according to Semantic Groups in the Unified Medical Language System® (UMLS®), which was also used for normalizing the entities. In addition, we introduced a large-scale classification task on French death certificates, which consisted of extracting causes of death as coded in the International Classification of Diseases, tenth revision (ICD10). Participant systems were evaluated using Precision, Recall and F-measure against a blind reference standard of 832 titles of scientific articles indexed in MEDLINE, 4 drug monographs published by the European Medicines Agency (EMEA) and 27,850 death certificates. In total, seven teams participated, including five in the entity recognition and normalization task, and five in the death certificate coding task. Three teams submitted their systems to our newly offered reproducibility track. For entity recognition, the highest performance was achieved on the EMEA corpus, with an overall F-measure of 0.702 for plain entity recognition and 0.529 for normalized entity recognition. For entity normalization, the highest performance was achieved on the MEDLINE corpus, with an overall F-measure of 0.552. For death certificate coding, the highest performance was an F-measure of 0.848.
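As a rough illustration of the evaluation measures named in the abstract, the sketch below computes Precision, Recall and F-measure for a coding task by comparing predicted (document, code) pairs against a reference set. It assumes exact matching and the balanced F-measure (F1); the abstract does not spell out the matching or averaging scheme the lab used, and the function name, certificate identifiers and ICD10 codes are made up for the example.

```python
def precision_recall_f1(predicted, reference):
    """Exact-match Precision, Recall and balanced F-measure over two sets.

    Assumption: each item is a (document_id, code) pair and only exact
    matches count as correct. This is an illustrative scheme, not the
    lab's official scorer.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: two certificates with ICD10 codes (dots omitted).
reference = {("cert1", "I219"), ("cert1", "E119"), ("cert2", "C349")}
predicted = {("cert1", "I219"), ("cert2", "C349"), ("cert2", "J189")}

p, r, f = precision_recall_f1(predicted, reference)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

In this toy case two of the three predicted pairs are correct and two of the three reference pairs are found, so all three measures coincide.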

