A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese

IF 0.9 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS

Applied Computing Review. Pub Date: 2023-03-27. DOI: 10.1145/3555776.3578577

Hugo Sousa, Arian Pasquali, Alípio Jorge, Catarina Sousa Santos, Mário Amorim Lopes


Citations: 1

Abstract

Textual health records of cancer patients are usually lengthy and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information in those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts; however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology, which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved F1 scores of 88.6%, 95.0%, and 55.8% in the mention extraction of procedures, drugs, and diseases, respectively.
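
For readers unfamiliar with the metric, the following is a minimal sketch of how a mention-level F1 score can be computed. It assumes exact-match scoring over (start, end, type) spans; the abstract does not state which matching criterion the authors used, and the function name `mention_f1` and the toy spans are hypothetical, not taken from the paper.

```python
from typing import Set, Tuple

# A mention is a (start offset, end offset, entity type) triple.
# Exact-match scoring is an assumption; the abstract does not specify it.
Mention = Tuple[int, int, str]


def mention_f1(gold: Set[Mention], predicted: Set[Mention]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match mention extraction."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (hypothetical spans): the predicted DISEASE mention has a
# wrong end offset, so it counts as both a false positive and a false negative.
gold = {(0, 12, "DRUG"), (20, 31, "DISEASE"), (40, 52, "PROCEDURE")}
predicted = {(0, 12, "DRUG"), (20, 29, "DISEASE"), (40, 52, "PROCEDURE")}
precision, recall, f1 = mention_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under exact matching, a boundary error is penalized twice, which is one possible reason a score for an entity type with long or variable mentions can fall well below the others; whether that explains the lower disease score reported here is not stated in the abstract.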


Source journal

Applied Computing Review (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Self-citation rate

40.00%

Number of publications