Named Entity Recognition of Medical Examination Reports Based on BiLSTM+CRF Model

2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT). Pub Date: 2023-06-16. DOI: 10.1109/AINIT59027.2023.10212675

Ying Zhang, Fan Zhang



Abstract

Medical examination reports are typical unstructured data written in natural language. Named entity recognition (NER) extracts key information from medical texts and serves as the foundation for further analysis of entity relationships and extraction of diagnostic knowledge. Used alone, deep learning models and earlier conditional random field (CRF) models each have drawbacks, such as heavy annotation workload, overfitting, and generalization issues. Chinese medical text poses additional difficulties for NER because of its specialized and non-standardized structure. To address these issues, this paper proposes an integrated model for extracting structured information from medical examination reports, namely the Bi-LSTM and CRF ensemble model (BLC). BLC identifies medical entities in the reports: the BiLSTM model determines the probability of each label for each character, and CRF decoding ensures that the final label sequence adheres to the output standards. Real gastrointestinal endoscopy reports provided by hospitals were annotated and used as experimental data, and the Bi-LSTM+CRF model was built and trained on them using the TensorFlow framework. The effects of different parameters on entity recognition were compared. The results showed that the model's recognition performance under the BIOES annotation scheme was superior to that under the BIO annotation scheme. Entity categories whose features are clearly delimited were segmented well, which in turn led to better recognition performance.
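The division of labor described in the abstract (per-character label scores from the BiLSTM, then a decoding step that keeps the label sequence well-formed) and the difference between the BIO and BIOES schemes can be illustrated with a minimal sketch. This is not the authors' TensorFlow implementation. It is plain Python under several assumptions: the entity label `SITE` and all emission scores are made up for illustration, and the learned CRF transition scores are replaced by hard allow/forbid constraints so that the example stays short.

```python
from typing import Dict, List

NEG_INF = float("-inf")


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert BIO tags to BIOES: S- marks single-token entities, E- marks entity ends."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == f"I-{label}"  # does the same entity go on?
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"E-{label}")
    return out


def allowed(prev: str, cur: str) -> bool:
    """Hard BIOES constraint standing in for learned CRF transition scores."""
    prev_open = prev[0] in "BI"  # previous tag leaves an entity unfinished
    if cur[0] in "IE":
        return prev_open and prev[2:] == cur[2:]
    return not prev_open


def viterbi(emissions: List[Dict[str, float]], tags: List[str]) -> List[str]:
    """Return the best-scoring tag path that respects the constraints."""
    # A sequence may not start in the middle of an entity.
    score = {t: (emissions[0][t] if t[0] not in "IE" else NEG_INF) for t in tags}
    back = []
    for em in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in tags:
            best_prev = max(
                tags, key=lambda p: score[p] if allowed(p, cur) else NEG_INF
            )
            ok = allowed(best_prev, cur)
            new_score[cur] = score[best_prev] + em[cur] if ok else NEG_INF
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # A sequence may not end with an unfinished entity.
    last = max(tags, key=lambda t: score[t] if t[0] not in "BI" else NEG_INF)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical label set and made-up log-scores for three characters.
TAGS = ["O", "B-SITE", "I-SITE", "E-SITE", "S-SITE"]
EMISSIONS = [
    {"O": -2.0, "B-SITE": -0.2, "I-SITE": -5.0, "E-SITE": -5.0, "S-SITE": -3.0},
    {"O": -0.6, "B-SITE": -5.0, "I-SITE": -4.0, "E-SITE": -0.9, "S-SITE": -5.0},
    {"O": -0.1, "B-SITE": -5.0, "I-SITE": -5.0, "E-SITE": -5.0, "S-SITE": -5.0},
]

greedy = [max(em, key=em.get) for em in EMISSIONS]
decoded = viterbi(EMISSIONS, TAGS)
print(greedy)   # ['B-SITE', 'O', 'O'] (ill-formed: B- is never closed)
print(decoded)  # ['B-SITE', 'E-SITE', 'O']
print(bio_to_bioes(["B-SITE", "I-SITE", "O", "B-SITE"]))
```

Picking the top label per character independently yields a sequence that opens an entity and never closes it, whereas the constrained decode returns a well-formed one. In a trained CRF the transitions carry learned real-valued scores rather than the 0/forbidden values used here. The explicit `E-` and `S-` tags in BIOES give the decoder boundary information that BIO leaves implicit, which is one plausible reading of why the paper reports better results under BIOES.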


