Named Entity Recognition of Medical Examination Reports Based on BiLSTM+CRF Model

Ying Zhang, Fan Zhang
Published in: 2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT)
Publication date: 2023-06-16
DOI: 10.1109/AINIT59027.2023.10212675 (https://doi.org/10.1109/AINIT59027.2023.10212675)

Abstract

Medical examination reports are typical unstructured data written in natural language. Named entity recognition (NER) extracts key information from medical texts and serves as the foundation for further analysis of entity relationships and extraction of diagnostic knowledge. Used alone, deep learning models and earlier conditional random field (CRF) models each have drawbacks, such as heavy annotation workload, overfitting, and poor generalization. In addition, Chinese medical text is harder for NER because it is specialized and its structure is not standardized. To address these issues, this paper proposes an integrated model for extracting structured information from medical examination reports: the BiLSTM and CRF ensemble model (BLC). BLC identifies medical entities in the reports. The BiLSTM model determines the probability of each label for each individual character, and CRF decoding ensures that the final label sequence conforms to the output standards. Real gastrointestinal endoscopy reports provided by hospitals were annotated and used as experimental data, and the BiLSTM+CRF model was built and trained on these data with the TensorFlow framework. The effects of different parameters on entity recognition were compared. The results showed that the model's recognition performance under the BIOES annotation scheme was superior to that under the BIO annotation scheme. Entity categories with well-segmented features were segmented well and, in turn, recognized more accurately.
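
The two ideas in the abstract, the BIOES annotation scheme and CRF-style decoding over per-character label scores, can be illustrated with a minimal sketch. This is not the authors' TensorFlow implementation. It assumes a hypothetical entity label `X`, made-up emission scores standing in for BiLSTM output, and hard BIOES validity constraints in place of the transition scores a trained CRF would learn.

```python
from typing import Dict, List

NEG_INF = float("-inf")


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert a BIO tag sequence to BIOES (adds E- for entity ends, S- for singletons)."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == f"I-{label}"  # does the same entity go on?
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"E-{label}")
    return out


def allowed(prev: str, cur: str) -> bool:
    """Hard BIOES rule (assumption: a trained CRF would use learned scores instead)."""
    if prev[0] in ("B", "I"):  # an open entity must continue or end, same label
        return cur[0] in ("I", "E") and cur[2:] == prev[2:]
    return cur[0] in ("O", "B", "S")  # after O/E/S only a fresh start is valid


def viterbi(emissions: List[Dict[str, float]], tags: List[str]) -> List[str]:
    """Return the highest-scoring tag path that respects the BIOES constraints."""
    # The sentence start behaves like a preceding "O".
    score = {t: (emissions[0][t] if allowed("O", t) else NEG_INF) for t in tags}
    back = []  # back[k][cur] = best previous tag for position k + 1
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            best_prev, best = None, NEG_INF
            for prev in tags:
                if allowed(prev, cur) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[cur] = best + em[cur] if best_prev is not None else NEG_INF
            ptr[cur] = best_prev
        score = new_score
        back.append(ptr)
    # The last tag must not leave an entity open (no trailing B- or I-).
    last = max((t for t in tags if t[0] not in ("B", "I")), key=lambda t: score[t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tags = ["O", "B-X", "I-X", "E-X", "S-X"]
    # Made-up scores: picking the best tag per character would give I-X, O (invalid).
    emissions = [
        {"O": 0.1, "B-X": 1.0, "I-X": 2.0, "E-X": 0.0, "S-X": 0.5},
        {"O": 1.0, "B-X": 0.0, "I-X": 0.0, "E-X": 0.9, "S-X": 0.2},
    ]
    print(bio_to_bioes(["B-X", "I-X", "O", "B-X"]))
    print(viterbi(emissions, tags))
```

In this toy input the highest per-character scores form a sequence that starts inside an entity, which BIOES forbids; decoding over the whole sequence picks the best valid path instead. BIOES also marks entity ends and single-character entities explicitly, which is one plausible reason it can help boundary detection, although the abstract does not state the cause of the improvement.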