Medical named entity recognition based on domain knowledge and position encoding.

IF 3.3 | Zone 3 (Medicine) | Q2 MEDICAL INFORMATICS
Shuifa Sun, Qin Hu, Fengjiao Xu, Feng Hu, Yirong Wu, Ben Wang
BMC Medical Informatics and Decision Making, 25(1): 235. Journal Article, published 2025-07-01.
DOI: 10.1186/s12911-025-03037-0 (https://doi.org/10.1186/s12911-025-03037-0)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220262/pdf/
Citations: 0

Abstract

A model for recognizing named entities in Chinese electronic medical records is proposed that leverages medical domain knowledge and positional encoding, with a focus on accurate entity boundary detection. First, medical domain-specific terms are integrated into a BERT module through a lexical adapter. After pre-training, the model produces dynamic character representations that carry both lexical and boundary information. In the feature encoding module, Star-Transformer and BiLSTM are employed to extract local features and long-distance features, respectively, and together generate the feature representation of the sequence. In addition, because the relative position between characters in the text affects recognition results, Rotary Position Embedding (RoPE) is incorporated into the Star-Transformer to strengthen its extraction of semantic features. On the CCKS2020 dataset, the model reaches an F1-score of 85.78%, an increase of 2.96% over the baseline model. Improvement is also observed on a self-built breast cancer ultrasound report dataset, demonstrating the effectiveness and applicability of the model in the medical domain.
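
The abstract does not state how RoPE is configured inside the Star-Transformer. As a hedged illustration only, the following pure-Python sketch implements standard rotary position embedding as introduced by Su et al. (RoFormer) on a single query/key pair. The base of 10000, the example vectors, and the names `rope_rotate`, `rope_score`, and `dot` are assumptions for illustration, not the authors' code. The sketch shows why RoPE encodes relative position: each pair of dimensions is rotated by an angle proportional to the token's position, so the dot product between a query at position m and a key at position n depends only on the offset n - m.

```python
import math
from typing import List


def rope_rotate(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Apply rotary position embedding to vector x at position pos.

    Assumption: the standard RoFormer schedule theta_i = base ** (-2i / d),
    with dimension pairs (x[2i], x[2i+1]) rotated by pos * theta_i.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even vector dimension")
    out = [0.0] * d
    for i in range(d // 2):
        angle = pos * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # 2-D rotation of the i-th dimension pair
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def rope_score(q: List[float], k: List[float], m: int, n: int) -> float:
    """Unnormalized attention score between query at m and key at n."""
    return dot(rope_rotate(q, m), rope_rotate(k, n))


# Hypothetical example vectors (d = 4)
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (n - m = 3) at different absolute positions gives the same score
s1 = rope_score(q, k, m=2, n=5)
s2 = rope_score(q, k, m=10, n=13)
print(abs(s1 - s2) < 1e-9)  # True
```

Because every rotation preserves vector length, RoPE changes only the angle between query and key, not their magnitudes. This is one reason it can be added to an attention layer without learned position parameters.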

Source journal: BMC Medical Informatics and Decision Making
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles on the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.