Shuifa Sun, Qin Hu, Fengjiao Xu, Feng Hu, Yirong Wu, Ben Wang
BMC Medical Informatics and Decision Making, volume 25, issue 1, page 235. Journal Article, published 2025-07-01. DOI: 10.1186/s12911-025-03037-0 (https://doi.org/10.1186/s12911-025-03037-0). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220262/pdf/
Medical named entity recognition based on domain knowledge and position encoding.
A model for named entity recognition in Chinese electronic medical records is proposed that focuses on accurate boundary detection by leveraging medical domain knowledge and positional encoding. First, medical domain-specific terms are integrated into a BERT module through a lexical adapter. After pre-training, the model captures dynamic character feature representations that contain both lexical and boundary information. In the feature encoding module, Star-Transformer and BiLSTM are employed to extract local features and long-distance features, respectively, to generate the feature representation of the sequence. Additionally, because the relative position of characters in the text influences recognition results, Rotary Position Embedding (RoPE) is incorporated into Star-Transformer to enhance its ability to extract semantic features. Experimental results on the CCKS2020 dataset show an F1-score of 85.78%, an increase of 2.96% over the baseline model. On a self-built breast cancer ultrasound report dataset, improvement is also observed, which demonstrates the effectiveness and applicability of the model in the medical field.
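To make the role of relative position concrete, the following is a minimal sketch of the standard RoPE rotation in plain Python. It is not the authors' implementation: the function names `rope` and `dot`, the toy vectors, and the base of 10000 are assumptions chosen for illustration. It shows the property the abstract relies on, namely that the score between two rotated vectors depends only on the distance between their positions.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Illustrative helper (hypothetical name), following the usual RoPE
    formulation: pair i is rotated by position * base**(-2i/d).
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        # 2-D rotation of the pair (x, y) by angle theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (made-up values, not from the paper).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (3) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

Because each pair is only rotated, vector norms are unchanged and absolute positions cancel in the dot product, which is why RoPE is a natural fit for attention layers such as those in Star-Transformer.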
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.