MF-MNER：中国临床电子病历中的多模型融合 MNER

IF 3.9 2区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences Pub Date : 2024-04-05 DOI:10.1007/s12539-024-00624-z

Haoze Du, Jiahao Xu, Zhiyong Du, Lihui Chen, Shaohui Ma, Dongqing Wei, Xianfang Wang

{"title":"MF-MNER：中国临床电子病历中的多模型融合 MNER","authors":"Haoze Du, Jiahao Xu, Zhiyong Du, Lihui Chen, Shaohui Ma, Dongqing Wei, Xianfang Wang","doi":"10.1007/s12539-024-00624-z","DOIUrl":null,"url":null,"abstract":"To address the problem of poor entity recognition performance caused by the lack of Chinese annotation in clinical electronic medical records, this paper proposes a multi-medical entity recognition method F-MNER using a fusion technique combining BART, Bi-LSTM, and CRF. First, after cleaning, encoding, and segmenting the electronic medical records, the obtained semantic representations are dynamically fused using a bidirectional autoregressive transformer (BART) model. Then, sequential information is captured using a bidirectional long short-term memory (Bi-LSTM) network. Finally, the conditional random field (CRF) is used to decode and output multi-task entity recognition. Experiments are performed on the CCKS2019 dataset, with micro avg Precision, macro avg Recall, weighted avg Precision reaching 0.880, 0.887, and 0.883, and micro avg F1-score, macro avg F1-score, weighted avg F1-score reaching 0.875, 0.876, and 0.876 respectively. Compared with existing models, our method outperforms the existing literature in three evaluation metrics (micro average, macro average, weighted average) under the same dataset conditions. In the case of weighted average, the Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher than the existing BERT-BiLSTM-CRF model respectively. Experiments are performed on the actual clinical dataset with our MF-MNER, the Precision, Recall, and F1-score are 0.638, 0.825, and 0.719 under the micro-avg evaluation mechanism. The Precision, Recall, and F1-score are 0.685, 0.800, and 0.733 under the macro-avg evaluation mechanism. The Precision, Recall, and F1-score are 0.647, 0.825, and 0.722 under the weighted avg evaluation mechanism. The above results show that our method MF-MNER can integrate the advantages of BART, Bi-LSTM, and CRF layers, significantly improving the performance of downstream named entity recognition tasks with a small amount of annotation, and achieving excellent performance in terms of recall score, which has certain practical significance. Source code and datasets to reproduce the results in this paper are available at https://github.com/xfwang1969/MF-MNER.<h3 data-test=\"abstract-sub-heading\">Graphical Abstract</h3>Illustration of the proposed MF-MNER. The method mainly includes four steps: (1) medical electronic medical records need to be cleared, coded, and segmented. (2) The semantic representation obtained by dynamic fusion of the bidirectional autoregressive converter (BART) model. (3) The sequence information is captured by a bi-directional short-term memory (Bi-LSTM) network. (4) the multi-task entity recognition is decoded and output by conditional random field (CRF).\n","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":"2014 1","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records\",\"authors\":\"Haoze Du, Jiahao Xu, Zhiyong Du, Lihui Chen, Shaohui Ma, Dongqing Wei, Xianfang Wang\",\"doi\":\"10.1007/s12539-024-00624-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To address the problem of poor entity recognition performance caused by the lack of Chinese annotation in clinical electronic medical records, this paper proposes a multi-medical entity recognition method F-MNER using a fusion technique combining BART, Bi-LSTM, and CRF. First, after cleaning, encoding, and segmenting the electronic medical records, the obtained semantic representations are dynamically fused using a bidirectional autoregressive transformer (BART) model. Then, sequential information is captured using a bidirectional long short-term memory (Bi-LSTM) network. Finally, the conditional random field (CRF) is used to decode and output multi-task entity recognition. Experiments are performed on the CCKS2019 dataset, with micro avg Precision, macro avg Recall, weighted avg Precision reaching 0.880, 0.887, and 0.883, and micro avg F1-score, macro avg F1-score, weighted avg F1-score reaching 0.875, 0.876, and 0.876 respectively. Compared with existing models, our method outperforms the existing literature in three evaluation metrics (micro average, macro average, weighted average) under the same dataset conditions. In the case of weighted average, the Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher than the existing BERT-BiLSTM-CRF model respectively. Experiments are performed on the actual clinical dataset with our MF-MNER, the Precision, Recall, and F1-score are 0.638, 0.825, and 0.719 under the micro-avg evaluation mechanism. The Precision, Recall, and F1-score are 0.685, 0.800, and 0.733 under the macro-avg evaluation mechanism. The Precision, Recall, and F1-score are 0.647, 0.825, and 0.722 under the weighted avg evaluation mechanism. The above results show that our method MF-MNER can integrate the advantages of BART, Bi-LSTM, and CRF layers, significantly improving the performance of downstream named entity recognition tasks with a small amount of annotation, and achieving excellent performance in terms of recall score, which has certain practical significance. Source code and datasets to reproduce the results in this paper are available at https://github.com/xfwang1969/MF-MNER.<h3 data-test=\\\"abstract-sub-heading\\\">Graphical Abstract</h3>Illustration of the proposed MF-MNER. The method mainly includes four steps: (1) medical electronic medical records need to be cleared, coded, and segmented. (2) The semantic representation obtained by dynamic fusion of the bidirectional autoregressive converter (BART) model. (3) The sequence information is captured by a bi-directional short-term memory (Bi-LSTM) network. (4) the multi-task entity recognition is decoded and output by conditional random field (CRF).\\n\",\"PeriodicalId\":13670,\"journal\":{\"name\":\"Interdisciplinary Sciences: Computational Life Sciences\",\"volume\":\"2014 1\",\"pages\":\"\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2024-04-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Interdisciplinary Sciences: Computational Life Sciences\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1007/s12539-024-00624-z\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interdisciplinary Sciences: Computational Life Sciences","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1007/s12539-024-00624-z","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

针对临床电子病历中缺乏中文注释导致实体识别率低的问题，本文提出了一种多医疗实体识别方法 F-MNER，该方法采用了 BART、Bi-LSTM 和 CRF 三种融合技术。首先，在对电子病历进行清理、编码和分割后，利用双向自回归变换器（BART）模型对获得的语义表征进行动态融合。然后，使用双向长短期记忆（Bi-LSTM）网络捕捉序列信息。最后，使用条件随机场（CRF）进行解码并输出多任务实体识别。我们在 CCKS2019 数据集上进行了实验，微观平均精度（Micro avg Precision）、宏观平均召回率（Macro avg Recall）、加权平均精度（Weighted avg Precision）分别达到 0.880、0.887 和 0.883，微观平均 F1 分数（Micro avg F1-score）、宏观平均 F1 分数（Macro avg F1-score）、加权平均 F1 分数（Weighted avg F1-score）分别达到 0.875、0.876 和 0.876。与现有模型相比，在相同的数据集条件下，我们的方法在三个评价指标（微观平均值、宏观平均值、加权平均值）上都优于现有文献。在加权平均的情况下，精确度、召回率和 F1 分数分别比现有的 BERT-BiLSTM-CRF 模型高出 19.64%、15.67% 和 17.58%。使用我们的 MF-MNER 在实际临床数据集上进行了实验，在 micro-avg 评估机制下，精确度、召回率和 F1-score 分别为 0.638、0.825 和 0.719。在宏观质量评价机制下，精确度、召回率和 F1 分数分别为 0.685、0.800 和 0.733。在加权平均值评价机制下，精确度、召回率和 F1 分数分别为 0.647、0.825 和 0.722。以上结果表明，我们的方法 MF-MNER 可以综合 BART、Bi-LSTM 和 CRF 层的优势，在少量注释的情况下显著提高下游命名实体识别任务的性能，并在召回得分方面取得了优异的表现，具有一定的实用意义。本文结果的源代码和数据集可在 https://github.com/xfwang1969/MF-MNER.Graphical 网站上获取。摘要图示了所提出的 MF-MNER。该方法主要包括四个步骤：(1) 需要对医疗电子病历进行清理、编码和分割。(2）通过双向自回归转换器（BART）模型的动态融合获得语义表示。(3) 序列信息由双向短时记忆（Bi-LSTM）网络捕获。(4) 多任务实体识别由条件随机场（CRF）解码和输出。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records

查看原文本刊更多论文

MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records

To address the problem of poor entity recognition performance caused by the lack of Chinese annotation in clinical electronic medical records, this paper proposes a multi-medical entity recognition method F-MNER using a fusion technique combining BART, Bi-LSTM, and CRF. First, after cleaning, encoding, and segmenting the electronic medical records, the obtained semantic representations are dynamically fused using a bidirectional autoregressive transformer (BART) model. Then, sequential information is captured using a bidirectional long short-term memory (Bi-LSTM) network. Finally, the conditional random field (CRF) is used to decode and output multi-task entity recognition. Experiments are performed on the CCKS2019 dataset, with micro avg Precision, macro avg Recall, weighted avg Precision reaching 0.880, 0.887, and 0.883, and micro avg F1-score, macro avg F1-score, weighted avg F1-score reaching 0.875, 0.876, and 0.876 respectively. Compared with existing models, our method outperforms the existing literature in three evaluation metrics (micro average, macro average, weighted average) under the same dataset conditions. In the case of weighted average, the Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher than the existing BERT-BiLSTM-CRF model respectively. Experiments are performed on the actual clinical dataset with our MF-MNER, the Precision, Recall, and F1-score are 0.638, 0.825, and 0.719 under the micro-avg evaluation mechanism. The Precision, Recall, and F1-score are 0.685, 0.800, and 0.733 under the macro-avg evaluation mechanism. The Precision, Recall, and F1-score are 0.647, 0.825, and 0.722 under the weighted avg evaluation mechanism. The above results show that our method MF-MNER can integrate the advantages of BART, Bi-LSTM, and CRF layers, significantly improving the performance of downstream named entity recognition tasks with a small amount of annotation, and achieving excellent performance in terms of recall score, which has certain practical significance. Source code and datasets to reproduce the results in this paper are available at https://github.com/xfwang1969/MF-MNER.

Graphical Abstract

Illustration of the proposed MF-MNER. The method mainly includes four steps: (1) medical electronic medical records need to be cleared, coded, and segmented. (2) The semantic representation obtained by dynamic fusion of the bidirectional autoregressive converter (BART) model. (3) The sequence information is captured by a bi-directional short-term memory (Bi-LSTM) network. (4) the multi-task entity recognition is decoded and output by conditional random field (CRF).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Interdisciplinary Sciences: Computational Life Sciences MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

8.60

自引率

4.20%

发文量

期刊介绍： Interdisciplinary Sciences--Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of sciences, especially focusing on computational life sciences, an area that is enjoying rapid development at the forefront of scientific research and technology. The journal publishes original papers of significant general interest covering recent research and developments. Articles will be published rapidly by taking full advantage of internet technology for online submission and peer-reviewing of manuscripts, and then by publishing OnlineFirstTM through SpringerLink even before the issue is built or sent to the printer. The editorial board consists of many leading scientists with international reputation, among others, Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), Weitao Yang (Duke University, USA). Prof. Dongqing Wei at the Shanghai Jiatong University is appointed as the editor-in-chief; he made important contributions in bioinformatics and computational physics and is best known for his ground-breaking works on the theory of ferroelectric liquids. With the help from a team of associate editors and the editorial board, an international journal with sound reputation shall be created.