Identifying Adverse Drug Events in Clinical Text Using Fine-Tuned Clinical Language Models: Machine Learning Study

Impact factor: 2.0 · Q3, Health Care Sciences & Services
Elizaveta Kopacheva, Aron Henriksson, Hercules Dalianis, Tora Hammar, Alisa Lincke
{"title":"使用微调临床语言模型识别临床文本中的不良药物事件:机器学习研究。","authors":"Elizaveta Kopacheva, Aron Henriksson, Hercules Dalianis, Tora Hammar, Alisa Lincke","doi":"10.2196/71949","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Medications are essential for health care but can cause adverse drug events (ADEs), which are harmful and sometimes fatal. Detecting ADEs is a challenging task because they are often not documented in the structured data of electronic health records (EHRs). There is a need for automatically extracting ADE-related information from clinical notes, as manual review is labor-intensive and time-consuming.</p><p><strong>Objective: </strong>This study aims to fine-tune the pretrained clinical language model, Swedish Deidentified Clinical Bidirectional Encoder Representations from Transformers (SweDeClin-BERT), for medical named entity recognition (NER) and relation extraction (RE) tasks, and to implement an integrated NER-RE approach to more effectively identify ADEs in clinical notes from clinical units in Sweden. The performance of this approach is compared with our previous machine learning method, which used conditional random fields (CRFs) and random forest (RF).</p><p><strong>Methods: </strong>A subset of clinical notes from the Stockholm EPR (Electronic Patient Record) Corpus, dated 2009-2010, containing suspected ADEs based on International Classification of Diseases, 10th Revision (ICD-10) codes in the A.1 and A.2 categories was randomly sampled. These notes were annotated by a physician with ADE-related entities and relations following the ADE annotation guidelines. We fine-tuned the SweDeClin-BERT model for the NER and RE tasks and implemented an integrated NER-RE pipeline to extract entities and relationships from clinical notes. The models were evaluated using 395 clinical notes from clinical units in Sweden. The NER-RE pipeline was then applied to classify the clinical notes as containing or not containing ADEs. In addition, we conducted an error analysis to better understand the model's behavior and to identify potential areas for improvement.</p><p><strong>Results: </strong>In total, 62% of notes contained an explicit description of an ADE, indicating that an ADE-related ICD-10 code alone does not ensure detailed event documentation. The fine-tuned SweDeClin-BERT model achieved an F1-score of 0.845 for NER and 0.81 for RE task, outperforming the baseline models (CRFs for NER and random forests for RE). In particular, the RE task showed a 53% improvement in macro-average F1-score compared to the baseline. The integrated NER-RE pipeline achieved an overall F1-score of 0.81.</p><p><strong>Conclusions: </strong>Using a domain-specific language model like SweDeClin-BERT for detecting ADEs in clinical notes demonstrates improved classification performance (0.77 in strict and 0.81 in relaxed mode) compared to conventional machine learning models like CRFs and RF. The proposed fine-tuned ADE model requires further refinement and evaluation on annotated clinical notes from another hospital to evaluate the model's generalizability. In addition, the annotation guidelines should be revised, as there is an overlap of words between the Finding and Disorder entity categories, which were not consistently distinguished by the annotators. 
Furthermore, future work should address the handling of compound words and split entities to better capture context in the Swedish language.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e71949"},"PeriodicalIF":2.0000,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12425423/pdf/","citationCount":"0","resultStr":"{\"title\":\"Identifying Adverse Drug Events in Clinical Text Using Fine-Tuned Clinical Language Models: Machine Learning Study.\",\"authors\":\"Elizaveta Kopacheva, Aron Henriksson, Hercules Dalianis, Tora Hammar, Alisa Lincke\",\"doi\":\"10.2196/71949\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Medications are essential for health care but can cause adverse drug events (ADEs), which are harmful and sometimes fatal. Detecting ADEs is a challenging task because they are often not documented in the structured data of electronic health records (EHRs). There is a need for automatically extracting ADE-related information from clinical notes, as manual review is labor-intensive and time-consuming.</p><p><strong>Objective: </strong>This study aims to fine-tune the pretrained clinical language model, Swedish Deidentified Clinical Bidirectional Encoder Representations from Transformers (SweDeClin-BERT), for medical named entity recognition (NER) and relation extraction (RE) tasks, and to implement an integrated NER-RE approach to more effectively identify ADEs in clinical notes from clinical units in Sweden. The performance of this approach is compared with our previous machine learning method, which used conditional random fields (CRFs) and random forest (RF).</p><p><strong>Methods: </strong>A subset of clinical notes from the Stockholm EPR (Electronic Patient Record) Corpus, dated 2009-2010, containing suspected ADEs based on International Classification of Diseases, 10th Revision (ICD-10) codes in the A.1 and A.2 categories was randomly sampled. These notes were annotated by a physician with ADE-related entities and relations following the ADE annotation guidelines. We fine-tuned the SweDeClin-BERT model for the NER and RE tasks and implemented an integrated NER-RE pipeline to extract entities and relationships from clinical notes. The models were evaluated using 395 clinical notes from clinical units in Sweden. The NER-RE pipeline was then applied to classify the clinical notes as containing or not containing ADEs. In addition, we conducted an error analysis to better understand the model's behavior and to identify potential areas for improvement.</p><p><strong>Results: </strong>In total, 62% of notes contained an explicit description of an ADE, indicating that an ADE-related ICD-10 code alone does not ensure detailed event documentation. The fine-tuned SweDeClin-BERT model achieved an F1-score of 0.845 for NER and 0.81 for RE task, outperforming the baseline models (CRFs for NER and random forests for RE). In particular, the RE task showed a 53% improvement in macro-average F1-score compared to the baseline. The integrated NER-RE pipeline achieved an overall F1-score of 0.81.</p><p><strong>Conclusions: </strong>Using a domain-specific language model like SweDeClin-BERT for detecting ADEs in clinical notes demonstrates improved classification performance (0.77 in strict and 0.81 in relaxed mode) compared to conventional machine learning models like CRFs and RF. 
The proposed fine-tuned ADE model requires further refinement and evaluation on annotated clinical notes from another hospital to evaluate the model's generalizability. In addition, the annotation guidelines should be revised, as there is an overlap of words between the Finding and Disorder entity categories, which were not consistently distinguished by the annotators. Furthermore, future work should address the handling of compound words and split entities to better capture context in the Swedish language.</p>\",\"PeriodicalId\":14841,\"journal\":{\"name\":\"JMIR Formative Research\",\"volume\":\"9 \",\"pages\":\"e71949\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12425423/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Formative Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2196/71949\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Formative Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/71949","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract

Background: Medications are essential for health care but can cause adverse drug events (ADEs), which are harmful and sometimes fatal. Detecting ADEs is a challenging task because they are often not documented in the structured data of electronic health records (EHRs). There is a need to automatically extract ADE-related information from clinical notes, as manual review is labor-intensive and time-consuming.

Objective: This study aims to fine-tune the pretrained clinical language model, Swedish Deidentified Clinical Bidirectional Encoder Representations from Transformers (SweDeClin-BERT), for medical named entity recognition (NER) and relation extraction (RE) tasks, and to implement an integrated NER-RE approach to more effectively identify ADEs in clinical notes from clinical units in Sweden. The performance of this approach is compared with our previous machine learning method, which used conditional random fields (CRFs) and random forest (RF).

Methods: A subset of clinical notes from the Stockholm EPR (Electronic Patient Record) Corpus, dated 2009-2010, containing suspected ADEs based on International Classification of Diseases, 10th Revision (ICD-10) codes in the A.1 and A.2 categories was randomly sampled. These notes were annotated by a physician with ADE-related entities and relations following the ADE annotation guidelines. We fine-tuned the SweDeClin-BERT model for the NER and RE tasks and implemented an integrated NER-RE pipeline to extract entities and relationships from clinical notes. The models were evaluated using 395 clinical notes from clinical units in Sweden. The NER-RE pipeline was then applied to classify the clinical notes as containing or not containing ADEs. In addition, we conducted an error analysis to better understand the model's behavior and to identify potential areas for improvement.
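
To make the fine-tuning step more concrete, the sketch below shows a minimal token-classification (NER) fine-tuning loop in the Hugging Face Transformers style. It is not the authors' actual code: the model path is a placeholder (the SweDeClin-BERT checkpoint is not openly distributed), the BIO tag set is a hypothetical one loosely based on the entity categories named in the abstract, and the single training example is invented solely to keep the sketch runnable.

```python
from datasets import Dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

# Placeholder path: substitute whatever local or access-controlled
# SweDeClin-BERT checkpoint identifier applies in practice.
MODEL_NAME = "path/to/swedeclin-bert"

# Hypothetical BIO tag set based on entity categories mentioned in the
# abstract; the study's actual annotation scheme may differ.
LABELS = ["O", "B-Drug", "I-Drug", "B-ADE", "I-ADE",
          "B-Finding", "I-Finding", "B-Disorder", "I-Disorder"]
label2id = {lab: i for i, lab in enumerate(LABELS)}
id2label = {i: lab for lab, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS), id2label=id2label, label2id=label2id)

def encode(example):
    # Tokenize pre-split words and align word-level BIO tags with subword
    # pieces; special tokens get -100 so the loss ignores them.
    enc = tokenizer(example["tokens"], is_split_into_words=True,
                    truncation=True, max_length=512)
    enc["labels"] = [-100 if w is None else label2id[example["tags"][w]]
                     for w in enc.word_ids()]
    return enc

# Invented toy sentence ("The patient got a rash from penicillin"); real
# training data would be the annotated clinical notes.
train = Dataset.from_list([
    {"tokens": ["Patienten", "fick", "utslag", "av", "penicillin"],
     "tags": ["O", "O", "B-ADE", "O", "B-Drug"]},
]).map(encode, remove_columns=["tokens", "tags"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="swedeclin-ner", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```

A relation extraction (RE) head would typically be fine-tuned analogously as a sequence classifier over candidate entity pairs, and the two models chained into the NER-RE pipeline described above.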

Results: In total, 62% of notes contained an explicit description of an ADE, indicating that an ADE-related ICD-10 code alone does not ensure detailed event documentation. The fine-tuned SweDeClin-BERT model achieved an F1-score of 0.845 for the NER task and 0.81 for the RE task, outperforming the baseline models (CRFs for NER and random forests for RE). In particular, the RE task showed a 53% improvement in macro-average F1-score compared to the baseline. The integrated NER-RE pipeline achieved an overall F1-score of 0.81.
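
For readers unfamiliar with the metric, the short sketch below shows how a macro-averaged F1-score (the measure behind the reported 53% RE improvement) is computed with scikit-learn. The relation labels and predictions are invented for illustration and are not study data.

```python
from sklearn.metrics import f1_score

# Invented relation labels for a few candidate entity pairs. Macro averaging
# weights every relation class equally, so performance on rare classes
# (e.g., explicit ADE relations) counts as much as on frequent ones.
y_true = ["ADE-Drug", "ADE-Drug", "No-Relation", "Finding-Drug", "No-Relation"]
y_pred = ["ADE-Drug", "No-Relation", "No-Relation", "Finding-Drug", "No-Relation"]

print(f1_score(y_true, y_pred, average="macro"))
```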

Conclusions: Using a domain-specific language model like SweDeClin-BERT for detecting ADEs in clinical notes demonstrates improved classification performance (0.77 in strict and 0.81 in relaxed mode) compared to conventional machine learning models like CRFs and RF. The proposed fine-tuned ADE model requires further refinement and evaluation on annotated clinical notes from another hospital to evaluate the model's generalizability. In addition, the annotation guidelines should be revised, as there is an overlap of words between the Finding and Disorder entity categories, which were not consistently distinguished by the annotators. Furthermore, future work should address the handling of compound words and split entities to better capture context in the Swedish language.

Source journal
JMIR Formative Research (Medicine, miscellaneous)
CiteScore: 2.70 · Self-citation rate: 9.10% · Articles per year: 579 · Review time: 12 weeks