Assessing the transferability of BERT to patient safety: classifying multiple types of incident reports.

IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES
Ying Wang, Farah Magrabi
{"title":"评估BERT对患者安全的可转移性:对多种类型的事件报告进行分类。","authors":"Ying Wang, Farah Magrabi","doi":"10.1136/bmjhci-2024-101146","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To evaluate the transferability of BERT (Bidirectional Encoder Representations from Transformers) to patient safety, we use it to classify incident reports characterised by limited data and encompassing multiple imbalanced classes.</p><p><strong>Methods: </strong>BERT was applied to classify 10 incident types and 4 severity levels by (1) fine-tuning and (2) extracting word embeddings for feature representation. Training datasets were collected from a state-wide incident reporting system in Australia (<i>n_type/severity=2860/1160</i>). Transferability was evaluated using three datasets: a balanced dataset (<i>type/severity: n_benchmark=286/116</i>); a real-world imbalanced dataset (<i>n_original=444/4837, rare types/severity<=1%</i>); and an independent hospital-level reporting system (<i>n_independent=6000/5950, imbalanced</i>). Model performance was evaluated by F-score, precision and recall, then compared with convolutional neural networks (CNNs) using BERT embeddings and local embeddings from incident reports.</p><p><strong>Results: </strong>Fine-tuned BERT outperformed small CNNs trained with BERT embedding and static word embeddings developed from scratch. The default parameters of BERT were found to be the most optimal configuration. For incident type, fine-tuned BERT achieved high F-scores above 89% across all test datasets (<i>CNNs=81%</i>). It effectively generalised to real-world settings, including rare incident types (eg, clinical handover with 11.1% and 30.3% improvement). For ambiguous medium and low severity levels, the F-score improvements ranged from 3.6% to 19.7% across all test datasets.</p><p><strong>Discussion: </strong>Fine-tuned BERT led to improved performance, particularly in identifying rare classes and generalising effectively to unseen data, compared with small CNNs.</p><p><strong>Conclusion: </strong>Fine-tuned BERT may be useful for classification tasks in patient safety where data privacy, scarcity and imbalance are common challenges.</p>","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"32 1","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12366584/pdf/","citationCount":"0","resultStr":"{\"title\":\"Assessing the transferability of BERT to patient safety: classifying multiple types of incident reports.\",\"authors\":\"Ying Wang, Farah Magrabi\",\"doi\":\"10.1136/bmjhci-2024-101146\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>To evaluate the transferability of BERT (Bidirectional Encoder Representations from Transformers) to patient safety, we use it to classify incident reports characterised by limited data and encompassing multiple imbalanced classes.</p><p><strong>Methods: </strong>BERT was applied to classify 10 incident types and 4 severity levels by (1) fine-tuning and (2) extracting word embeddings for feature representation. Training datasets were collected from a state-wide incident reporting system in Australia (<i>n_type/severity=2860/1160</i>). 
Transferability was evaluated using three datasets: a balanced dataset (<i>type/severity: n_benchmark=286/116</i>); a real-world imbalanced dataset (<i>n_original=444/4837, rare types/severity<=1%</i>); and an independent hospital-level reporting system (<i>n_independent=6000/5950, imbalanced</i>). Model performance was evaluated by F-score, precision and recall, then compared with convolutional neural networks (CNNs) using BERT embeddings and local embeddings from incident reports.</p><p><strong>Results: </strong>Fine-tuned BERT outperformed small CNNs trained with BERT embedding and static word embeddings developed from scratch. The default parameters of BERT were found to be the most optimal configuration. For incident type, fine-tuned BERT achieved high F-scores above 89% across all test datasets (<i>CNNs=81%</i>). It effectively generalised to real-world settings, including rare incident types (eg, clinical handover with 11.1% and 30.3% improvement). For ambiguous medium and low severity levels, the F-score improvements ranged from 3.6% to 19.7% across all test datasets.</p><p><strong>Discussion: </strong>Fine-tuned BERT led to improved performance, particularly in identifying rare classes and generalising effectively to unseen data, compared with small CNNs.</p><p><strong>Conclusion: </strong>Fine-tuned BERT may be useful for classification tasks in patient safety where data privacy, scarcity and imbalance are common challenges.</p>\",\"PeriodicalId\":9050,\"journal\":{\"name\":\"BMJ Health & Care Informatics\",\"volume\":\"32 1\",\"pages\":\"\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2025-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12366584/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMJ Health & Care Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1136/bmjhci-2024-101146\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMJ Health & Care Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1136/bmjhci-2024-101146","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract

Objective: To evaluate the transferability of BERT (Bidirectional Encoder Representations from Transformers) to patient safety, we used it to classify incident reports, a task characterised by limited data and multiple imbalanced classes.

Methods: BERT was applied to classify 10 incident types and 4 severity levels by (1) fine-tuning and (2) extracting word embeddings for feature representation. Training datasets were collected from a state-wide incident reporting system in Australia (n_type/severity=2860/1160). Transferability was evaluated using three datasets: a balanced dataset (type/severity: n_benchmark=286/116); a real-world imbalanced dataset (n_original=444/4837, rare types/severity<=1%); and an independent hospital-level reporting system (n_independent=6000/5950, imbalanced). Model performance was evaluated by F-score, precision and recall, then compared with convolutional neural networks (CNNs) using BERT embeddings and local embeddings from incident reports.
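The abstract itself contains no code, but a minimal sketch can make the two approaches in the Methods concrete. The sketch below is not the authors' implementation; it assumes the Hugging Face transformers library, the bert-base-uncased checkpoint, and made-up example text and labels.

```python
# Illustrative sketch only (not the authors' code) of the two approaches in
# the Methods: (1) fine-tuning BERT for incident classification and
# (2) extracting BERT word embeddings as fixed features for a small CNN.
# Assumptions: Hugging Face transformers, the bert-base-uncased checkpoint,
# and made-up example text/labels.
import torch
from transformers import (AutoModel, AutoModelForSequenceClassification,
                          AutoTokenizer)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["Patient fell while transferring from bed to chair."],
                  truncation=True, padding=True, return_tensors="pt")

# (1) Fine-tuning: a classification head over 10 incident types, trained
# end-to-end (in practice via a full training loop or the Trainer API).
clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=10)
labels = torch.tensor([3])                    # hypothetical class index
loss = clf(**batch, labels=labels).loss
loss.backward()                               # one illustrative gradient step

# (2) Feature extraction: frozen BERT token embeddings that would be fed to
# a small CNN classifier (the CNN itself is omitted here).
encoder = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (1, seq_len, 768)
```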

Results: Fine-tuned BERT outperformed small CNNs trained with BERT embeddings and with static word embeddings developed from scratch. The default parameters of BERT were found to be the optimal configuration. For incident type, fine-tuned BERT achieved F-scores above 89% across all test datasets (CNNs=81%). It generalised effectively to real-world settings, including rare incident types (eg, clinical handover, with 11.1% and 30.3% improvement). For the ambiguous medium and low severity levels, F-score improvements ranged from 3.6% to 19.7% across all test datasets.
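For context on the metrics reported above, per-class precision, recall and F-score can be computed with scikit-learn's classification_report; the labels and predictions below are invented purely for illustration and are not data from the study.

```python
# Illustrative only: the metrics used in the study (per-class precision,
# recall, F-score). y_true/y_pred are made-up, not data from the paper.
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 1, 0, 3, 3, 2, 1]   # hypothetical severity labels
y_pred = [0, 1, 2, 1, 1, 0, 3, 2, 2, 1]   # hypothetical model predictions

# Macro-averaged scores weight each class equally, which matters when rare
# classes make up <=1% of reports, as in the real-world dataset.
print(classification_report(y_true, y_pred, digits=3))
```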

Discussion: Fine-tuned BERT led to improved performance, particularly in identifying rare classes and generalising effectively to unseen data, compared with small CNNs.

Conclusion: Fine-tuned BERT may be useful for classification tasks in patient safety where data privacy, scarcity and imbalance are common challenges.

Source journal: BMJ Health & Care Informatics
CiteScore: 6.10
Self-citation rate: 4.90%
Annual publications: 40
Review time: 18 weeks