Accuracy of a Proprietary Large Language Model in Labeling Obstetric Incident Reports.

IF 2.3 Q2 HEALTH CARE SCIENCES & SERVICES
Jeanene Johnson, Conner Brown, Grace Lee, Keith Morse
{"title":"专有大语言模型在标记产科事故报告中的准确性。","authors":"Jeanene Johnson, Conner Brown, Grace Lee, Keith Morse","doi":"10.1016/j.jcjq.2024.08.001","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Using the data collected through incident reporting systems is challenging, as it is a large volume of primarily qualitative information. Large language models (LLMs), such as ChatGPT, provide novel capabilities in text summarization and labeling that could support safety data trending and early identification of opportunities to prevent patient harm. This study assessed the capability of a proprietary LLM (GPT-3.5) to automatically label a cross-sectional sample of real-world obstetric incident reports.</p><p><strong>Methods: </strong>A sample of 370 incident reports submitted to inpatient obstetric units between December 2022 and May 2023 was extracted. Human-annotated labels were assigned by a clinician reviewer and considered gold standard. The LLM was prompted to label incident reports relying solely on its pretrained knowledge and information included in the prompt. Primary outcomes assessed were sensitivity, specificity, positive predictive value, and negative predictive value. A secondary outcome assessed the human-perceived quality of the model's justification for the label(s) applied.</p><p><strong>Results: </strong>The LLM demonstrated the ability to label incident reports with high sensitivity and specificity. The model applied a total of 79 labels compared to the reviewer's 49 labels. Overall sensitivity for the model was 85.7%, and specificity was 97.9%. Positive and negative predictive values were 53.2% and 99.6%, respectively. For 60.8% of labels, the reviewer approved of the model's justification for applying the label.</p><p><strong>Conclusion: </strong>The proprietary LLM demonstrated the ability to label obstetric incident reports with high sensitivity and specificity. LLMs offer the potential to enable more efficient use of data from incident reporting systems.</p>","PeriodicalId":14835,"journal":{"name":"Joint Commission journal on quality and patient safety","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Accuracy of a Proprietary Large Language Model in Labeling Obstetric Incident Reports.\",\"authors\":\"Jeanene Johnson, Conner Brown, Grace Lee, Keith Morse\",\"doi\":\"10.1016/j.jcjq.2024.08.001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Using the data collected through incident reporting systems is challenging, as it is a large volume of primarily qualitative information. Large language models (LLMs), such as ChatGPT, provide novel capabilities in text summarization and labeling that could support safety data trending and early identification of opportunities to prevent patient harm. This study assessed the capability of a proprietary LLM (GPT-3.5) to automatically label a cross-sectional sample of real-world obstetric incident reports.</p><p><strong>Methods: </strong>A sample of 370 incident reports submitted to inpatient obstetric units between December 2022 and May 2023 was extracted. Human-annotated labels were assigned by a clinician reviewer and considered gold standard. The LLM was prompted to label incident reports relying solely on its pretrained knowledge and information included in the prompt. 
Primary outcomes assessed were sensitivity, specificity, positive predictive value, and negative predictive value. A secondary outcome assessed the human-perceived quality of the model's justification for the label(s) applied.</p><p><strong>Results: </strong>The LLM demonstrated the ability to label incident reports with high sensitivity and specificity. The model applied a total of 79 labels compared to the reviewer's 49 labels. Overall sensitivity for the model was 85.7%, and specificity was 97.9%. Positive and negative predictive values were 53.2% and 99.6%, respectively. For 60.8% of labels, the reviewer approved of the model's justification for applying the label.</p><p><strong>Conclusion: </strong>The proprietary LLM demonstrated the ability to label obstetric incident reports with high sensitivity and specificity. LLMs offer the potential to enable more efficient use of data from incident reporting systems.</p>\",\"PeriodicalId\":14835,\"journal\":{\"name\":\"Joint Commission journal on quality and patient safety\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2024-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Joint Commission journal on quality and patient safety\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1016/j.jcjq.2024.08.001\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Joint Commission journal on quality and patient safety","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.jcjq.2024.08.001","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract


Background: Using the data collected through incident reporting systems is challenging, as it is a large volume of primarily qualitative information. Large language models (LLMs), such as ChatGPT, provide novel capabilities in text summarization and labeling that could support safety data trending and early identification of opportunities to prevent patient harm. This study assessed the capability of a proprietary LLM (GPT-3.5) to automatically label a cross-sectional sample of real-world obstetric incident reports.

Methods: A sample of 370 incident reports submitted to inpatient obstetric units between December 2022 and May 2023 was extracted. Human-annotated labels were assigned by a clinician reviewer and considered gold standard. The LLM was prompted to label incident reports relying solely on its pretrained knowledge and information included in the prompt. Primary outcomes assessed were sensitivity, specificity, positive predictive value, and negative predictive value. A secondary outcome assessed the human-perceived quality of the model's justification for the label(s) applied.
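To make the labeling setup concrete, the sketch below shows one way an incident report could be submitted to GPT-3.5 through the OpenAI chat completions API and asked for labels plus justifications. It is a minimal illustration, not the study's actual protocol: the label taxonomy, prompt wording, and example report are hypothetical.

```python
# Minimal sketch of LLM-based incident-report labeling via the OpenAI chat
# completions API. Label set, prompt wording, and example report are
# hypothetical; the study's actual prompt and taxonomy are not reproduced here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical label set for illustration only.
LABELS = ["medication error", "delay in care", "communication failure",
          "equipment issue", "hemorrhage"]

def label_incident_report(report_text: str) -> str:
    """Ask the model to apply zero or more labels and justify each one."""
    prompt = (
        "You are reviewing an obstetric incident report. "
        f"Apply any of the following labels that fit: {', '.join(LABELS)}. "
        "For each label applied, give a one-sentence justification. "
        "If no label applies, answer 'none'.\n\n"
        f"Incident report:\n{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for reproducible labeling
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    example = "Patient experienced a delay in receiving ordered magnesium sulfate."
    print(label_incident_report(example))
```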

Results: The LLM demonstrated the ability to label incident reports with high sensitivity and specificity. The model applied a total of 79 labels compared to the reviewer's 49 labels. Overall sensitivity for the model was 85.7%, and specificity was 97.9%. Positive and negative predictive values were 53.2% and 99.6%, respectively. For 60.8% of labels, the reviewer approved of the model's justification for applying the label.
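For reference, the metrics reported above follow the standard definitions computed from per-label confusion-matrix counts; the sketch below applies those definitions. The counts in the usage example are illustrative only, as the abstract does not report the study's full confusion matrix.

```python
# Sketch of the diagnostic-accuracy metrics reported above, computed from
# per-label confusion-matrix counts (true/false positives and negatives).
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # correctly applied labels among all reviewer-applied labels
        "specificity": tn / (tn + fp),  # correctly withheld labels among all labels the reviewer did not apply
        "ppv": tp / (tp + fp),          # correct labels among all model-applied labels
        "npv": tn / (tn + fn),          # correct omissions among all labels the model withheld
    }

if __name__ == "__main__":
    # Illustrative counts only, not the study's reported data.
    print(diagnostic_metrics(tp=42, fp=37, fn=7, tn=1720))
```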

Conclusion: The proprietary LLM demonstrated the ability to label obstetric incident reports with high sensitivity and specificity. LLMs offer the potential to enable more efficient use of data from incident reporting systems.

Source journal: Joint Commission Journal on Quality and Patient Safety
CiteScore: 3.80
Self-citation rate: 4.30%
Articles published: 116
Review turnaround: 49 days