Toward equitable documentation: Evaluating ChatGPT’s role in identifying and rephrasing stigmatizing language in electronic health records

Impact Factor: 3.7 · JCR Q1 (Nursing) · CAS Medicine Zone 2
Zhihong Zhang PhD, RN; Jihye Kim Scroggins PhD, RN; Sarah Harkins BSN, RN; Ismael Ibrahim Hulchafo MD, MS; Hans Moen PhD; Michele Tadiello MS; Veronica Barcelona PhD, RN; Maxim Topaz PhD, RN
DOI: 10.1016/j.outlook.2025.102472
Journal: Nursing Outlook, Volume 73, Issue 4, Article 102472
Publication date: July 1, 2025 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0029655425001253
Citations: 0

Abstract

Stigmatizing language in electronic health records (EHRs) harms clinician-patient relationships and reinforces health disparities. Our purpose was to assess ChatGPT's ability to reduce stigmatizing language in clinical notes. We analyzed 140 clinical notes and 150 stigmatizing-language examples from two urban hospitals. ChatGPT-4 identified and rephrased stigmatizing language. Identification performance was evaluated using precision, recall, and F1 score, with human expert annotations as the gold standard. Rephrasing quality was rated by experts on a three-point Likert scale for de-stigmatization, faithfulness, conciseness, and clarity. ChatGPT showed poor overall identification (micro-F1 = 0.51) but moderate-to-high performance within individual stigmatizing-language categories (micro-F1 = 0.69–0.91). Rephrasing scored 2.7 for de-stigmatization, 2.8 for faithfulness, and 3.0 for both conciseness and clarity. Prompt design significantly affected ChatGPT's performance. While ChatGPT has limitations in automatic identification, with appropriate prompt design and human oversight it can support real-time identification and rephrasing of stigmatizing language in EHRs.
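The identification metrics reported above (micro-averaged precision, recall, and F1 against gold-standard expert annotations) can be sketched as follows. This is a minimal illustration, not code from the paper; the function name and the (note_id, category) label-pair representation are assumptions for the example.

```python
def micro_f1(gold, pred):
    """Micro-averaged precision, recall, and F1.

    gold, pred: iterables of (note_id, category) label pairs, e.g. the
    expert annotations and the model's detections. True/false positives
    and false negatives are pooled across all categories before the
    scores are computed, so frequent categories dominate the average.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)   # labels both annotators and model found
    fp = len(pred_set - gold_set)   # model detections with no gold match
    fn = len(gold_set - pred_set)   # gold labels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Per-category scores (the 0.69–0.91 range) would come from applying the same computation after filtering both label sets to a single category.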
Source journal: Nursing Outlook (Medicine - Nursing)
CiteScore: 6.20
Self-citation rate: 7.00%
Articles per year: 109
Review time: 25 days
Journal description: Nursing Outlook, a bimonthly journal, provides innovative ideas for nursing leaders through peer-reviewed articles and timely reports. Each issue examines current issues and trends in nursing practice, education, and research, offering progressive solutions to the challenges facing the profession. Nursing Outlook is the official journal of the American Academy of Nursing and the Council for the Advancement of Nursing Science and supports their mission to serve the public and the nursing profession by advancing health policy and practice through the generation, synthesis, and dissemination of nursing knowledge. The journal is included in MEDLINE, CINAHL, and the Journal Citation Reports published by Clarivate Analytics.