Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study

IF 3.8 · CAS Tier 3 (Medicine) · Q2 MEDICAL INFORMATICS
Fatih Dolu, Oğuzhan Fatih Ay, Aydın Hakan Kupeli, Enes Karademir, Muhammed Huseyin Büyükavcı
DOI: 10.2196/68980 · JMIR Medical Informatics, vol. 13, e68980 · Published 2025-07-24 (Journal Article) · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288767/pdf/
Citations: 0

Abstract

Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study.

Background: The integration of artificial intelligence (AI) into clinical workflows holds promise for enhancing outpatient decision-making and patient education. ChatGPT, a large language model developed by OpenAI, has gained attention for its potential to support both clinicians and patients. However, its performance in the outpatient setting of general surgery remains underexplored.

Objective: This study aimed to evaluate whether ChatGPT-4 can function as a virtual outpatient assistant in the management of puerperal mastitis by assessing the accuracy, clarity, and clinical safety of its responses to frequently asked patient questions in Turkish.

Methods: Fifteen questions about puerperal mastitis were sourced from public health care websites and online forums. These questions were categorized into general information (n=2), symptoms and diagnosis (n=6), treatment (n=2), and prognosis (n=5). Each question was entered into ChatGPT-4 (September 3, 2024), and a single Turkish-language response was obtained. The responses were evaluated by a panel consisting of 3 board-certified general surgeons and 2 general surgery residents, using five criteria: sufficient length, patient-understandable language, accuracy, adherence to current guidelines, and patient safety. Quantitative metrics included the DISCERN score, Flesch-Kincaid readability score, and inter-rater reliability assessed using the intraclass correlation coefficient (ICC).
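The Flesch-Kincaid metrics used in the Methods combine average sentence length with average syllables per word. As a minimal sketch only: the snippet below uses the standard English-language constants and a naive vowel-group syllable counter (no silent-e handling); the study's Turkish-language responses would in practice need a Turkish-adapted readability formula.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per contiguous vowel group
    # (assumption: no silent-e or diphthong handling).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # mean words per sentence
    spw = syllables / len(words)   # mean syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level

# Hypothetical example sentence, not from the study's question set.
ease, grade = flesch_scores("The cat sat on the mat.")
```

Higher Reading Ease (closer to 100) and lower Grade Level indicate more patient-accessible text, which is the direction the panel's "patient-understandable language" criterion rewards.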

Results: A total of 15 questions were evaluated. ChatGPT's responses were rated as "excellent" overall by the evaluators, with higher scores observed for treatment- and prognosis-related questions. A statistically significant difference was found in DISCERN scores across question types (P=.01), with treatment and prognosis questions receiving higher ratings. In contrast, no significant differences were detected in evaluator-based ratings (sufficient length, understandability, accuracy, guideline compliance, and patient safety), JAMA benchmark scores, or Flesch-Kincaid readability levels (P>.05 for all). Interrater agreement was good across all evaluation parameters (ICC=0.772); however, agreement varied when assessed by individual criteria. Correlation analyses revealed no significant overall associations between subjective ratings and objective quality measures, although a strong positive correlation between literature compliance and patient safety was identified for one question (r=0.968, P<.001).
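The interrater reliability reported above (ICC=0.772) is conventionally computed from a two-way ANOVA decomposition of the questions × raters score matrix. A minimal sketch of ICC(2,1) (two-way random effects, absolute agreement, single rater) using the classic Shrout-Fleiss mean-square formulation; the exact ICC variant used by the study is an assumption here:

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: (n_subjects, k_raters) matrix of scores.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()    # between subjects
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()    # between raters
    ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Hypothetical data (not the study's): perfectly agreeing raters give ICC = 1.
perfect = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
icc_perfect = icc_2_1(perfect)
```

By the common Koo-Li convention, values between 0.75 and 0.90 indicate good reliability, consistent with the paper's description of ICC=0.772 as "good" agreement.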

Conclusions: ChatGPT demonstrated adequate capability in providing information on puerperal mastitis, particularly for treatment and prognosis. However, evaluator variability and the subjective nature of assessments highlight the need for further optimization of AI tools. Future research should emphasize iterative questioning and dynamic updates to AI knowledge bases to enhance reliability and accessibility.

Source Journal
JMIR Medical Informatics (Medicine - Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles per year: 173
Review time: 12 weeks
期刊介绍: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.