Evaluation of AI chatbot responses to brachytherapy frequently asked questions

Ramez Kouzy, Ibrahim Ibrahim, Paul Nemr, Saketh Vinjamuri, Bassam Ballout, Juan Sebastian Gonzalez Gonzalez Diaz, Dalissa Negron Figueroa, Molly B. El Alam, Zakaria El Kouzi, Comron Hassanzadeh, Osama Mohamad, Chris Weil, Lauren Colbert, Ann Klopp

Brachytherapy, Volume 25, Issue 2, Pages 275-282. DOI: 10.1016/j.brachy.2025.10.005
Abstract
Purpose
Patients are increasingly using artificial intelligence (AI) chatbots for health information. Evaluating their reliability for specialized topics, such as brachytherapy, is crucial for guiding their safe use. We assessed the suitability of a readily accessible AI chatbot for answering frequently asked questions (FAQs) related to brachytherapy.
Methods
We compared responses from an AI chatbot (ChatGPT 4o-mini) against gold standard (GS) authoritative sources for 10 brachytherapy frequently asked questions. Four blinded board-certified brachytherapy experts evaluated 80 response pairs on metrics including accuracy, clinical appropriateness, readability, and tone. Five simulated patient personas with varying literacy levels were used to assess helpfulness, readability, and emotional tone. Objective readability metrics were also calculated.
Results
Experts rated the AI chatbot higher for accuracy (75% highly/mostly accurate vs. 50% for GS) and appropriateness (77% vs. 55%), although inaccuracies were noted in both sources on blinded review. Simulated patient personas, particularly those with lower literacy, preferred GS responses (62% vs. 34%), citing better perceived readability (92% easy/very easy vs. 44% for AI) and a more reassuring tone (42% vs. 24% for AI). Objective analysis confirmed that both sources significantly exceeded recommended reading levels (e.g., above a 12th-grade Flesch-Kincaid level), with AI responses being substantially longer. Performance varied considerably across individual questions for both AI and GS sources.
Conclusions
In this blinded cross-sectional evaluation, a publicly available AI chatbot provided accurate responses to brachytherapy-related FAQs. However, further development and validation focused on accessibility, trustworthiness, and user-centered design are required before these tools can be safely and effectively integrated into patient-care workflows.
About the journal
Brachytherapy is an international and multidisciplinary journal that publishes original peer-reviewed articles and selected reviews on the techniques and clinical applications of interstitial and intracavitary radiation in the management of cancers. Laboratory and experimental research relevant to clinical practice is also included. Related disciplines include medical physics, medical oncology, radiation oncology, and radiology. Brachytherapy publishes technical advances, original articles, reviews, and point/counterpoint discussions of controversial issues. Original articles that address any aspect of brachytherapy are invited. Letters to the Editor-in-Chief are encouraged.