Evaluation of AI chatbot responses to brachytherapy frequently asked questions

Ramez Kouzy, Ibrahim Ibrahim, Paul Nemr, Saketh Vinjamuri, Bassam Ballout, Juan Sebastian Gonzalez Gonzalez Diaz, Dalissa Negron Figueroa, Molly B. El Alam, Zakaria El Kouzi, Comron Hassanzadeh, Osama Mohamad, Chris Weil, Lauren Colbert, Ann Klopp

Brachytherapy, Volume 25, Issue 2, Pages 275-282. DOI: 10.1016/j.brachy.2025.10.005
Abstract
Purpose
Patients are increasingly using artificial intelligence (AI) chatbots for health information. Evaluating their reliability for specialized topics, such as brachytherapy, is crucial for guiding their safe use. We assessed the suitability of a readily accessible AI chatbot for answering frequently asked questions (FAQs) related to brachytherapy.
Methods
We compared responses from an AI chatbot (ChatGPT 4o-mini) against gold standard (GS) authoritative sources for 10 brachytherapy frequently asked questions. Four blinded board-certified brachytherapy experts evaluated 80 response pairs on metrics including accuracy, clinical appropriateness, readability, and tone. Five simulated patient personas with varying literacy levels were used to assess helpfulness, readability, and emotional tone. Objective readability metrics were also calculated.
Results
Experts rated the AI chatbot higher for accuracy (75% highly/mostly accurate vs. 50% for GS) and appropriateness (77% vs. 55%), although inaccuracies were noted in both sources on blinded review. Simulated patient personas, particularly those with lower literacy, preferred GS responses (62% vs. 34%), citing better perceived readability (92% easy/very easy vs. 44% for AI) and a more reassuring tone (42% vs. 24% for AI). Objective analysis confirmed that both sources significantly exceeded recommended reading levels (e.g., above a 12th-grade Flesch-Kincaid level), with AI responses being substantially longer. Performance varied considerably across individual questions for both AI and GS sources.
Conclusions
In this blinded cross-sectional evaluation, a publicly available AI chatbot provided accurate responses to brachytherapy-related FAQs. However, further development and validation focused on accessibility, trustworthiness, and user-centered design are required before these tools can be safely and effectively integrated into patient-care workflows.
About the journal
Brachytherapy is an international and multidisciplinary journal that publishes original peer-reviewed articles and selected reviews on the techniques and clinical applications of interstitial and intracavitary radiation in the management of cancers. Laboratory and experimental research relevant to clinical practice is also included. Related disciplines include medical physics, medical oncology, radiation oncology, and radiology. Brachytherapy publishes technical advances, original articles, reviews, and point/counterpoint discussions of controversial issues. Original articles that address any aspect of brachytherapy are invited. Letters to the Editor-in-Chief are encouraged.