Use of ChatGPT for patient education involving HPV-associated oropharyngeal cancer
Terral A. Patel, Gillian Michaelson, Zoey Morton, Alexandria Harris, Brandon Smith, Richard Bourguillon, Eric Wu, Arturo Eguia, Jessica H. Maxwell
American Journal of Otolaryngology, 46(4), Article 104642 (published 2025-04-21). DOI: 10.1016/j.amjoto.2025.104642
Abstract
Objective
This study aims to investigate the ability of ChatGPT to generate reliably accurate responses to patient-based queries specifically regarding oropharyngeal squamous cell carcinoma (OPSCC) of the head and neck.
Study design
Retrospective review of published abstracts.
Setting
Publicly available generative artificial intelligence.
Methods
ChatGPT 3.5 (May 2024) was queried with a set of 30 questions pertaining to HPV-associated oropharyngeal cancer that the average patient may ask. This set of questions was queried a total of four times, each time preceded by a different prompt. The responses to each question set were reviewed and graded on a four-point Likert scale. A Flesch-Kincaid reading grade level was also calculated for each response.
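To make this methodology concrete, the sketch below shows one way such an evaluation could be scripted: a patient-style question is sent to a chat model under different prompt framings, and a Flesch-Kincaid grade level is computed for each response. This is an illustrative assumption, not the authors' actual workflow; the model name, the prompt wording, and the use of the openai and textstat packages are hypothetical.

```python
# Hypothetical sketch (not the study's actual pipeline): query a chat model with a
# patient-style question under different prompt framings and score each response
# with the Flesch-Kincaid reading grade level. Assumes the `openai` and `textstat`
# packages are installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI
import textstat

client = OpenAI()

# Illustrative framings; the study's exact prompts are not reproduced here.
PROMPT_FRAMINGS = {
    "no prompt": None,
    "physician friend": "Explain this to me as a physician talking to a friend.",
}

def grade_response(question: str, framing: str | None) -> tuple[str, float]:
    """Return the model's answer and its Flesch-Kincaid reading grade level."""
    messages = []
    if framing:
        messages.append({"role": "system", "content": framing})
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    answer = reply.choices[0].message.content
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    return answer, textstat.flesch_kincaid_grade(answer)

question = "What is HPV-associated oropharyngeal cancer?"
for name, framing in PROMPT_FRAMINGS.items():
    answer, grade = grade_response(question, framing)
    print(f"{name}: Flesch-Kincaid grade level {grade:.2f}")
```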
Results
Across all responses (n = 120), 6.6 % were graded as mostly inaccurate, 7.5 % as minorly inaccurate, 41.7 % as accurate, and 44.2 % as accurate and helpful. The average Flesch-Kincaid reading grade level was lowest for responses generated without any prompt (11.77); as expected, the highest grade level was found with the physician-friend prompt (12.97). Of the 30 references provided, 25 (83.3 %) were authentic published studies, and for 14 of those 25 (56 %), the responses accurately cited information found within the original source.
Conclusion
ChatGPT was able to produce relatively accurate responses to example patient questions, but there was a high rate of false references. In addition, the reading level of the responses was well above the Centers for Disease Control and Prevention (CDC) recommendations for the average patient.
About the journal:
Be fully informed about developments in otology, neurotology, audiology, rhinology, allergy, laryngology, speech science, bronchoesophagology, facial plastic surgery, and head and neck surgery. Featured sections include original contributions, grand rounds, current reviews, case reports and socioeconomics.