{"title":"自定义聊天机器人的可读性与GPT-4对耳鼻喉科相关患者问题的反应。","authors":"Yossef Alsabawi, Pompeyo R Quesada, David T Rouse","doi":"10.1016/j.amjoto.2025.104717","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Low health literacy among patients hinders comprehension of care instructions and worsens outcomes, yet most otolaryngology patient materials and chatbot responses to medical inquiries exceed the recommended reading level of sixth- to eighth-grade. Whether chatbots can be pre-programmed to provide accurate, plain-language responses has yet to be studied. This study aims to compare response readability of a GPT model customized for plain language with GPT-4 when answering common otolaryngology patient questions.</p><p><strong>Methods: </strong>A custom GPT was created and provided thirty-three questions from Polat et al. (Int J Pediatr Otorhinolaryngol., 2024), and their GPT-4 answers were reused with permission. Questions were grouped by theme. Readability was calculated with Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) via online calculator. A board-certified, practicing otolaryngologist assessed content similarity and accuracy. The primary outcome was readability, measured by FKGL (0-18; equivalent to United States grade level) and FRE (0-100; higher scores indicate greater readability).</p><p><strong>Results: </strong>The custom GPT reduced FKGL by an average of 4.2 grade levels (95 % confidence interval [CI]: 3.2, 5.1; p < 0.001) and increased FRE by an average of 17.3 points (95 % CI: 12.5, 21.7; p < 0.001). Improvements remained significant in three of four theme subgroups (p < 0.05). Readability was consistent across question types, and variances were equal between models. Expert review confirmed overall accuracy and content similarity.</p><p><strong>Conclusion: </strong>Preprogramming a custom GPT to generate plain-language instructions yields outputs that meet Centers for Medicare & Medicaid Services readability targets without significantly compromising content quality. Tailored chatbots could enhance patient communication in otolaryngology clinics and other medical settings.</p>","PeriodicalId":7591,"journal":{"name":"American Journal of Otolaryngology","volume":"46 5","pages":"104717"},"PeriodicalIF":1.7000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Readability of custom chatbot vs. GPT-4 responses to otolaryngology-related patient questions.\",\"authors\":\"Yossef Alsabawi, Pompeyo R Quesada, David T Rouse\",\"doi\":\"10.1016/j.amjoto.2025.104717\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Low health literacy among patients hinders comprehension of care instructions and worsens outcomes, yet most otolaryngology patient materials and chatbot responses to medical inquiries exceed the recommended reading level of sixth- to eighth-grade. Whether chatbots can be pre-programmed to provide accurate, plain-language responses has yet to be studied. This study aims to compare response readability of a GPT model customized for plain language with GPT-4 when answering common otolaryngology patient questions.</p><p><strong>Methods: </strong>A custom GPT was created and provided thirty-three questions from Polat et al. (Int J Pediatr Otorhinolaryngol., 2024), and their GPT-4 answers were reused with permission. Questions were grouped by theme. 
Readability was calculated with Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) via online calculator. A board-certified, practicing otolaryngologist assessed content similarity and accuracy. The primary outcome was readability, measured by FKGL (0-18; equivalent to United States grade level) and FRE (0-100; higher scores indicate greater readability).</p><p><strong>Results: </strong>The custom GPT reduced FKGL by an average of 4.2 grade levels (95 % confidence interval [CI]: 3.2, 5.1; p < 0.001) and increased FRE by an average of 17.3 points (95 % CI: 12.5, 21.7; p < 0.001). Improvements remained significant in three of four theme subgroups (p < 0.05). Readability was consistent across question types, and variances were equal between models. Expert review confirmed overall accuracy and content similarity.</p><p><strong>Conclusion: </strong>Preprogramming a custom GPT to generate plain-language instructions yields outputs that meet Centers for Medicare & Medicaid Services readability targets without significantly compromising content quality. Tailored chatbots could enhance patient communication in otolaryngology clinics and other medical settings.</p>\",\"PeriodicalId\":7591,\"journal\":{\"name\":\"American Journal of Otolaryngology\",\"volume\":\"46 5\",\"pages\":\"104717\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Otolaryngology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.amjoto.2025.104717\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/8/6 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"OTORHINOLARYNGOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Otolaryngology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.amjoto.2025.104717","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/6 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"OTORHINOLARYNGOLOGY","Score":null,"Total":0}
Readability of custom chatbot vs. GPT-4 responses to otolaryngology-related patient questions.
Background: Low health literacy among patients hinders comprehension of care instructions and worsens outcomes, yet most otolaryngology patient materials and chatbot responses to medical inquiries exceed the recommended sixth- to eighth-grade reading level. Whether chatbots can be pre-programmed to provide accurate, plain-language responses has yet to be studied. This study compares the readability of responses from a GPT model customized for plain language with that of GPT-4 responses when answering common otolaryngology patient questions.
Methods: A custom GPT was created and given the thirty-three questions from Polat et al. (Int J Pediatr Otorhinolaryngol, 2024); the corresponding GPT-4 answers were reused with permission. Questions were grouped by theme. Readability was calculated with the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) via an online calculator. A board-certified, practicing otolaryngologist assessed content similarity and accuracy. The primary outcome was readability, measured by FKGL (0-18; equivalent to United States grade level) and FRE (0-100; higher scores indicate greater readability).
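For reference, both metrics rest on the standard Flesch formulas: FRE = 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), and FKGL = 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59. The Python sketch below illustrates how such scores can be computed; it is not the online calculator used in the study, and the syllable-counting heuristic is an assumption included only for demonstration.

import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels, drop a trailing
    # silent "e", and never return fewer than one syllable per word.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text: str) -> tuple[float, float]:
    # Returns (FKGL, FRE) for a block of English text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)            # words per sentence
    spw = syllables / len(words)                 # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw     # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59       # Flesch-Kincaid Grade Level
    return fkgl, fre

fkgl, fre = readability("Take one pill with food twice a day. Call us if your ear still hurts after three days.")
print(f"FKGL: {fkgl:.1f}, FRE: {fre:.1f}")

Different syllable-counting rules and sentence splitters shift the scores slightly, which is why published readability studies typically name the specific calculator used.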
Results: The custom GPT reduced FKGL by an average of 4.2 grade levels (95 % confidence interval [CI]: 3.2, 5.1; p < 0.001) and increased FRE by an average of 17.3 points (95 % CI: 12.5, 21.7; p < 0.001). Improvements remained significant in three of four theme subgroups (p < 0.05). Readability was consistent across question types, and variances were equal between models. Expert review confirmed overall accuracy and content similarity.
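The abstract does not state which statistical test produced these estimates; a minimal sketch of one common approach, a paired comparison of per-question FKGL scores across the thirty-three items, is shown below. The score lists are hypothetical placeholders, not study data.

import math
from statistics import mean, stdev
from scipy import stats

def paired_mean_diff_ci(custom_scores, gpt4_scores, alpha=0.05):
    # Mean paired difference with a two-sided (1 - alpha) confidence interval
    # based on the t distribution with n - 1 degrees of freedom.
    diffs = [c - g for c, g in zip(custom_scores, gpt4_scores)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    return d_bar, (d_bar - t_crit * se, d_bar + t_crit * se)

# Hypothetical per-question FKGL scores (not the study's data).
custom = [5.1, 6.0, 4.8, 5.5]
gpt4 = [9.8, 10.2, 9.1, 10.5]
diff, (lo, hi) = paired_mean_diff_ci(custom, gpt4)
print(f"Mean FKGL change: {diff:.1f} (95% CI: {lo:.1f}, {hi:.1f})")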
Conclusion: Pre-programming a custom GPT to generate plain-language instructions yields outputs that meet Centers for Medicare & Medicaid Services readability targets without significantly compromising content quality. Tailored chatbots could enhance patient communication in otolaryngology clinics and other medical settings.
Journal description:
Be fully informed about developments in otology, neurotology, audiology, rhinology, allergy, laryngology, speech science, bronchoesophagology, facial plastic surgery, and head and neck surgery. Featured sections include original contributions, grand rounds, current reviews, case reports and socioeconomics.