Artificial Intelligence Versus Professional Standards: A Cross-Sectional Comparative Study of GPT, Gemini, and ENT UK in Delivering Patient Information on ENT Conditions.
Ali Alabdalhussein, Nehal Singhania, Shazaan Nadeem, Mohammed Talib, Derar Al-Domaidat, Ibrahim Jimoh, Waleed Khan, Manish Mair
{"title":"Artificial Intelligence Versus Professional Standards: A Cross-Sectional Comparative Study of GPT, Gemini, and ENT UK in Delivering Patient Information on ENT Conditions.","authors":"Ali Alabdalhussein, Nehal Singhania, Shazaan Nadeem, Mohammed Talib, Derar Al-Domaidat, Ibrahim Jimoh, Waleed Khan, Manish Mair","doi":"10.3390/diseases13090286","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Patient information materials are sensitive and, if poorly written, can cause misunderstanding. This study evaluated and compared the readability, actionability, and quality of patient education materials on laryngology topics generated by ChatGPT, Google Gemini, and ENT UK.</p><p><strong>Methods: </strong>We obtained patient information from ENT UK and generated equivalent content with ChatGPT-4-turbo and Google Gemini 2.5 Pro for six laryngology conditions. We assessed readability (Flesch-Kincaid Grade Level, FKGL; Flesch Reading Ease, FRE), quality (DISCERN), and patient engagement (PEMAT-P for understandability and actionability). Statistical comparisons involved using ANOVA, Tukey's HSD, and Kruskal-Wallis tests.</p><p><strong>Results: </strong>ENT UK showed the highest readability (FRE: 64.6 ± 8.4) and lowest grade level (FKGL: 7.4 ± 1.5), significantly better than that of ChatGPT (FRE: 38.8 ± 10.5, FKGL: 11.0 ± 1.5) and Gemini (FRE: 38.3 ± 8.5, FKGL: 11.9 ± 1.2) (all <i>p</i> < 0.001). DISCERN scores did not differ significantly (ENT UK: 21.3 ± 7.5, GPT: 24.7 ± 9.1, Gemini: 29.5 ± 4.6; <i>p</i> > 0.05). PEMAT-P understandability results were similar (ENT UK: 72.7 ± 8.3%, GPT: 79.1 ± 5.8%, Gemini: 78.5 ± 13.1%), except for lower GPT scores on vocal cord paralysis (<i>p</i> < 0.05). Actionability was also comparable (ENT UK: 46.7 ± 16.3%, GPT: 41.1 ± 24.0%, Gemini: 36.7 ± 19.7%).</p><p><strong>Conclusion: </strong>GPT and Gemini produce patient information of comparable quality and engagement to ENT UK but require higher reading levels and fall short of recommended literacy standards.</p>","PeriodicalId":72832,"journal":{"name":"Diseases (Basel, Switzerland)","volume":"13 9","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468877/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Diseases (Basel, Switzerland)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/diseases13090286","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
Abstract
Objective: Patient information materials are sensitive documents; if poorly written, they can cause misunderstanding. This study evaluated and compared the readability, actionability, and quality of patient education materials on laryngology topics produced by ChatGPT, Google Gemini, and ENT UK.
Methods: We obtained patient information from ENT UK and generated equivalent content with ChatGPT-4-turbo and Google Gemini 2.5 Pro for six laryngology conditions. We assessed readability (Flesch-Kincaid Grade Level, FKGL; Flesch Reading Ease, FRE), quality (DISCERN), and patient engagement (PEMAT-P for understandability and actionability). Statistical comparisons used ANOVA, Tukey's HSD, and Kruskal-Wallis tests.
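As a rough illustration of this pipeline, the sketch below scores each text with the two Flesch formulas and runs the three statistical tests across sources. It assumes the Python textstat package for the readability formulas and scipy/statsmodels for the tests; the study does not name its tooling, and the placeholder strings and two-item lists below stand in for the six condition leaflets per source.

```python
# Sketch: score readability per source, then compare groups.
# Assumes textstat, scipy, and statsmodels are installed; the
# placeholder strings stand in for the actual leaflet texts.
import textstat
from scipy.stats import f_oneway, kruskal
from statsmodels.stats.multicomp import pairwise_tukeyhsd

materials = {
    "ENT UK": ["leaflet text for condition 1", "leaflet text for condition 2"],
    "GPT":    ["generated text for condition 1", "generated text for condition 2"],
    "Gemini": ["generated text for condition 1", "generated text for condition 2"],
}

# Flesch Reading Ease (higher = easier to read) and Flesch-Kincaid
# Grade Level (US school grade needed to understand the text).
fre  = {src: [textstat.flesch_reading_ease(t) for t in texts]
        for src, texts in materials.items()}
fkgl = {src: [textstat.flesch_kincaid_grade(t) for t in texts]
        for src, texts in materials.items()}

# One-way ANOVA across the three sources, with Kruskal-Wallis as the
# non-parametric counterpart, then Tukey's HSD for pairwise contrasts.
groups = list(fkgl.values())
print("ANOVA:", f_oneway(*groups))
print("Kruskal-Wallis:", kruskal(*groups))

scores = [s for g in groups for s in g]
labels = [src for src, g in fkgl.items() for _ in g]
print(pairwise_tukeyhsd(scores, labels))
```

In the study's setting each list would hold six scores, one per condition; running both the parametric ANOVA/Tukey pair and the rank-based Kruskal-Wallis test, as the authors did, guards against normality violations in such small samples.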
Results: ENT UK showed the highest readability (FRE: 64.6 ± 8.4) and lowest grade level (FKGL: 7.4 ± 1.5), significantly better than ChatGPT (FRE: 38.8 ± 10.5, FKGL: 11.0 ± 1.5) and Gemini (FRE: 38.3 ± 8.5, FKGL: 11.9 ± 1.2) (all p < 0.001). DISCERN scores did not differ significantly (ENT UK: 21.3 ± 7.5, GPT: 24.7 ± 9.1, Gemini: 29.5 ± 4.6; p > 0.05). PEMAT-P understandability results were similar (ENT UK: 72.7 ± 8.3%, GPT: 79.1 ± 5.8%, Gemini: 78.5 ± 13.1%), except for lower GPT scores on vocal cord paralysis (p < 0.05). Actionability was also comparable (ENT UK: 46.7 ± 16.3%, GPT: 41.1 ± 24.0%, Gemini: 36.7 ± 19.7%).
Conclusion: GPT and Gemini produce patient information of comparable quality and engagement to ENT UK but require higher reading levels and fall short of recommended literacy standards.