Assessing the Efficacy of Large Language Models in Health Literacy: A Comprehensive Cross-Sectional Study.

IF 3.9, Zone 3 (Engineering and Technology), JCR Q2 BIOLOGY

Yale Journal of Biology and Medicine Pub Date : 2024-03-29 eCollection Date: 2024-03-01 DOI:10.59249/ZTOZ1966

Kanhai S Amin, Linda C Mayes, Pavan Khosla, Rushabh H Doshi

Yale Journal of Biology and Medicine, volume 97(1), pages 17-27. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10964816/pdf/


Abstract

Enhanced health literacy in children has been empirically linked to better long-term health outcomes; however, few interventions have been shown to improve health literacy. In this context, we investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children. We tested pediatric conditions using 26 different prompts in ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard (now known as Google Gemini). The primary outcome measurement was the reading grade level (RGL) of the output, as assessed by the Gunning Fog, Flesch-Kincaid Grade Level, Automated Readability Index, and Coleman-Liau indices. Word counts were also assessed. Across all models, output for basic prompts such as "Explain" and "What is (are)" was at or above the tenth-grade RGL. When prompts asked for explanations of conditions at the first- to twelfth-grade level, we found that LLMs had varying abilities to tailor responses to the requested grade level. ChatGPT-3.5 provided responses that ranged from the seventh-grade to the college freshman RGL, while ChatGPT-4 produced responses from the tenth-grade to the college senior RGL. Microsoft Bing provided responses from the ninth- to eleventh-grade RGL, while Google Bard provided responses from the seventh- to tenth-grade RGL. LLMs face challenges in crafting outputs below a sixth-grade RGL. However, their capability to modify outputs above this threshold provides a potential mechanism for adolescents to explore, understand, and engage with information regarding their health conditions, spanning from simple to complex terms. Future studies are needed to verify the accuracy and efficacy of these tools.
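The four readability indices named in the abstract can be illustrated with a minimal Python sketch. It uses the standard published formulas, but the tokenization and the syllable counter are simplifying assumptions made here for illustration (a vowel-group heuristic, letters only for character counts); the abstract does not say which tool the authors used, so scores from this sketch will not necessarily match theirs. The function names `count_syllables` and `readability` are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return approximate reading grade levels for `text` under four indices."""
    # Naive sentence and word splitting; dedicated tools handle abbreviations etc.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sent = len(sentences)
    n_words = len(words)
    # Letters only; the usual ARI definition also counts digits.
    n_letters = sum(len(re.sub(r"[^A-Za-z]", "", w)) for w in words)
    syllables = [count_syllables(w) for w in words]
    n_syll = sum(syllables)
    # Gunning Fog "complex" words: three or more syllables (no suffix exclusions here).
    n_complex = sum(1 for s in syllables if s >= 3)

    words_per_sentence = n_words / n_sent
    return {
        "flesch_kincaid": 0.39 * words_per_sentence + 11.8 * n_syll / n_words - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * n_complex / n_words),
        "ari": 4.71 * n_letters / n_words + 0.5 * words_per_sentence - 21.43,
        "coleman_liau": 0.0588 * (100 * n_letters / n_words)
        - 0.296 * (100 * n_sent / n_words)
        - 15.8,
    }


if __name__ == "__main__":
    sample = "Asthma makes it hard to breathe. An inhaler can help open the airways."
    for name, grade in readability(sample).items():
        print(f"{name}: {grade:.1f}")
```

Each score approximates a U.S. school grade, so a value near 10 corresponds to the tenth-grade RGL discussed above. Very short or very simple passages can score below zero, which is one reason such indices are usually applied to longer outputs.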




Source journal

Yale Journal of Biology and Medicine (subject area: Biochemistry, Genetics and Molecular Biology; General Biochemistry, Genetics and Molecular Biology)

CiteScore

5.00

Self-citation rate

0.00%

Articles published

Journal description: The Yale Journal of Biology and Medicine (YJBM) is a graduate and medical student-run, peer-reviewed, open-access journal dedicated to the publication of original research articles, scientific reviews, articles on medical history, personal perspectives on medicine, policy analyses, case reports, and symposia related to biomedical matters. YJBM is published quarterly and aims to publish articles of interest to both physicians and scientists. YJBM is and has been an internationally distributed journal with a long history of landmark articles. Our contributors feature a notable list of philosophers, statesmen, scientists, and physicians, including Ernst Cassirer, Harvey Cushing, Rene Dubos, Edward Kennedy, Donald Seldin, and Jack Strominger. Our Editorial Board consists of students and faculty members from Yale School of Medicine and Yale University Graduate School of Arts & Sciences. All manuscripts submitted to YJBM are first evaluated on the basis of scientific quality, originality, appropriateness, contribution to the field, and style. Suitable manuscripts are then subject to rigorous, fair, and rapid peer review.