Zeynel Mert Asfuroğlu, Hilal Yağar, Ender Gümüşoğlu
High accuracy but limited readability of large language model-generated responses to frequently asked questions about Kienböck's disease
BMC Musculoskeletal Disorders 25(1):879. Published 2024-11-04. Journal article.
DOI: https://doi.org/10.1186/s12891-024-07983-0
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11536837/pdf/
Impact factor: 2.2 (JCR Q2, Orthopedics)
Level of evidence: Level IV, observational study.
Citations: 0
Abstract
Background: This study aimed to assess the quality and readability of large language model-generated responses to frequently asked questions (FAQs) about Kienböck's disease (KD).
Methods: Nineteen FAQs about KD were selected and divided into three categories: general knowledge, diagnosis, and treatment. The questions were entered into the Chat Generative Pre-trained Transformer 4 (ChatGPT-4) webpage using the zero-shot prompting method, and the responses were recorded. Hand surgeons with at least 5 years of experience and advanced English proficiency were contacted individually via WhatsApp instant messaging and asked to assess the responses. The quality of each response was rated by 33 experienced hand surgeons using the Global Quality Scale (GQS). Readability was assessed with the Flesch-Kincaid Grade Level (FKGL) and the Flesch Reading Ease Score (FRES).
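For reference, FRES and FKGL are both computed from the average sentence length and the average number of syllables per word. The sketch below uses the standard published formulas. The tokenization and the vowel-group syllable counter are rough assumptions for illustration and are not the tool used in the study, so scores from this sketch will not exactly match those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) using naive splitting rules."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Flesch Reading Ease Score: higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "Kienbock's disease affects the lunate bone. Treatment depends on the stage."
    counts = text_counts(sample)
    print("FRES:", round(flesch_reading_ease(*counts), 1))
    print("FKGL:", round(flesch_kincaid_grade(*counts), 1))
```

Both scores move in opposite directions: longer sentences and more syllables per word lower FRES and raise FKGL.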
Results: The mean GQS score was 4.28 out of a maximum of 5 points. Of the 627 individual ratings (19 responses × 33 raters), most were good (270; 43.1%) or excellent (260; 41.5%). The mean FKGL was 15.5, and the mean FRES was 23.4, both of which are considered above the college graduate level. No statistically significant differences in quality or readability were found among the responses to questions on general knowledge, diagnosis, and treatment.
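The rating shares can be checked with simple arithmetic: 19 responses rated by 33 surgeons give 627 ratings. The sketch below recomputes the reported percentages and maps a FRES value onto the commonly cited Flesch interpretation bands. The band labels and all function names are assumptions for illustration and are not taken from the study.

```python
N_QUESTIONS = 19  # FAQs submitted to ChatGPT-4
N_RATERS = 33     # hand surgeons who rated each response

total_ratings = N_QUESTIONS * N_RATERS  # 627 individual GQS ratings


def share(count: int, total: int) -> float:
    """Percentage of ratings in a category, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Commonly cited Flesch bands as (lower bound, label); assumed, not from the study.
FRES_BANDS = [
    (90.0, "very easy"),
    (80.0, "easy"),
    (70.0, "fairly easy"),
    (60.0, "standard"),
    (50.0, "fairly difficult"),
    (30.0, "difficult (college level)"),
]


def fres_band(score: float) -> str:
    """Return the band label for a Flesch Reading Ease Score."""
    for lower, label in FRES_BANDS:
        if score >= lower:
            return label
    return "very difficult (college graduate level)"


if __name__ == "__main__":
    print(total_ratings)              # 627
    print(share(270, total_ratings))  # 43.1 (good)
    print(share(260, total_ratings))  # 41.5 (excellent)
    print(fres_band(23.4))            # very difficult (college graduate level)
```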
Conclusions: ChatGPT-4 provided high-quality responses to FAQs about KD. However, the primary drawback was the poor readability of these responses. By improving the readability of ChatGPT's output, we can transform it into a valuable information resource for individuals with KD.
About the journal:
BMC Musculoskeletal Disorders is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of musculoskeletal disorders, as well as related molecular genetics, pathophysiology, and epidemiology.
The scope of the journal covers research into rheumatic diseases where the primary focus relates specifically to one or more components of the musculoskeletal system.