{"title":"评估chatgpt - 40对基于患者的圆锥角膜问题的回答的准确性和可读性。","authors":"Ali Safa Balci, Semih Çakmak","doi":"10.1080/09286586.2025.2484760","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus.</p><p><strong>Methods: </strong>A cross-sectional, observational study was conducted using ChatGPT-4o to answer 30 potential questions that could be asked by patients with keratoconus. The accuracy of the responses was evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons between these categories were performed.</p><p><strong>Results: </strong>The mean accuracy score for the responses was 4.48 ± 0.57 on a 5-point Likert scale. The interrater reliability, with an intraclass correlation coefficient of 0.769, indicated a strong level of agreement. Readability scores revealed a SMOG score of 15.49 ± 1.74, an FKGL score of 14.95 ± 1.95, and an FRE score of 27.41 ± 9.71, indicating that a high level of education is required to comprehend the responses. There was no significant difference in accuracy among the different question categories (<i>p</i> = 0.161), but readability varied significantly, with treatment-related questions being the easiest to understand.</p><p><strong>Conclusion: </strong>ChatGPT-4o provides highly accurate responses to patient-centered questions about keratoconus, though the complexity of its language may limit accessibility for the general population. Further development is needed to enhance the readability of AI-generated medical content.</p>","PeriodicalId":19607,"journal":{"name":"Ophthalmic epidemiology","volume":" ","pages":"1-6"},"PeriodicalIF":1.7000,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating the Accuracy and Readability of ChatGPT-4o's Responses to Patient-Based Questions about Keratoconus.\",\"authors\":\"Ali Safa Balci, Semih Çakmak\",\"doi\":\"10.1080/09286586.2025.2484760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus.</p><p><strong>Methods: </strong>A cross-sectional, observational study was conducted using ChatGPT-4o to answer 30 potential questions that could be asked by patients with keratoconus. The accuracy of the responses was evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons between these categories were performed.</p><p><strong>Results: </strong>The mean accuracy score for the responses was 4.48 ± 0.57 on a 5-point Likert scale. The interrater reliability, with an intraclass correlation coefficient of 0.769, indicated a strong level of agreement. 
Readability scores revealed a SMOG score of 15.49 ± 1.74, an FKGL score of 14.95 ± 1.95, and an FRE score of 27.41 ± 9.71, indicating that a high level of education is required to comprehend the responses. There was no significant difference in accuracy among the different question categories (<i>p</i> = 0.161), but readability varied significantly, with treatment-related questions being the easiest to understand.</p><p><strong>Conclusion: </strong>ChatGPT-4o provides highly accurate responses to patient-centered questions about keratoconus, though the complexity of its language may limit accessibility for the general population. Further development is needed to enhance the readability of AI-generated medical content.</p>\",\"PeriodicalId\":19607,\"journal\":{\"name\":\"Ophthalmic epidemiology\",\"volume\":\" \",\"pages\":\"1-6\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2025-03-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ophthalmic epidemiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1080/09286586.2025.2484760\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"OPHTHALMOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ophthalmic epidemiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1080/09286586.2025.2484760","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Evaluating the Accuracy and Readability of ChatGPT-4o's Responses to Patient-Based Questions about Keratoconus.
Purpose: This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus.
Methods: A cross-sectional, observational study was conducted using ChatGPT-4o to answer 30 questions that patients with keratoconus might ask. The accuracy of the responses was evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons were performed between these categories.
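The three readability indices are fixed formulas over sentence, word, and syllable counts. The sketch below (Python, not the study's pipeline; the syllable counter, the helper names, and the sample answer are simplified illustrations) shows how such scores are typically computed. Dedicated readability tools use more careful sentence and syllable detection, and SMOG is formally defined for samples of at least 30 sentences, so short passages are only approximated.

```python
import re
import math

def count_syllables(word: str) -> int:
    """Rough syllable count: vowel groups, with a crude silent-'e' adjustment."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability_scores(text: str) -> dict:
    """FRE, FKGL, and SMOG from raw text using the standard published formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]

    n_sent = len(sentences)
    n_words = len(words)
    n_syll = sum(syllables)
    polysyllables = sum(1 for s in syllables if s >= 3)  # words with 3+ syllables

    fre = 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)
    fkgl = 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59
    smog = 1.0430 * math.sqrt(polysyllables * (30 / n_sent)) + 3.1291

    return {"FRE": round(fre, 2), "FKGL": round(fkgl, 2), "SMOG": round(smog, 2)}

# Hypothetical example: score one model-generated answer about keratoconus
answer = ("Keratoconus is a progressive ectatic disorder in which the cornea "
          "thins and assumes a conical shape, producing irregular astigmatism. "
          "Corneal cross-linking can stabilize progression in suitable patients.")
print(readability_scores(answer))
```

Lower FRE values and higher FKGL/SMOG values indicate harder text; the grade-level scores reported in the Results correspond roughly to college-level reading material.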
Results: The mean accuracy score for the responses was 4.48 ± 0.57 on a 5-point Likert scale. The interrater reliability, with an intraclass correlation coefficient of 0.769, indicated a strong level of agreement. Readability scores revealed a SMOG score of 15.49 ± 1.74, an FKGL score of 14.95 ± 1.95, and an FRE score of 27.41 ± 9.71, indicating that a high level of education is required to comprehend the responses. There was no significant difference in accuracy among the different question categories (p = 0.161), but readability varied significantly, with treatment-related questions being the easiest to understand.
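The abstract reports an intraclass correlation coefficient of 0.769 between the two raters but does not state which ICC model was used. The sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater), one common choice for this design, from a 30 × 2 score matrix; the data are hypothetical, not the study's.

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_questions x n_raters) matrix of accuracy scores.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-question means
    col_means = ratings.mean(axis=0)   # per-rater means

    # Two-way ANOVA mean squares
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between questions
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between raters
    sse = ((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                         # residual

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical data: 30 questions scored 1-5 by two ophthalmologists
rng = np.random.default_rng(0)
rater_a = rng.integers(3, 6, size=30)
rater_b = np.clip(rater_a + rng.integers(-1, 2, size=30), 1, 5)
print(round(icc_2_1(np.column_stack([rater_a, rater_b])), 3))
```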
Conclusion: ChatGPT-4o provides highly accurate responses to patient-centered questions about keratoconus, though the complexity of its language may limit accessibility for the general population. Further development is needed to enhance the readability of AI-generated medical content.
Journal description:
Ophthalmic Epidemiology is dedicated to the publication of original research into eye and vision health in the fields of epidemiology, public health, and the prevention of blindness. The journal publishes editorials, original research reports, systematic reviews and meta-analyses, brief communications, and letters to the editor on all subjects related to ophthalmic epidemiology. A broad range of topics is suitable, such as: evaluating the risk of ocular diseases, general and specific study designs, screening program implementation and evaluation, eye health care access, delivery, and outcomes, therapeutic efficacy or effectiveness, disease prognosis and quality of life, cost-benefit analysis, biostatistical theory, and risk factor analysis. We are looking to expand our engagement with reports of international interest, including those addressing problems in developing countries, although reports from all over the world are potentially suitable. Clinical case reports, small case series (too small for cohort analysis), and animal research reports are not appropriate for this journal.