Translating ophthalmic medical jargon with artificial intelligence: a comparative comprehension study.
Michael Balas, Alexander J Kaplan, Kaisra Esmail, Solin Saleh, Rahul A Sharma, Peng Yan, Parnian Arjmand
Canadian Journal of Ophthalmology / Journal canadien d'ophtalmologie. DOI: 10.1016/j.jcjo.2024.11.003. Published: 2024-12-09.
Abstract
Objective: Our goal was to evaluate the efficacy of OpenAI's ChatGPT-4.0 large language model (LLM) in translating technical ophthalmology terminology into more comprehensible language for allied health care professionals and compare it with other LLMs.
Design: Observational cross-sectional study.
Participants: Five ophthalmologists each contributed three clinical encounter notes, totaling 15 reports for analysis.
Methods: Notes were translated into more comprehensible language using ChatGPT-4.0, ChatGPT-4o, Claude 3 Sonnet, and Google Gemini. Ten family physicians, masked to whether each note was the original or the ChatGPT-4.0 translation, independently evaluated both sets using Likert scales to assess comprehension and utility for clinical decision-making. Readability was evaluated using Flesch Reading Ease and Flesch-Kincaid Grade Level scores. Five ophthalmologist raters compared performance across the four LLMs and identified translation errors.
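The abstract does not specify the tooling used to compute the readability metrics. As an illustrative sketch only, the published Flesch formulas could be applied to a note roughly as follows in Python, with a crude vowel-group heuristic standing in for syllable counting (the study's actual scoring pipeline is not described here, and the sample sentences are invented):

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count contiguous vowel groups (illustrative heuristic only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / max(1, len(sentences))   # words per sentence
    spw = syllables / max(1, len(words))        # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw    # Flesch Reading Ease formula
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level formula
    return fre, fkgl


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the study's clinical notes.
    original = "Fundus examination revealed peripapillary atrophy and a shallow chorioretinal detachment."
    simplified = "The back of the eye showed thinning around the nerve and a small area of lifted tissue."
    for label, note in [("original", original), ("simplified", simplified)]:
        fre, fkgl = readability_scores(note)
        print(f"{label}: Flesch Reading Ease = {fre:.1f}, FK Grade Level = {fkgl:.1f}")
```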
Results: ChatGPT-4.0 translations significantly outperformed the original notes in terms of comprehension (mean score of 4.7/5.0 vs 3.7/5.0; p < 0.001) and perceived usefulness (mean score of 4.6/5.0 vs 3.8/5.0; p < 0.005). Readability analysis demonstrated mildly increased linguistic complexity in the translated notes. ChatGPT-4.0 was preferred in 8 of 15 cases, ChatGPT-4o in 4, Gemini in 3, and Claude 3 Sonnet in 0 cases. All models exhibited some translation errors, but ChatGPT-4o and ChatGPT-4.0 had fewer inaccuracies.
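The abstract reports significance levels but does not name the statistical test used. As a hedged sketch, a paired comparison of per-note ratings could be run with a Wilcoxon signed-rank test, shown here on purely hypothetical numbers (not the study data):

```python
from scipy import stats

# Hypothetical per-note mean comprehension ratings (1-5 Likert), paired by clinical note.
# These values are illustrative only; they are not the study's data.
original_scores   = [3.5, 3.8, 3.6, 4.0, 3.7, 3.9, 3.4, 3.8, 3.6, 3.7, 3.9, 3.5, 3.8, 3.6, 3.7]
translated_scores = [4.6, 4.8, 4.7, 4.9, 4.6, 4.7, 4.5, 4.8, 4.7, 4.6, 4.8, 4.7, 4.6, 4.9, 4.7]

# Paired, non-parametric comparison of translated vs. original notes.
statistic, p_value = stats.wilcoxon(translated_scores, original_scores)
print(f"Wilcoxon signed-rank: W = {statistic:.1f}, p = {p_value:.4g}")
```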
Conclusions: ChatGPT-4.0 can significantly enhance the comprehensibility of ophthalmic notes, facilitating better interprofessional communication and suggesting a promising role for LLMs in medical translation. However, the results also underscore the need for ongoing refinement and careful implementation of such technologies. Further research is needed to validate these findings across a broader range of specialties and languages.
Journal Introduction:
The Canadian Journal of Ophthalmology (CJO) is the official journal of the Canadian Ophthalmological Society and is committed to timely publication of original, peer-reviewed ophthalmology and vision science articles.