Evaluation of ophthalmic large language models: quantitative vs. qualitative methods.

IF 2.6 | Zone 2, Medicine | Q1 OPHTHALMOLOGY

Current Opinion in Ophthalmology Pub Date : 2025-09-04 DOI:10.1097/ICU.0000000000001171

Ting Fang Tan, Arun J Thirunavukarasu, Chrystie Quek, Daniel S W Ting


Citations: 0

Abstract


Evaluation of ophthalmic large language models: quantitative vs. qualitative methods.

Purpose of review: Alongside the development of large language models (LLMs) and generative artificial intelligence (AI) applications across a diverse range of clinical applications in Ophthalmology, this review highlights the importance of evaluation of LLM applications by discussing evaluation metrics commonly adopted.

Recent findings: Generative AI applications have demonstrated encouraging performance in clinical applications of Ophthalmology. Beyond accuracy, evaluation in the form of quantitative and qualitative metrics facilitates a more nuanced assessment of LLM output responses. Several challenges limit evaluation, including the lack of consensus on standardized benchmarks and the limited availability of robust, curated clinical datasets.

Summary: This review outlines the spectrum of quantitative and qualitative evaluation metrics adopted in existing studies and highlights key challenges in LLM evaluation, to catalyze further work towards standardized and domain-specific evaluation. Robust evaluation to effectively validate clinical LLM applications is crucial in closing the gap towards clinical integration.


Source journal

Current Opinion in Ophthalmology (Medicine: Ophthalmology)

CiteScore: 6.80

Self-citation rate: 5.40%

Articles published: 120

Review time: 6-12 weeks

Journal description: Current Opinion in Ophthalmology is an indispensable resource featuring key up-to-date and important advances in the field from around the world. With renowned guest editors for each section, every bimonthly issue of Current Opinion in Ophthalmology delivers a fresh insight into topics such as glaucoma, refractive surgery and corneal and external disorders. With ten sections in total, the journal provides a convenient and thorough review of the field and will be of interest to researchers, clinicians and other healthcare professionals alike.