Ting Fang Tan, Arun J Thirunavukarasu, Chrystie Quek, Daniel S W Ting
Current Opinion in Ophthalmology. Journal Article. Published 2025-09-04. DOI: 10.1097/ICU.0000000000001171 (https://doi.org/10.1097/ICU.0000000000001171). Impact factor: 2.6. JCR: Q1 (Ophthalmology).
Evaluation of ophthalmic large language models: quantitative vs. qualitative methods.
Purpose of review: As large language models (LLMs) and generative artificial intelligence (AI) are developed for a diverse range of clinical applications in ophthalmology, this review highlights the importance of evaluating LLM applications by discussing commonly adopted evaluation metrics.
Recent findings: Generative AI applications have demonstrated encouraging performance in clinical applications of ophthalmology. Beyond accuracy, evaluation using quantitative and qualitative metrics facilitates a more nuanced assessment of LLM output. Several challenges limit evaluation, including the lack of consensus on standardized benchmarks and the limited availability of robust, curated clinical datasets.
Summary: This review outlines the spectrum of quantitative and qualitative evaluation metrics adopted in existing studies and highlights key challenges in LLM evaluation, with the aim of catalyzing further work towards standardized and domain-specific evaluation. Robust evaluation that effectively validates clinical LLM applications is crucial to closing the gap towards clinical integration.
About the journal:
Current Opinion in Ophthalmology is an indispensable resource featuring key, up-to-date advances in the field from around the world. With renowned guest editors for each section, every bimonthly issue of Current Opinion in Ophthalmology delivers fresh insight into topics such as glaucoma, refractive surgery, and corneal and external disorders. With ten sections in total, the journal provides a convenient and thorough review of the field and will be of interest to researchers, clinicians and other healthcare professionals alike.