Title: Vision language models in ophthalmology.
Authors: Gilbert Lim, Kabilan Elangovan, Liyuan Jin
Journal: Current Opinion in Ophthalmology, pages 487-493 (impact factor 3.0; JCR Q1, Ophthalmology)
Published: 2024-11-01 (Epub 2024-08-26)
Publication type: Journal Article
DOI: https://doi.org/10.1097/ICU.0000000000001089
Citation count: 0
Abstract
Purpose of review: Vision Language Models are an emerging paradigm in artificial intelligence that offers the potential to analyze image and textual data natively and simultaneously within a single model. The fusion of these two modalities is of particular relevance to ophthalmology, which has historically relied on specialized imaging techniques such as angiography, optical coherence tomography, and fundus photography, while also interfacing with electronic health records that include free-text descriptions. This review surveys the fast-evolving field of Vision Language Models as they apply to current ophthalmologic research and practice.
Recent findings: Although models incorporating both image and text data have a long history in ophthalmology, effective multimodal Vision Language Models are a recent development that exploits advances in technologies such as transformer and autoencoder models.
Summary: Vision Language Models offer the potential to assist and streamline the existing clinical workflow in ophthalmology, whether before, during, or after the patient visit. Important challenges remain, however, particularly regarding patient privacy and the explainability of model recommendations.
About the journal:
Current Opinion in Ophthalmology is an indispensable resource featuring key up-to-date and important advances in the field from around the world. With renowned guest editors for each section, every bimonthly issue of Current Opinion in Ophthalmology delivers a fresh insight into topics such as glaucoma, refractive surgery and corneal and external disorders. With ten sections in total, the journal provides a convenient and thorough review of the field and will be of interest to researchers, clinicians and other healthcare professionals alike.