From visual question answering to intelligent AI agents in ophthalmology

IF 3.5 | Zone 2, Medicine | JCR Q1, Ophthalmology

British Journal of Ophthalmology Pub Date : 2025-08-27 DOI:10.1136/bjo-2024-326097

Xiaolan Chen, Ruoyu Chen, Pusheng Xu, Xiaojie Wan, Weiyi Zhang, Bingjie Yan, Xianwen Shang, Mingguang He, Danli Shi

Citations: 0

Abstract

From visual question answering to intelligent AI agents in ophthalmology

Ophthalmic practice involves the integration of diverse clinical data and interactive decision-making, posing challenges for traditional artificial intelligence (AI) systems. Visual question answering (VQA) addresses this by combining computer vision and natural language processing to interpret medical images through user-driven queries. Evolving from VQA, multimodal AI agents enable continuous dialogue, tool use and context-aware clinical decision support. This review explores recent developments in ophthalmic conversational AI, spanning theoretical advances and practical implementations. We highlight the transformative role of large language models (LLMs) in improving reasoning, adaptability and task execution. However, key obstacles remain, including limited multimodal datasets, absence of standardised evaluation protocols, and challenges in clinical integration. We outline these limitations and propose future research directions to support the development of robust, LLM-driven AI systems. Realising their full potential will depend on close collaboration between AI researchers and the ophthalmic community.

Source journal

British Journal of Ophthalmology (Medicine: Ophthalmology)

CiteScore: 10.30
Self-citation rate: 2.40%
Articles published: 213
Review time: 3-6 weeks

Journal description: The British Journal of Ophthalmology (BJO) is an international peer-reviewed journal for ophthalmologists and visual science specialists. BJO publishes clinical investigations, clinical observations, and clinically relevant laboratory investigations related to ophthalmology. It also provides major reviews and publishes manuscripts covering regional issues in a global context.