GP-VLS: A general-purpose vision language model for surgery

Samuel Schmidgall, Joseph Cho, Cyril Zakka, William Hiesinger
arXiv - QuanBio - Tissues and Organs, 2024-07-27
DOI: arxiv-2407.19305
Citations: 0

Abstract

Surgery requires comprehensive medical knowledge, visual assessment skills, and procedural expertise. While recent surgical AI models have focused on solving task-specific problems, there is a need for general-purpose systems that can understand surgical scenes and interact through natural language. This paper introduces GP-VLS, a general-purpose vision language model for surgery that integrates medical and surgical knowledge with visual scene understanding. To comprehensively evaluate general-purpose surgical models, we propose SurgiQual, which evaluates models on medical and surgical knowledge benchmarks as well as on surgical vision-language questions. To train GP-VLS, we develop six new datasets spanning medical knowledge, surgical textbooks, and vision-language pairs for tasks such as phase recognition and tool identification. We show that GP-VLS significantly outperforms existing open- and closed-source models on surgical vision-language tasks, with 8–21% improvements in accuracy across SurgiQual benchmarks. GP-VLS also demonstrates strong performance on medical and surgical knowledge tests compared to open-source alternatives. Overall, GP-VLS provides an open-source foundation for developing AI assistants to support surgeons across a wide range of tasks and scenarios.