GP-VLS: A general-purpose vision language model for surgery

arXiv - QuanBio - Tissues and Organs · Pub Date: 2024-07-27 · DOI: arxiv-2407.19305

Samuel Schmidgall, Joseph Cho, Cyril Zakka, William Hiesinger

Abstract

Surgery requires comprehensive medical knowledge, visual assessment skills, and procedural expertise. While recent surgical AI models have focused on solving task-specific problems, there is a need for general-purpose systems that can understand surgical scenes and interact through natural language. This paper introduces GP-VLS, a general-purpose vision-language model for surgery that integrates medical and surgical knowledge with visual scene understanding. To comprehensively evaluate general-purpose surgical models, we propose SurgiQual, an evaluation suite covering medical and surgical knowledge benchmarks as well as surgical vision-language questions. To train GP-VLS, we develop six new datasets spanning medical knowledge, surgical textbooks, and vision-language pairs for tasks such as phase recognition and tool identification. We show that GP-VLS significantly outperforms existing open- and closed-source models on surgical vision-language tasks, with 8-21% improvements in accuracy across SurgiQual benchmarks. GP-VLS also performs strongly on medical and surgical knowledge tests relative to open-source alternatives. Overall, GP-VLS provides an open-source foundation for developing AI assistants that support surgeons across a wide range of tasks and scenarios.
