General-purpose foundation models for increased autonomy in robot-assisted surgery

IF 18.8 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Nature Machine Intelligence Pub Date : 2024-11-01 DOI:10.1038/s42256-024-00917-4

Samuel Schmidgall, Ji Woong Kim, Alan Kuntz, Ahmed Ezzat Ghazi, Axel Krieger

Nature Machine Intelligence, volume 6, issue 11, pages 1275-1283. Publication type: Journal Article. Source: https://www.nature.com/articles/s42256-024-00917-4

Citations: 0

Abstract



The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise towards being trained on large collections of diverse and task-agnostic datasets of video demonstrations. These models have shown impressive levels of generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for a few reasons: there is a lack of existing large-scale open-source data to train models; it is challenging to model the soft-body deformations that these robots work with during surgery because simulation cannot match the physical and visual complexity of biological tissue; and surgical robots risk harming patients when tested in clinical trials and require more extensive safety measures. This Perspective aims to provide a path towards increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision–language–action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models and provide four guiding actions towards increased autonomy in robot-assisted surgery. Schmidgall et al. describe a pathway for building general-purpose machine learning models for robot-assisted surgery, including mechanisms for avoiding risk and handing over control to surgeons, and improving safety and outcomes beyond demonstration data.


Source journal

Nature Machine Intelligence Multiple-

CiteScore

36.90

Self-citation rate

2.10%

Articles published

127

Journal description: Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements. To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects. Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.