Intelligent control of robotic X-ray devices using a language-promptable digital twin
Benjamin D. Killeen, Anushri Suresh, Catalina Gomez, Blanca Íñigo, Christopher Bailey, Mathias Unberath
International Journal of Computer Assisted Radiology and Surgery (2025). DOI: 10.1007/s11548-025-03351-y
Abstract
Purpose: Natural language offers a convenient, flexible interface for controlling robotic C-arm X-ray systems, making advanced functionality and controls easily accessible. However, enabling language interfaces requires specialized artificial intelligence (AI) models that interpret X-ray images to create a semantic representation for language-based reasoning. The fixed outputs of such AI models fundamentally limit the functionality that language controls can expose to users. Incorporating language-aligned AI models that can themselves be prompted through the language interface enables more flexible control over a much wider variety of tasks and procedures.
Methods: Using a language-aligned foundation model for X-ray image segmentation, our system continually updates a patient digital twin based on sparse reconstructions of desired anatomical structures. This allows for multiple autonomous capabilities, including visualization, patient-specific viewfinding, and automatic collimation from novel viewpoints, enabling complex language control commands like "Focus in on the lower lumbar vertebrae."
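The abstract describes this control loop only at a high level. The sketch below is not from the paper; it is a minimal Python illustration of how such a loop could be organized, with placeholder stubs for the language front end, the promptable segmentation foundation model, and the sparse back-projection step. All names here (`PatientDigitalTwin`, `segment_structure`, `handle_command`, and so on) are assumptions for illustration, not the authors' API.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class PatientDigitalTwin:
    """Sparse 3D reconstructions of anatomical structures, keyed by name."""
    structures: dict = field(default_factory=dict)  # name -> (N, 3) array of points

    def update(self, name: str, points_3d: np.ndarray) -> None:
        # Merge newly reconstructed points with any existing reconstruction.
        existing = self.structures.get(name, np.empty((0, 3)))
        self.structures[name] = np.vstack([existing, points_3d])

    def target_view(self, name: str) -> tuple:
        # Centroid of the sparse reconstruction serves as the isocenter; the
        # bounding radius suggests a collimation aperture. Purely illustrative.
        pts = self.structures.get(name, np.empty((0, 3)))
        if len(pts) == 0:
            return np.zeros(3), 0.0
        center = pts.mean(axis=0)
        radius = float(np.linalg.norm(pts - center, axis=1).max())
        return center, radius


def parse_target_structure(command: str) -> str:
    # Placeholder for the language front end: map a verbal command such as
    # "Focus in on the lower lumbar vertebrae." to a structure prompt.
    return command.lower().removeprefix("focus in on ").rstrip(". ")


def segment_structure(image: np.ndarray, prompt: str) -> np.ndarray:
    # Placeholder for a language-aligned segmentation foundation model that
    # returns a binary mask for the prompted structure.
    return np.zeros(image.shape, dtype=bool)


def back_project(mask: np.ndarray, projection_matrix: np.ndarray) -> np.ndarray:
    # Placeholder for sparse reconstruction: lift segmented pixels into 3D
    # using the known C-arm geometry (e.g. rays intersected across views).
    return np.empty((0, 3))


def handle_command(command, twin, images, projection_matrices):
    # One pass of the loop: prompt the model on each acquired image, update the
    # digital twin, then derive an isocenter and aperture for the robotic C-arm.
    structure = parse_target_structure(command)
    for image, P in zip(images, projection_matrices):
        mask = segment_structure(image, structure)
        twin.update(structure, back_project(mask, P))
    return twin.target_view(structure)
```

A real system would replace these stubs with a calibrated reconstruction pipeline and an actual promptable segmentation model; the centroid-and-radius heuristic is only one plausible way to turn a sparse reconstruction into a viewfinding and collimation target.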
Results: In a cadaver study, multiple users were able to visualize, localize, and collimate around structures across the torso region using only verbal commands to control a robotic X-ray system, with 84% end-to-end success. In post hoc analysis of randomly oriented images, our patient digital twin was able to localize 35 commonly requested structures from a given image to within 51.68 ± 30.84 mm, which enables localization and isolation of the object from arbitrary orientations.
Conclusion: Overall, we show how intelligent robotic X-ray systems can incorporate physicians' expressed intent directly. Existing foundation models for intra-operative X-ray image analysis exhibit certain failure modes. Nevertheless, our results suggest that as these models become more capable, they can facilitate highly flexible, intelligent robotic C-arms.
About the journal:
The International Journal of Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines and encourages interdisciplinary research and development activities in an international environment.