Intelligent control of robotic X-ray devices using a language-promptable digital twin
Benjamin D. Killeen, Anushri Suresh, Catalina Gomez, Blanca Íñigo, Christopher Bailey, Mathias Unberath
International Journal of Computer Assisted Radiology and Surgery (2025). DOI: 10.1007/s11548-025-03351-y
Abstract
Purpose: Natural language offers a convenient, flexible interface for controlling robotic C-arm X-ray systems, making advanced functionality and controls easily accessible. However, enabling language interfaces requires specialized artificial intelligence (AI) models that interpret X-ray images to create a semantic representation for language-based reasoning. The fixed outputs of such AI models fundamentally limit the functionality that language controls can expose to users. Incorporating language-aligned AI models that can themselves be prompted through the language interface enables more flexible control over a much wider variety of tasks and procedures.
Methods: Using a language-aligned foundation model for X-ray image segmentation, our system continually updates a patient digital twin based on sparse reconstructions of desired anatomical structures. This allows for multiple autonomous capabilities, including visualization, patient-specific viewfinding, and automatic collimation from novel viewpoints, enabling complex language control commands like "Focus in on the lower lumbar vertebrae."
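The abstract describes this control loop only at a high level. The sketch below is not from the paper; it is a minimal Python illustration of how such a loop could be organized, with placeholder stubs for the language front end, the promptable segmentation foundation model, and the sparse back-projection step. All names here (`PatientDigitalTwin`, `segment_structure`, `handle_command`, and so on) are assumptions for illustration, not the authors' API.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class PatientDigitalTwin:
    """Sparse 3D reconstructions of anatomical structures, keyed by name."""
    structures: dict = field(default_factory=dict)  # name -> (N, 3) array of points

    def update(self, name: str, points_3d: np.ndarray) -> None:
        # Merge newly reconstructed points with any existing reconstruction.
        existing = self.structures.get(name, np.empty((0, 3)))
        self.structures[name] = np.vstack([existing, points_3d])

    def target_view(self, name: str) -> tuple:
        # Centroid of the sparse reconstruction serves as the isocenter; the
        # bounding radius suggests a collimation aperture. Purely illustrative.
        pts = self.structures.get(name, np.empty((0, 3)))
        if len(pts) == 0:
            return np.zeros(3), 0.0
        center = pts.mean(axis=0)
        radius = float(np.linalg.norm(pts - center, axis=1).max())
        return center, radius


def parse_target_structure(command: str) -> str:
    # Placeholder for the language front end: map a verbal command such as
    # "Focus in on the lower lumbar vertebrae." to a structure prompt.
    return command.lower().removeprefix("focus in on ").rstrip(". ")


def segment_structure(image: np.ndarray, prompt: str) -> np.ndarray:
    # Placeholder for a language-aligned segmentation foundation model that
    # returns a binary mask for the prompted structure.
    return np.zeros(image.shape, dtype=bool)


def back_project(mask: np.ndarray, projection_matrix: np.ndarray) -> np.ndarray:
    # Placeholder for sparse reconstruction: lift segmented pixels into 3D
    # using the known C-arm geometry (e.g. rays intersected across views).
    return np.empty((0, 3))


def handle_command(command, twin, images, projection_matrices):
    # One pass of the loop: prompt the model on each acquired image, update the
    # digital twin, then derive an isocenter and aperture for the robotic C-arm.
    structure = parse_target_structure(command)
    for image, P in zip(images, projection_matrices):
        mask = segment_structure(image, structure)
        twin.update(structure, back_project(mask, P))
    return twin.target_view(structure)
```

A real system would replace these stubs with a calibrated reconstruction pipeline and an actual promptable segmentation model; the centroid-and-radius heuristic is only one plausible way to turn a sparse reconstruction into a viewfinding and collimation target.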
Results: In a cadaver study, multiple users were able to visualize, localize, and collimate around structures across the torso region using only verbal commands to control a robotic X-ray system, with 84% end-to-end success. In post hoc analysis of randomly oriented images, our patient digital twin was able to localize 35 commonly requested structures from a given image to within 51.68 ± 30.84 mm, which enables localization and isolation of the object from arbitrary orientations.
Conclusion: Overall, we show how intelligent robotic X-ray systems can incorporate physicians' expressed intent directly. Existing foundation models for intra-operative X-ray image analysis exhibit certain failure modes. Nevertheless, our results suggest that as these models become more capable, they can facilitate highly flexible, intelligent robotic C-arms.
About the journal:
The International Journal of Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines and encourages interdisciplinary research and development activities in an international environment.