Jixiu Li, Yisen Huang, W. Ng, T. Cheng, Xixin Wu, Q. Dou, Helen M. Meng, P. Heng, Yunhui Liu, S. Chan, D. Navarro-Alarcon, Calvin Sze Hang Ng, Philip Wai Yan Chiu, Zheng Li
{"title":"Speech-Vision Based Multi-Modal AI Control of a Magnetic Anchored and Actuated Endoscope","authors":"Jixiu Li, Yisen Huang, W. Ng, T. Cheng, Xixin Wu, Q. Dou, Helen M. Meng, P. Heng, Yunhui Liu, S. Chan, D. Navarro-Alarcon, Calvin Sze Hang Ng, Philip Wai Yan Chiu, Zheng Li","doi":"10.1109/ROBIO55434.2022.10011904","DOIUrl":null,"url":null,"abstract":"In minimally invasive surgery (MIS), controlling the endoscope view is crucial for the operation. Many robotic endoscope holders were developed aiming to address this prob-lem,. These systems rely on joystick, foot pedal, simple voice command, etc. to control the robot. These methods requires surgeons extra effort and are not intuitive enough. In this paper, we propose a speech-vision based multi-modal AI approach, which integrates deep learning based instrument detection, automatic speech recognition and robot visual servo control. Surgeons could communicate with the endoscope by speech to indicate their view preference, such as the instrument to be tracked. The instrument is detected by the deep learning neural network. Then the endoscope takes the detected instrument as the target and follows it with the visual servo controller. This method is applied to a magnetic anchored and guided endoscope and evaluated experimentally. 
Preliminary results demonstrated this approach is effective and requires little efforts for the surgeon to control the endoscope view intuitively.","PeriodicalId":151112,"journal":{"name":"2022 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Robotics and Biomimetics (ROBIO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBIO55434.2022.10011904","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In minimally invasive surgery (MIS), controlling the endoscope view is crucial for the operation. Many robotic endoscope holders have been developed to address this problem. These systems rely on a joystick, foot pedal, simple voice commands, etc. to control the robot; such methods require extra effort from surgeons and are not intuitive enough. In this paper, we propose a speech-vision based multi-modal AI approach that integrates deep-learning-based instrument detection, automatic speech recognition, and robot visual servo control. Surgeons communicate with the endoscope by speech to indicate their view preference, such as which instrument to track. The instrument is detected by a deep neural network; the endoscope then takes the detected instrument as its target and follows it with the visual servo controller. This method is applied to a magnetic anchored and guided endoscope and evaluated experimentally. Preliminary results demonstrate that the approach is effective and lets the surgeon control the endoscope view intuitively with little effort.
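The abstract does not give implementation details of the controller, so the following is only a minimal, hypothetical sketch of the image-based visual-servo step it describes: the instrument detector yields a bounding box, and a proportional law drives that box toward the image center. All names (`bbox_center`, `servo_step`) and the gain value are illustrative assumptions, not taken from the paper; the mapping from the pixel-space correction to actual endoscope motion (the image Jacobian of the magnetic actuation) is omitted.

```python
def bbox_center(bbox):
    """Center (u, v) of a detection box given as (x, y, w, h) in pixels."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)

def servo_step(bbox, image_size, gain=0.5):
    """One proportional visual-servo update.

    Returns a (du, dv) pixel-space correction that, mapped through the
    robot's image Jacobian (not modeled here), would move the endoscope
    so the tracked instrument drifts toward the image center.
    """
    width, height = image_size
    cu, cv = bbox_center(bbox)
    error_u = width / 2.0 - cu    # horizontal offset from image center
    error_v = height / 2.0 - cv   # vertical offset from image center
    return (gain * error_u, gain * error_v)

# Example: instrument detected right of center in a 640x480 frame,
# so the correction pulls the view back toward it.
correction = servo_step((400, 220, 80, 40), (640, 480))
print(correction)  # (-60.0, 0.0)
```

In a running system this step would be called once per frame with the latest detection, with the speech-recognition module selecting which detected instrument's box is passed in.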