{"title":"Guest Editorial: Special Issue on Al Technologies and Applications in Medical Robots","authors":"Xiaozhi Qi, Zhongliang Jiang, Ying Hu, Jianwei Zhang","doi":"10.1049/cit2.70019","DOIUrl":null,"url":null,"abstract":"<p>The integration of artificial intelligence (AI) into medical robotics has emerged as a cornerstone of modern healthcare, driving transformative advancements in precision, adaptability and patient outcomes. Although computational tools have long supported diagnostic processes, their role is evolving beyond passive assistance to become active collaborators in therapeutic decision-making. In this paradigm, knowledge-driven deep learning systems are redefining possibilities—enabling robots to interpret complex data, adapt to dynamic clinical environments and execute tasks with human-like contextual awareness.</p><p>The purpose of this special issue is to showcase the latest developments in the application of AI technology in medical robots. The main content includes but is not limited to passive data adaptation, force feedback tracking, image processing and diagnosis, surgical navigation, exoskeleton systems etc. These studies cover various application scenarios of medical robots, with the ultimate goal of maximising AI autonomy.</p><p>We have received 31 paper submissions from around the world, and after a rigorous peer review process, we have finally selected 9 papers for publication. The selected collection of papers covers various fascinating research topics, all of which have achieved key breakthroughs in their respective fields. We believe that these accepted papers have guiding significance for their research fields and can help researchers enhance their understanding of current trends. Sincere thanks to the authors who chose our platform and all the staff who provided assistance for the publication of these papers.</p><p>In the article ‘Model adaptation via credible local context representation’, Tang et al. pointed out that conventional model transfer techniques require labelled source data, which makes them inapplicable in privacy-sensitive medical domains. To address these critical problems of source-free domain adaptation (SFDA), they proposed a credible local context representation (CLCR) method that significantly enhances model generalisation through geometric structure mining in feature space. This method innovatively constructs a two-stage learning framework: introducing a data-enhanced mutual information regularisation term in the pretraining stage of the source model to enhance the model's learning of sample discriminative features; design a deep space fixed step walking strategy during the target domain adaptation phase, dynamically capture the local credible contextual features of each target sample and use them as pseudo-labels for semantic fusion. Experiments on the three benchmark datasets of Office-31, Office Home and VisDA show that CLCR achieves an average accuracy of 89.2% in 12 cross-domain tasks, which is 3.1% higher than the existing optimal SFDA method and even surpasses some domain adaptation methods that require the participation of source data. This work provides a new approach to address the privacy performance conflict in cross-institutional model transfer in healthcare, and its context discovery mechanism has universal significance for unsupervised representation learning.</p><p>In the article ‘A human-robot collaboration method for uncertain surface scanning’, Zhao et al. 
introduces a human–robot collaboration framework for uncertain surface scanning that synergises teleoperation with adaptive force control. The system enables operators to remotely guide scanning trajectories, whereas an admittance controller maintains constant contact force through real-time stiffness adjustment, achieving ± 1 N tracking precision on surfaces with unknown stiffness. Autonomous tool reorientation, triggered when angular deviation exceeds 5°, ensures perpendicular alignment through friction-compensated force perception. Experimental validation, using a mock ultrasound probe, demonstrated 63% workload reduction compared to pure teleoperation, successfully handling both spongy and spring-supported phantoms. The hybrid control architecture decouples human guidance from robotic compliance, permitting simultaneous XY-axis motion control and Z-axis force regulation without prior environmental modelling. This approach bridges human intuition with robotic precision, particularly valuable for medical scanning applications requiring safe tissue interaction.</p><p>In the research entitled ‘AESR3D: 3D Overcomplete Autoencoder for Trabecular CT Super Resolution’, Zhang et al. proposed AESR3D, a 3D overcomplete autoencoder framework, to address the limitations of osteoporosis diagnosis by enhancing low-resolution trabecular CT scans. Current reliance on bone mineral density (BMD) overlooks microstructural deterioration critical for biomechanical strength. AESR3D combines a hybrid CNN-transformer architecture with dual-task regularisation—simultaneously optimising super-resolution reconstruction and low-resolution restoration—to prevent overfitting while recovering structural details. The model achieves state-of-the-art performance (SSIM: 0.996) and demonstrates strong correlation with high-resolution ground truth in trabecular metrics (ICC = 0.917). By integrating unsupervised <i>K</i>-means segmentation, it enables precise visualisation of bone microarchitecture without labelled data. Outperforming existing medical/natural image SR methods, AESR3D bridges micro-CT research and clinical CT applications, offering a noninvasive tool for enhanced osteoporosis assessment and advancing diagnostic accuracy in bone quality evaluation.</p><p>In the paper ‘Segmentation versus Detection: Development and Evaluation of Deep Learning Models for PIRADS Lesions Localisation on Bi-Parametric Prostate MRI’, Min et al. address the critical challenge of automated prostate cancer detection in bi-parametric MRI (bp-MRI) by rigorously comparing segmentation (nnUNet) and object detection (nnDetection) deep learning approaches. Prostate cancer, a leading cause of male mortality, demands precise early diagnosis, yet MRI interpretation remains radiologist-dependent and time-intensive. The authors introduce novel lesion-level sensitivity and precision metrics, overcoming limitations of traditional voxel-wise evaluations, and propose ensemble methods to synergise the strengths of both models. Results demonstrate nnDetection's superior lesion-level sensitivity (80.78% vs. 60.40% for PIRADS ≥ 3 lesions at 3 false positives), whereas nnUNet excels in voxel-level accuracy (DSC 0.46 vs. 0.35). Ensemble techniques further enhance performance, achieving 82.24% lesion-level sensitivity, underscoring their potential to balance detection robustness and spatial precision. 
Validated on external datasets, the framework highlights the clinical viability of combining segmentation and detection paradigms, particularly for MRI-guided biopsies requiring high sensitivity. This work advances computer-aided diagnosis by bridging methodological gaps and providing metrics aligned with clinical priorities, offering a scalable pathway towards improved prostate cancer management through AI-driven lesion localisation.</p><p>In the paper ‘Needle Detection and Localisation for Robot-assisted Subretinal Injection using Deep Learning’, Zhou et al. address the critical challenge of precise needle detection and localisation in robot-assisted subretinal injection, a high-stakes ophthalmic procedure requiring micrometre-level accuracy. Leveraging microscope-integrated optical coherence tomography (MI-OCT), the authors propose a robust framework combining ROI cropping and deep learning to overcome limitations in manual needle tracking caused by tissue deformation and specular noise. Five convolutional neural network architectures were evaluated, with the top-performing model (Network II) achieving 100% detection success on ex vivo porcine eyes and localising needle segments with an Intersection-over-Union of 0.55. By analysing bounding box edges, the method demonstrated sub-10 μm accuracy in depth estimation, crucial for navigating the delicate retinal layers. The integration of neighbouring OCT scans enhanced spatial context awareness, outperforming geometric feature-based approaches. This work advances intraoperative imaging-guided robotics by enabling real-time, deformation-resistant needle tracking, potentially reducing surgical risks in gene therapy delivery and subretinal haemorrhage treatment. The validated framework bridges a critical gap in ophthalmic robotics, offering a pathway towards safer, more precise robotic interventions in retinal surgery.</p><p>In the paper ‘A method for automatic feature points extraction of pelvic surface based on PointMLP_RegNet’, Kou et al. note that the precise extraction of anatomical landmarks from complex pelvic structures is critical for enhancing 3D/3D registration accuracy in robot-assisted fracture reduction. Addressing challenges in manual and conventional automated methods, this study introduces PointMLP_RegNet, a deep learning framework adapted from PointMLP by replacing its classification layer with a regression module to predict spatial coordinates of 10 pelvic landmarks. Trained on a clinical dataset of 40 patient-derived CT-reconstructed point clouds augmented via downsampling, translation, rotation and noise injection, the model demonstrated robust performance through leave-one-out cross-validation. Results revealed sub-5 mm accuracy across all landmarks, with 80% achieving errors below 4 mm, surpassing PointNet++ and PointNet in precision (reducing mean error by 20%–30%) while maintaining superior computational efficiency (0.688 M parameters). By automating feature extraction, the method minimises human variability, streamlines intraoperative registration and improves surgical planning reliability. This innovation bridges technical gaps in pelvic fracture robotics, offering a scalable solution for clinical adoption and underscoring the transformative potential of tailored deep learning architectures in orthopaedic navigation systems.</p><p>In the paper ‘Rehabilitation Exoskeleton System with Bidirectional Virtual Reality Feedback Training Strategy’, Gao et al. 
introduced a VR-integrated exoskeleton system for stroke rehabilitation, combining immersive 3D environments with real-time bidirectional feedback to enhance neural retraining. The system employs a novel muscle activation model merging linear and nonlinear contraction dynamics, addressing limitations of traditional Hill-based models, whereas a WOA-GRNN algorithm achieves precise muscle strength prediction (RMSE: 0.0173, MAPE: 1.25%). Experiments with healthy participants demonstrated synchronised exoskeleton-VR motion mapping and involuntary muscle responses to virtual stimuli, validating neural pathway engagement. Notably, 75% of subjects exhibited subconscious arm movements during VR-induced phantom limb activation, suggesting enhanced proprioceptive integration. This bidirectional feedback framework advances personalised rehabilitation by objectively quantifying recovery through sEMG-driven metrics while maintaining patient engagement through adaptive virtual tasks.</p><p>In the paper ‘A Demonstration Trajectory Segmentation Approach for Wheelchair-mounted Robotic Arms’, Chi et al. proposed a novel trajectory segmentation approach for wheelchair-mounted assistive robots, aiming to enhance their ability to learn and reproduce complex tasks in unstructured environments. The proposed GTW-BP-AR-HMM method integrates the generalised time warping (GTW) algorithm with a beta process autoregressive hidden Markov model (BP-AR-HMM) to address challenges in aligning and segmenting variable-length demonstration trajectories. By first aligning multiple task demonstrations temporally using GTW, the framework mitigates inconsistencies in trajectory lengths, a critical limitation of traditional BP-AR-HMM. Subsequent segmentation identifies motion primitives, enabling the creation of reusable task libraries. Validation on a 6-DOF robotic arm demonstrated high accuracy in segmenting tasks such as holding a water glass and eating, with segmentation points closely matching manual annotations. This approach reduces reliance on expert input, simplifying the demonstration process for nonspecialists while improving the robot's adaptability to user-specific needs. The work underscores the potential of combining temporal alignment and probabilistic modelling to advance assistive robotics in healthcare and home settings.</p><p>In the paper ‘Processing Water-Medium Spinal Endoscopic Images Based on Dual Transmittance’, Hu and Zhang proposed a novel dual-transmittance fusion method to enhance water-medium spinal endoscopic images degraded by suspended contaminants during minimally invasive procedures. By adapting an underwater imaging model to spinal endoscopy, the authors estimate transmittance through boundary constraints and local contrast analysis, addressing light scattering and absorption caused by turbid surgical environments. The fusion of these transmittance maps, optimised via guided filtering, minimises artefacts while preserving structural integrity. Ambient light estimation using a “Shades of Grey” algorithm further ensures balanced colour correction. Experimental validation against classical methods—including WGIF, AGCWD and MSRCR—demonstrates superior performance in entropy, contrast and structural similarity metrics, effectively restoring tissue textures without overexposure or distortion. This physics-informed approach bridges computational efficiency with clinical utility, offering real-time image clarity for precise intraoperative navigation. 
The method's robustness across diverse degradation scenarios, from blood contamination to tool shadows, positions it as a pivotal advancement in enhancing visualisation for complex spinal surgeries, promising improved surgical accuracy and safety.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 3","pages":"635-637"},"PeriodicalIF":8.4000,"publicationDate":"2025-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70019","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cit2.70019","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
The integration of artificial intelligence (AI) into medical robotics has emerged as a cornerstone of modern healthcare, driving transformative advancements in precision, adaptability and patient outcomes. Although computational tools have long supported diagnostic processes, their role is evolving beyond passive assistance to become active collaborators in therapeutic decision-making. In this paradigm, knowledge-driven deep learning systems are redefining possibilities—enabling robots to interpret complex data, adapt to dynamic clinical environments and execute tasks with human-like contextual awareness.
The purpose of this special issue is to showcase the latest developments in the application of AI technology in medical robots. Topics include, but are not limited to, passive data adaptation, force-feedback tracking, image processing and diagnosis, surgical navigation and exoskeleton systems. These studies span a variety of application scenarios for medical robots, with the ultimate goal of maximising AI autonomy.
We received 31 submissions from around the world and, after a rigorous peer-review process, selected nine papers for publication. The selected papers cover a range of fascinating research topics, each marking a key breakthrough in its field. We believe these papers offer guidance for their respective research areas and will help readers deepen their understanding of current trends. We sincerely thank the authors who chose our platform and all the staff who assisted in the publication of these papers.
In the article ‘Model adaptation via credible local context representation’, Tang et al. pointed out that conventional model-transfer techniques require labelled source data, making them inapplicable in privacy-sensitive medical domains. To address this core problem of source-free domain adaptation (SFDA), they proposed a credible local context representation (CLCR) method that significantly enhances model generalisation through geometric structure mining in feature space. The method builds a two-stage learning framework: in the pretraining stage of the source model, a data-augmented mutual information regularisation term strengthens the learning of discriminative sample features; in the target-domain adaptation stage, a fixed-step walking strategy in the deep feature space dynamically captures the credible local context of each target sample and uses it as pseudo-labels for semantic fusion. Experiments on three benchmark datasets (Office-31, Office-Home and VisDA) show that CLCR achieves an average accuracy of 89.2% across 12 cross-domain tasks, 3.1% higher than the best existing SFDA method, and even surpasses some domain adaptation methods that require access to source data. This work provides a new approach to the privacy-performance conflict in cross-institutional model transfer in healthcare, and its context-discovery mechanism is broadly relevant to unsupervised representation learning.
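For readers who want the flavour of the mutual-information regularisation used in such SFDA pipelines, the sketch below shows a common information-maximisation term: confident per-sample predictions plus a diverse batch-level class distribution. The batch size and class count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def mutual_information_loss(logits: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Information-maximisation regulariser of the kind used in source-free
    domain adaptation: push each sample towards a confident prediction while
    keeping the batch-level class distribution diverse.
    Returns a quantity to minimise: H(y|x) - H(y).
    """
    probs = F.softmax(logits, dim=1)                                # (B, C) posteriors
    h_cond = -(probs * torch.log(probs + eps)).sum(dim=1).mean()    # H(y|x), low = confident
    p_mean = probs.mean(dim=0)                                      # batch-average prediction
    h_marg = -(p_mean * torch.log(p_mean + eps)).sum()              # H(y), high = diverse
    return h_cond - h_marg

logits = torch.randn(32, 31, requires_grad=True)   # e.g. Office-31 has 31 classes
mutual_information_loss(logits).backward()
```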
In the article ‘A human-robot collaboration method for uncertain surface scanning’, Zhao et al. introduce a human–robot collaboration framework for uncertain surface scanning that synergises teleoperation with adaptive force control. The system lets operators remotely guide scanning trajectories while an admittance controller maintains constant contact force through real-time stiffness adjustment, achieving ±1 N tracking precision on surfaces of unknown stiffness. Autonomous tool reorientation, triggered when the angular deviation exceeds 5°, ensures perpendicular alignment through friction-compensated force perception. Experimental validation with a mock ultrasound probe demonstrated a 63% workload reduction compared with pure teleoperation, successfully handling both spongy and spring-supported phantoms. The hybrid control architecture decouples human guidance from robotic compliance, permitting simultaneous XY-axis motion control and Z-axis force regulation without prior environmental modelling. This approach bridges human intuition and robotic precision, which is particularly valuable for medical scanning applications requiring safe tissue interaction.
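To make the admittance-control idea concrete, here is a minimal one-axis sketch: a contact-force error drives a virtual mass-damper, yielding a compliant velocity command. The virtual mass, damping, desired force and spring-like contact model are illustrative assumptions, not the paper's parameters.

```python
def admittance_step(f_meas, f_des, v_prev, m=1.0, b=50.0, dt=0.002):
    """One step of a 1-DOF admittance law m*dv/dt + b*v = f_err along the
    tool axis: the contact-force error becomes a compliant velocity command,
    so the robot yields to the surface instead of fighting it.
    m [kg] and b [N*s/m] are illustrative virtual gains.
    """
    f_err = f_meas - f_des            # positive when pressing too hard
    dv = (f_err - b * v_prev) / m     # virtual second-order dynamics
    return v_prev + dv * dt           # next retract (+) / approach (-) velocity

# Toy closed loop against a spring-like surface of unknown stiffness.
k_surface = 800.0                     # [N/m], unknown to the controller
z, v = 0.0, 0.0                       # probe indentation [m] and velocity [m/s]
for _ in range(2000):
    f = k_surface * max(z, 0.0)       # simplistic elastic contact model
    v = admittance_step(f, f_des=5.0, v_prev=v)
    z -= v * 0.002                    # retracting reduces indentation
print(f"steady-state contact force: {k_surface * max(z, 0.0):.2f} N")
```

At steady state the velocity command vanishes only when the force error is zero, so the loop settles at the desired 5 N regardless of the (unknown) surface stiffness, which is the property the paper exploits.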
In the research entitled ‘AESR3D: 3D Overcomplete Autoencoder for Trabecular CT Super Resolution’, Zhang et al. proposed AESR3D, a 3D overcomplete autoencoder framework, to address the limitations of osteoporosis diagnosis by enhancing low-resolution trabecular CT scans. Current reliance on bone mineral density (BMD) overlooks microstructural deterioration critical for biomechanical strength. AESR3D combines a hybrid CNN-transformer architecture with dual-task regularisation—simultaneously optimising super-resolution reconstruction and low-resolution restoration—to prevent overfitting while recovering structural details. The model achieves state-of-the-art performance (SSIM: 0.996) and demonstrates strong correlation with high-resolution ground truth in trabecular metrics (ICC = 0.917). By integrating unsupervised K-means segmentation, it enables precise visualisation of bone microarchitecture without labelled data. Outperforming existing medical/natural image SR methods, AESR3D bridges micro-CT research and clinical CT applications, offering a noninvasive tool for enhanced osteoporosis assessment and advancing diagnostic accuracy in bone quality evaluation.
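The dual-task regularisation can be sketched as one encoder with two heads: a super-resolving branch and a branch that must reproduce the low-resolution input, which discourages the encoder from hallucinating detail. The toy layer sizes and loss weighting below are assumptions, not the AESR3D architecture.

```python
import torch
import torch.nn as nn

class DualTaskSR3D(nn.Module):
    """Toy 3D encoder with two heads illustrating dual-task regularisation;
    sizes are illustrative placeholders, not the AESR3D design."""
    def __init__(self, ch=16, scale=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        self.sr_head = nn.Sequential(      # upsample to the HR grid
            nn.Upsample(scale_factor=scale, mode="trilinear", align_corners=False),
            nn.Conv3d(ch, 1, 3, padding=1),
        )
        self.restore_head = nn.Conv3d(ch, 1, 3, padding=1)   # stay on the LR grid

    def forward(self, lr):
        feat = self.encoder(lr)
        return self.sr_head(feat), self.restore_head(feat)

model = DualTaskSR3D()
lr = torch.randn(1, 1, 16, 16, 16)         # low-resolution CT patch
hr = torch.randn(1, 1, 32, 32, 32)         # paired high-resolution target
sr_pred, lr_pred = model(lr)
loss = nn.functional.l1_loss(sr_pred, hr) + 0.5 * nn.functional.l1_loss(lr_pred, lr)
loss.backward()
```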
In the paper ‘Segmentation versus Detection: Development and Evaluation of Deep Learning Models for PIRADS Lesions Localisation on Bi-Parametric Prostate MRI’, Min et al. address the critical challenge of automated prostate cancer detection in bi-parametric MRI (bp-MRI) by rigorously comparing segmentation (nnUNet) and object detection (nnDetection) deep learning approaches. Prostate cancer, a leading cause of male mortality, demands precise early diagnosis, yet MRI interpretation remains radiologist-dependent and time-intensive. The authors introduce novel lesion-level sensitivity and precision metrics, overcoming limitations of traditional voxel-wise evaluations, and propose ensemble methods to synergise the strengths of both models. Results demonstrate nnDetection's superior lesion-level sensitivity (80.78% vs. 60.40% for PIRADS ≥ 3 lesions at 3 false positives), whereas nnUNet excels in voxel-level accuracy (DSC 0.46 vs. 0.35). Ensemble techniques further enhance performance, achieving 82.24% lesion-level sensitivity, underscoring their potential to balance detection robustness and spatial precision. Validated on external datasets, the framework highlights the clinical viability of combining segmentation and detection paradigms, particularly for MRI-guided biopsies requiring high sensitivity. This work advances computer-aided diagnosis by bridging methodological gaps and providing metrics aligned with clinical priorities, offering a scalable pathway towards improved prostate cancer management through AI-driven lesion localisation.
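A lesion-level sensitivity of this kind can be computed by matching connected components of the ground truth against the prediction mask; the 10% overlap criterion in the sketch below is an illustrative choice, not necessarily the paper's exact definition.

```python
import numpy as np
from scipy import ndimage

def lesion_level_sensitivity(gt_mask, pred_mask, min_overlap=0.1):
    """Count a ground-truth lesion as detected if the prediction covers at
    least `min_overlap` of its voxels (illustrative matching criterion)."""
    labels, n_lesions = ndimage.label(gt_mask)     # connected components
    if n_lesions == 0:
        return float("nan")
    hits = 0
    for i in range(1, n_lesions + 1):
        lesion = labels == i
        overlap = np.logical_and(lesion, pred_mask).sum() / lesion.sum()
        if overlap >= min_overlap:
            hits += 1
    return hits / n_lesions

gt = np.zeros((64, 64, 32), bool); gt[10:14, 10:14, 5:8] = True
pred = np.zeros_like(gt);          pred[11:15, 11:15, 5:8] = True
print(lesion_level_sensitivity(gt, pred))   # 1.0: the single lesion is found
```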
In the paper ‘Needle Detection and Localisation for Robot-assisted Subretinal Injection using Deep Learning’, Zhou et al. address the critical challenge of precise needle detection and localisation in robot-assisted subretinal injection, a high-stakes ophthalmic procedure requiring micrometre-level accuracy. Leveraging microscope-integrated optical coherence tomography (MI-OCT), the authors propose a robust framework combining ROI cropping and deep learning to overcome limitations in manual needle tracking caused by tissue deformation and specular noise. Five convolutional neural network architectures were evaluated, with the top-performing model (Network II) achieving 100% detection success on ex vivo porcine eyes and localising needle segments with an Intersection-over-Union of 0.55. By analysing bounding box edges, the method demonstrated sub-10 μm accuracy in depth estimation, crucial for navigating the delicate retinal layers. The integration of neighbouring OCT scans enhanced spatial context awareness, outperforming geometric feature-based approaches. This work advances intraoperative imaging-guided robotics by enabling real-time, deformation-resistant needle tracking, potentially reducing surgical risks in gene therapy delivery and subretinal haemorrhage treatment. The validated framework bridges a critical gap in ophthalmic robotics, offering a pathway towards safer, more precise robotic interventions in retinal surgery.
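The bounding-box-to-depth step can be illustrated as follows; the 2 µm axial pixel spacing and the surface reference row are assumed values for the sketch, not the MI-OCT system's calibration.

```python
def needle_depth_um(box, axial_spacing_um=2.0, surface_row=0):
    """Estimate needle depth from a detection box in one OCT B-scan: the
    box's lower edge (pixel rows, y increasing with depth) is converted to
    metric depth below a reference retinal-surface row. The 2 um/pixel
    spacing is an illustrative value, not the system's calibration.

    box : (x_min, y_min, x_max, y_max) in pixels
    """
    _, _, _, y_max = box
    return (y_max - surface_row) * axial_spacing_um

print(needle_depth_um((120, 40, 180, 95), surface_row=60))   # 70.0 um
```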
In the paper ‘A method for automatic feature points extraction of pelvic surface based on PointMLP_RegNet’, Kou et al. note that the precise extraction of anatomical landmarks from complex pelvic structures is critical for enhancing 3D/3D registration accuracy in robot-assisted fracture reduction. Addressing challenges in manual and conventional automated methods, this study introduces PointMLP_RegNet, a deep learning framework adapted from PointMLP by replacing its classification layer with a regression module to predict spatial coordinates of 10 pelvic landmarks. Trained on a clinical dataset of 40 patient-derived CT-reconstructed point clouds augmented via downsampling, translation, rotation and noise injection, the model demonstrated robust performance through leave-one-out cross-validation. Results revealed sub-5 mm accuracy across all landmarks, with 80% achieving errors below 4 mm, surpassing PointNet++ and PointNet in precision (reducing mean error by 20%–30%) while maintaining superior computational efficiency (0.688 M parameters). By automating feature extraction, the method minimises human variability, streamlines intraoperative registration and improves surgical planning reliability. This innovation bridges technical gaps in pelvic fracture robotics, offering a scalable solution for clinical adoption and underscoring the transformative potential of tailored deep learning architectures in orthopaedic navigation systems.
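The classification-to-regression swap at the heart of PointMLP_RegNet can be sketched with a stand-in backbone: a global point-cloud feature followed by a head that outputs the 10 landmarks as 30 coordinates, trained with a coordinate loss. The tiny shared-MLP below is a placeholder for PointMLP, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class LandmarkRegressor(nn.Module):
    """Stand-in for the PointMLP_RegNet idea: pooled point-cloud features
    feed a regression head (replacing a classification layer) that predicts
    landmark coordinates. The backbone here is a toy shared MLP."""
    def __init__(self, n_landmarks=10):
        super().__init__()
        self.n_landmarks = n_landmarks
        self.backbone = nn.Sequential(          # per-point shared MLP
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 256, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(              # regression, not classification
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_landmarks * 3),
        )

    def forward(self, pts):                     # pts: (B, 3, N) point cloud
        feat = self.backbone(pts).max(dim=2).values    # global max-pool -> (B, 256)
        return self.head(feat).view(-1, self.n_landmarks, 3)

model = LandmarkRegressor()
pts = torch.randn(2, 3, 2048)                   # two pelvic point clouds
target = torch.randn(2, 10, 3)                  # annotated landmark positions
loss = nn.functional.mse_loss(model(pts), target)
loss.backward()
```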
In the paper ‘Rehabilitation Exoskeleton System with Bidirectional Virtual Reality Feedback Training Strategy’, Gao et al. introduced a VR-integrated exoskeleton system for stroke rehabilitation, combining immersive 3D environments with real-time bidirectional feedback to enhance neural retraining. The system employs a novel muscle activation model merging linear and nonlinear contraction dynamics, addressing limitations of traditional Hill-based models, whereas a WOA-GRNN algorithm achieves precise muscle strength prediction (RMSE: 0.0173, MAPE: 1.25%). Experiments with healthy participants demonstrated synchronised exoskeleton-VR motion mapping and involuntary muscle responses to virtual stimuli, validating neural pathway engagement. Notably, 75% of subjects exhibited subconscious arm movements during VR-induced phantom limb activation, suggesting enhanced proprioceptive integration. This bidirectional feedback framework advances personalised rehabilitation by objectively quantifying recovery through sEMG-driven metrics while maintaining patient engagement through adaptive virtual tasks.
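A GRNN is essentially kernel-weighted regression, which makes the prediction step easy to sketch. In the paper the spread parameter is tuned by the whale optimisation algorithm (WOA); in this illustration it is fixed by hand on toy data.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma=0.1):
    """General regression neural network: a Gaussian-kernel-weighted average
    of training targets. The spread `sigma` is hand-set here; the paper
    tunes it with WOA."""
    # Squared distances between each query and each training sample.
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))        # pattern-layer activations
    return (w @ y_train) / w.sum(axis=1)        # weighted mean of targets

rng = np.random.default_rng(0)
emg = rng.random((100, 4))                      # toy sEMG feature vectors
strength = emg.sum(axis=1) + 0.05 * rng.standard_normal(100)
print(grnn_predict(emg, strength, emg[:3], sigma=0.3))
```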
In the paper ‘A Demonstration Trajectory Segmentation Approach for Wheelchair-mounted Robotic Arms’, Chi et al. proposed a novel trajectory segmentation approach for wheelchair-mounted assistive robots, aiming to enhance their ability to learn and reproduce complex tasks in unstructured environments. The proposed GTW-BP-AR-HMM method integrates the generalised time warping (GTW) algorithm with a beta process autoregressive hidden Markov model (BP-AR-HMM) to address challenges in aligning and segmenting variable-length demonstration trajectories. By first aligning multiple task demonstrations temporally using GTW, the framework mitigates inconsistencies in trajectory lengths, a critical limitation of traditional BP-AR-HMM. Subsequent segmentation identifies motion primitives, enabling the creation of reusable task libraries. Validation on a 6-DOF robotic arm demonstrated high accuracy in segmenting tasks such as holding a water glass and eating, with segmentation points closely matching manual annotations. This approach reduces reliance on expert input, simplifying the demonstration process for nonspecialists while improving the robot's adaptability to user-specific needs. The work underscores the potential of combining temporal alignment and probabilistic modelling to advance assistive robotics in healthcare and home settings.
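The temporal-alignment step can be illustrated with classic dynamic time warping, of which GTW is a multi-sequence generalisation; the sketch below aligns two toy demonstrations of unequal length before any segmentation would take place.

```python
import numpy as np

def dtw_path(a, b):
    """Classic dynamic time warping between trajectories a (n,d) and b (m,d):
    a simplified stand-in for the GTW alignment step. Returns the
    accumulated-cost matrix and the optimal alignment path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return D, path[::-1]

t1, t2 = np.linspace(0, 1, 50), np.linspace(0, 1, 70)
demo_a = np.column_stack([np.sin(2 * np.pi * t1), t1])   # two demonstrations
demo_b = np.column_stack([np.sin(2 * np.pi * t2), t2])   # of unequal length
D, path = dtw_path(demo_a, demo_b)
print(f"alignment cost: {D[-1, -1]:.3f}, path length: {len(path)}")
```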
In the paper ‘Processing Water-Medium Spinal Endoscopic Images Based on Dual Transmittance’, Hu and Zhang proposed a novel dual-transmittance fusion method to enhance water-medium spinal endoscopic images degraded by suspended contaminants during minimally invasive procedures. By adapting an underwater imaging model to spinal endoscopy, the authors estimate transmittance through boundary constraints and local contrast analysis, addressing light scattering and absorption caused by turbid surgical environments. The fusion of these transmittance maps, optimised via guided filtering, minimises artefacts while preserving structural integrity. Ambient light estimation using a “Shades of Grey” algorithm further ensures balanced colour correction. Experimental validation against classical methods—including WGIF, AGCWD and MSRCR—demonstrates superior performance in entropy, contrast and structural similarity metrics, effectively restoring tissue textures without overexposure or distortion. This physics-informed approach bridges computational efficiency with clinical utility, offering real-time image clarity for precise intraoperative navigation. The method's robustness across diverse degradation scenarios, from blood contamination to tool shadows, positions it as a pivotal advancement in enhancing visualisation for complex spinal surgeries, promising improved surgical accuracy and safety.
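The ‘Shades of Grey’ step is a standard Minkowski-norm colour-constancy estimate and is easy to sketch; the exponent p = 6 is a common default assumed here, not necessarily the authors' setting.

```python
import numpy as np

def shades_of_grey(img, p=6):
    """Shades-of-Grey illuminant estimation: the illuminant is the Minkowski
    p-norm of each colour channel (p=1 is Grey-World, p->inf is White-Patch).
    p=6 is a common default, assumed here.

    img : float array (H, W, 3) in [0, 1]
    Returns the estimated illuminant and a colour-corrected image.
    """
    illum = (img.reshape(-1, 3) ** p).mean(axis=0) ** (1.0 / p)
    illum /= np.linalg.norm(illum)
    corrected = img / (illum * np.sqrt(3.0))    # von Kries-style channel scaling
    return illum, np.clip(corrected, 0.0, 1.0)

# Toy endoscopic frame with a reddish cast from suspended contaminants.
rng = np.random.default_rng(1)
frame = np.clip(rng.random((120, 160, 3)) * [1.0, 0.6, 0.5], 0, 1)
illum, balanced = shades_of_grey(frame)
print("estimated illuminant:", illum.round(3))
```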
Journal introduction:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.