CASHNet: Context-Aware Semantics-driven Hierarchical Network for Hybrid Diffeomorphic CT-CBCT Image Registration.

IF 9.8, Zone 1 (Medicine), JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

IEEE Transactions on Medical Imaging, Pub Date: 2025-09-09, DOI: 10.1109/tmi.2025.3607700

Xiaoru Gao, Housheng Xie, Donghua Hang, Guoyan Zheng

Citations: 0

Abstract

Computed Tomography (CT) to Cone-Beam Computed Tomography (CBCT) image registration is crucial for image-guided radiotherapy and surgical procedures. However, achieving accurate CT-CBCT registration remains challenging due to various factors such as inconsistent intensities, low contrast resolution and imaging artifacts. In this study, we propose a Context-Aware Semantics-driven Hierarchical Network (referred to as CASHNet), which hierarchically integrates context-aware semantics-encoded features into a coarse-to-fine registration scheme, to explicitly enhance semantic structural perception during progressive alignment. Moreover, it leverages diffeomorphisms to integrate rigid and non-rigid registration within a single end-to-end trainable network, enabling anatomically plausible deformations and preserving topological consistency. CASHNet comprises a Siamese Mamba-based multi-scale feature encoder and a coarse-to-fine registration decoder, which integrates a Rigid Registration (RR) module with multiple Semantics-guided Velocity Estimation and Feature Alignment (SVEFA) modules operating at different resolutions. Each SVEFA module comprises three carefully designed components: i) a cross-resolution feature aggregation (CFA) component that synthesizes enhanced global contextual representations, ii) a semantics perception and encoding (SPE) component that captures and encodes local semantic information, and iii) an incremental velocity estimation and feature alignment (IVEFA) component that leverages contextual and semantic features to update velocity fields and to align features. These modules work synergistically to boost the overall registration performance. Extensive experiments on three typical yet challenging CT-CBCT datasets of both soft and hard tissues demonstrate the superiority of our proposed method over other state-of-the-art methods. The code will be publicly available at https://github.com/xiaorugao999/CASHNet.
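The abstract states that the network updates velocity fields and relies on diffeomorphisms to keep deformations topologically consistent, but it does not say how a velocity field is turned into a deformation. One common choice in diffeomorphic registration is scaling and squaring of a stationary velocity field; the sketch below assumes that technique purely for illustration and is not taken from the CASHNet code. It works on a 1D toy grid, and the function name `integrate_velocity_1d`, the step count, and the Gaussian test field are all hypothetical.

```python
import numpy as np


def integrate_velocity_1d(v, steps=6):
    """Approximate the displacement u of phi = exp(v), where phi(x) = x + u(x).

    v is a stationary velocity field sampled on the integer grid
    0..len(v)-1 (units: grid spacings). Illustrative helper only,
    not part of CASHNet.
    """
    v = np.asarray(v, dtype=float)
    x = np.arange(v.size, dtype=float)
    # Scaling: start from a very small step, which is safely invertible.
    u = v / (2 ** steps)
    for _ in range(steps):
        # Squaring: phi <- phi o phi, i.e. u(x) <- u(x) + u(x + u(x)).
        # np.interp interpolates linearly and clamps outside the grid.
        u = u + np.interp(x + u, x, u)
    return u


# Toy velocity: a large Gaussian bump whose slope is steeper than -1.
x = np.arange(64, dtype=float)
v = 20.0 * np.exp(-((x - 32.0) / 8.0) ** 2)

u = integrate_velocity_1d(v)

# Using v directly as a displacement folds the grid (points swap order).
naive_folds = bool(np.any(np.diff(x + v) <= 0))
# The integrated map keeps every grid point in order (no folding).
integrated_is_monotone = bool(np.all(np.diff(x + u) > 0))
print(naive_folds, integrated_is_monotone)
```

On this toy input, adding the velocity directly as a displacement makes neighbouring points cross, while the integrated map stays strictly increasing. That order-preserving property is the 1D analogue of the topological consistency the abstract refers to. A 3D version would replace `np.interp` with trilinear resampling of a vector field.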

Source journal

IEEE Transactions on Medical Imaging (Medicine: Imaging Science & Photographic Technology)

CiteScore: 21.80

Self-citation rate: 5.70%

Publications: 637

Review time: 5.6 months

Journal description: The IEEE Transactions on Medical Imaging (T-MI) is a journal that welcomes the submission of manuscripts focusing on various aspects of medical imaging. The journal encourages the exploration of body structure, morphology, and function through different imaging techniques, including ultrasound, X-rays, magnetic resonance, radionuclides, microwaves, and optical methods. It also promotes contributions related to cell and molecular imaging, as well as all forms of microscopy. T-MI publishes original research papers that cover a wide range of topics, including but not limited to novel acquisition techniques, medical image processing and analysis, visualization and performance, pattern recognition, machine learning, and other related methods. The journal particularly encourages highly technical studies that offer new perspectives. By emphasizing the unification of medicine, biology, and imaging, T-MI seeks to bridge the gap between instrumentation, hardware, software, mathematics, physics, biology, and medicine by introducing new analysis methods. While the journal welcomes strong application papers that describe novel methods, it directs papers that focus solely on important applications using medically adopted or well-established methods without significant innovation in methodology to other journals. T-MI is indexed in PubMed® and MEDLINE®, which are products of the United States National Library of Medicine.