{"title":"Improving 3D Object Detection in Neural Radiance Fields With Channel Attention","authors":"Minling Zhu, Yadong Gong, Dongbing Gu, Chunwei Tian","doi":"10.1049/cit2.70045","DOIUrl":"https://doi.org/10.1049/cit2.70045","url":null,"abstract":"<p>In recent years, 3D object detection using neural radiance fields (NeRF) has advanced significantly, yet challenges remain in effectively utilising the density field. Current methods often treat NeRF as a geometry learning tool or rely on volume rendering, neglecting the density field's potential and feature dependencies. To address this, we propose NeRF-C3D, a novel framework incorporating a multi-scale feature fusion module with channel attention (MFCA). MFCA leverages channel attention to model feature dependencies, dynamically adjusting channel weights during fusion to enhance important features and suppress redundancy. This optimises the density field representation and improves feature discriminability. Experiments on 3D-FRONT, Hypersim, and ScanNet demonstrate NeRF-C3D's superior performance, validating MFCA's effectiveness in capturing feature relationships and showcasing its innovation in NeRF-based 3D detection.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 5","pages":"1446-1458"},"PeriodicalIF":7.3,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70045","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145366445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
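The record above attributes NeRF-C3D's gains to channel attention that reweights feature channels during fusion. The paper's MFCA internals are not reproduced here; the sketch below is a generic squeeze-and-excitation-style channel gate in NumPy, where the bottleneck weights `w1`, `w2` and the reduction ratio are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

def channel_attention(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """SE-style channel attention: squeeze (global average pool over H, W),
    excite (two small dense layers), then rescale each channel.
    features: (C, H, W); w1: (C // r, C); w2: (C, C // r)."""
    squeeze = features.mean(axis=(1, 2))            # (C,) per-channel statistic
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid weights in (0, 1)
    return features * gate[:, None, None]           # reweight channels

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))              # toy C=8 feature map
w1 = rng.standard_normal((2, 8)) * 0.1              # reduction ratio r=4 (assumed)
w2 = rng.standard_normal((8, 2)) * 0.1
out = channel_attention(feats, w1, w2)
```

Because the sigmoid gate lies strictly in (0, 1), each channel is attenuated, never amplified; informative channels keep most of their magnitude while redundant ones are suppressed.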
{"title":"Deep Learning Approach for Automated Estimation of 3D Vertebral Orientation of the Lumbar Spine","authors":"Nanfang Xu, Shanshan Liu, Yuepeng Chen, Kailai Zhang, Chenyi Guo, Cheng Zhang, Fei Xu, Qifeng Lan, Wanyi Fu, Xingyu Zhou, Bo Zhao, Aodong He, Xiangling Fu, Ji Wu, Weishi Li","doi":"10.1049/cit2.70033","DOIUrl":"https://doi.org/10.1049/cit2.70033","url":null,"abstract":"<p>Lumbar degenerative disc diseases constitute a major contributor to lower back pain. In pursuit of an enhanced understanding of lumbar degenerative pathology and the development of more effective treatment modalities, precise measurement techniques for lumbar segment kinematics are imperative. This study aims to pioneer a novel automated lumbar spine orientation estimation method using deep learning techniques, to facilitate the automatic 2D–3D pre-registration of the lumbar spine during physiological movements and to enhance the efficiency of image registration and the accuracy of spinal segment kinematic measurements. A total of 12 asymptomatic volunteers were enrolled and captured in 2 oblique views with 7 different postures. The images were used for deep learning model development, training and evaluation. The model was composed of a segmentation module using Mask R-CNN and an estimation module using a ResNet50 architecture with a Squeeze-and-Excitation module. The cosine of the angle between the prediction vector and the ground-truth vector was used to quantify model performance. Data from another two prospectively recruited asymptomatic volunteers were used to compare the time cost between model-assisted registration and manual registration without a model. The cosine values of the vector deviation angles along the three axes of the Cartesian coordinate system were 0.9667 ± 0.004, 0.9593 ± 0.0047 and 0.9828 ± 0.0025, respectively. The angular deviation between the intermediate vector obtained from the three direction vectors and the ground truth was 10.7103 ± 0.7466. Results show the consistency and reliability of the model's predictions across different experiments and axes, and demonstrate that our approach significantly reduces the registration time (3.47 ± 0.90 min vs. 8.10 ± 1.60 min, <i>p</i> < 0.001), enhances efficiency, and supports its broader utilisation in clinical research on kinematic measurements.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 5","pages":"1306-1319"},"PeriodicalIF":7.3,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70033","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145366443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
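The record above evaluates predictions by the cosine of the angle between the predicted orientation vector and ground truth on each axis. A minimal NumPy sketch of that metric follows; the vectors below are illustrative values, not data from the study:

```python
import numpy as np

def orientation_cosine(pred: np.ndarray, gt: np.ndarray) -> float:
    """Cosine of the angle between a predicted orientation vector and
    its ground truth; 1.0 means the directions agree exactly."""
    return float(np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt)))

# a perfect direction scores 1.0 regardless of vector length
same = orientation_cosine(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 2.5]))
# orthogonal directions score 0.0
ortho = orientation_cosine(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
```

Because the metric normalises both vectors, it measures direction only, which is why per-axis values such as 0.9667 translate directly into small angular deviations.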
{"title":"VSMI²-PANet: Versatile Scale-Malleable Image Integration and Patch-Wise Attention Network With Transformer for Lung Tumour Segmentation Using Multi-Modal Imaging Techniques","authors":"Nayef Alqahtani, Arfat Ahmad Khan, Rakesh Kumar Mahendran, Muhammad Faheem","doi":"10.1049/cit2.70039","DOIUrl":"https://doi.org/10.1049/cit2.70039","url":null,"abstract":"<p>Lung cancer (LC) is a major cancer that accounts for high mortality rates worldwide. Doctors utilise many imaging modalities for identifying lung tumours and their severity in the early stages. Nowadays, machine learning (ML) and deep learning (DL) methodologies are utilised for the robust detection and prediction of lung tumours. Recently, multi-modal imaging has emerged as a robust technique for lung tumour detection by combining features from various imaging modalities. To this end, we propose a novel multi-modal imaging technique named the versatile scale-malleable image integration and patch-wise attention network (VSMI²-PANet), which adopts three imaging modalities: computed tomography (CT), magnetic resonance imaging (MRI) and single photon emission computed tomography (SPECT). The designed model accepts input from CT and MRI images and passes it to the VSMI² module, which is composed of three sub-modules: an image cropping module, a scale-malleable convolution layer (SMCL) and a PANet module. The CT and MRI images are subjected to the image cropping module in a parallel manner to crop meaningful image patches and provide them to the SMCL module. The SMCL module is composed of adaptive convolutional layers that investigate those patches in parallel while preserving spatial information. The output from the SMCL is then fused and provided to the PANet module. The PANet module examines the fused patches by analysing the height, width and channels of each image patch. As a result, it outputs high-resolution spatial attention maps indicating the locations of suspicious tumours. The high-resolution spatial attention maps are then provided as input to the backbone module, which uses a light wave transformer (LWT) for segmenting the lung tumours into three classes: normal, benign and malignant. In addition, the LWT also accepts the SPECT image as input for capturing variations precisely to segment the lung tumours. The performance of the proposed model is validated using several performance metrics, such as accuracy, precision, recall, <i>F</i>1-score and AUC, and the results show that the proposed work outperforms existing approaches.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 5","pages":"1376-1393"},"PeriodicalIF":7.3,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70039","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145366267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
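The record above describes a crop-then-attend pipeline: images are split into patches, the patches are scored, and an attention-weighted map is produced. The toy NumPy version below illustrates that flow; the mean-intensity score is a hypothetical stand-in for the learned attention of the paper's PANet module:

```python
import numpy as np

def extract_patches(img: np.ndarray, size: int) -> np.ndarray:
    """Split a square image into non-overlapping size x size patches."""
    h, w = img.shape
    return (img.reshape(h // size, size, w // size, size)
               .transpose(0, 2, 1, 3)       # group (row block, col block)
               .reshape(-1, size, size))

def patch_attention(patches: np.ndarray) -> np.ndarray:
    """Softmax-weight patches by mean intensity and fuse them into one map;
    a placeholder score for a learned patch-wise attention mechanism."""
    scores = patches.mean(axis=(1, 2))
    scores = scores - scores.max()           # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return (patches * weights[:, None, None]).sum(axis=0)

img = np.arange(16.0).reshape(4, 4)          # toy single-channel image
patches = extract_patches(img, 2)            # 4 patches of shape 2 x 2
fused = patch_attention(patches)             # attention-weighted 2 x 2 fusion
```

Since the softmax weights sum to 1, the fused map is a convex combination of the patches, so every fused value stays within the intensity range of the input.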