{"title":"An Efficient and Lightweight Point Cloud Recognition Network Based on Neighborhood Learning","authors":"Yanxia Bao, Zilong Liu, Yahong Chen, Yang Shen","doi":"10.1049/ipr2.70226","DOIUrl":"https://doi.org/10.1049/ipr2.70226","url":null,"abstract":"<p>Point cloud recognition has wide applications in fields such as autonomous driving and shape classification. Although significant progress has been made in point cloud processing in recent years, most of it has been achieved by designing more complex networks to attain better performance. This paper proposes a novel lightweight point cloud recognition network by introducing a new local neighborhood optimization layer (LNOL), which improves traditional sampling methods through correlation learning in local areas. The LNOL is embedded within a single-layer local transformer architecture, significantly reducing computational complexity and parameter count while maintaining the model's expressive power. Experimental results on the ModelNet40 benchmark dataset demonstrate that our method achieves a classification accuracy of 93.3% and an average precision of 92.0% without using a voting strategy. Compared to the mainstream local transformer model Point Transformer, our network requires only 9.95G FLOPs and 2.33M parameters, reducing computational cost by 94.7% and parameter count by 75.7%, with only a 0.4% drop in accuracy. 
This study provides an efficient solution for real-time 3D recognition applications, significantly lowering computational resource requirements while maintaining performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70226","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145272237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LGL-Net: A Lightweight Global-Local Multiscale Network With Region-Aware Interpretability for Alzheimer's Disease Diagnosis","authors":"Juan Zhou, Ruiyang Tao, Weiqiang Zhou, Xia Chen, Xiong Li","doi":"10.1049/ipr2.70228","DOIUrl":"https://doi.org/10.1049/ipr2.70228","url":null,"abstract":"<p>Alzheimer's disease (AD) is a progressive neurodegenerative disorder marked by gradual cognitive decline and structural brain degeneration. Magnetic resonance imaging (MRI), due to its non-invasive nature and high spatial resolution, plays a pivotal role in the clinical diagnosis of AD. However, considerable challenges persist, primarily due to the heterogeneity of brain structural alterations across individuals and the high computational burden associated with deploying deep learning models in clinical practice. Although recent deep learning-based approaches have significantly improved diagnostic accuracy, most models fail to identify the specific contributions of individual brain regions, limiting their interpretability and clinical applicability. To address these limitations, we propose LGL-Net, a novel lightweight 3D convolutional neural network tailored for efficient extraction and integration of both global and local anatomical features from MRI data. The architecture adopts a dual-branch design, wherein one branch captures whole-brain atrophy patterns, while the other focuses on fine-grained, region-specific structural variations. This design achieves a favourable trade-off between computational efficiency and diagnostic performance, significantly reducing the model's parameter count and computational load without compromising accuracy. Importantly, LGL-Net explicitly maps learnt features onto anatomically defined brain regions, enabling region-level interpretability of classification outcomes. 
By independently evaluating the contributions of each region to both global and local representations, the model elucidates how multiscale anatomical features collectively influence diagnostic decisions. Experimental results demonstrate that LGL-Net achieves classification performance comparable to existing methods, while substantially lowering model complexity and computational demands. Overall, this framework offers a scalable, interpretable and resource-efficient solution for intelligent AD diagnosis.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70228","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145272030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object Handle Segmentation in 3D Point Cloud for Robot Grasping Using Scale Invariant Heat Kernel Signature With Optimized XGBoost Classifier","authors":"Haniye Merrikhi, Hossein Ebrahimnezhad","doi":"10.1049/ipr2.70225","DOIUrl":"https://doi.org/10.1049/ipr2.70225","url":null,"abstract":"<p>Segmenting graspable regions is crucial for robotic manipulation tasks like pick-and-place and pouring. This study proposes a robust method for detecting handle-like regions in common objects, focusing on slender handles distinct from the main body, a characteristic prevalent in many frequently manipulated daily-use objects. Our method employs the scale-invariant heat kernel signature (SI-HKS) descriptor to capture local and global shape features of 3D objects. By utilizing SI-HKS properties, we extract meaningful geometric information. Points are classified into segments using the XGBoost classifier, known for its efficiency and accuracy, while hyperparameters are optimized through random search. A post-processing step refines handle detection by filtering out non-graspable regions based on geometric skeleton curvature. The proposed approach is evaluated on a custom dataset in two configurations: five categories of handle-equipped objects and an extended version with eleven categories. In the 5-class setup, the method achieves a mean intersection-over-union (mIoU) of 97.6%, outperforming leading deep learning models like PointNet, PointNet++, and DGCNN with statistically significant improvements confirmed by <i>t</i>-tests. In the extended 11-class setup, the method maintains strong performance with a mean IoU of 97.5%. 
The use of intrinsic geometric features enhances rotation invariance, ensuring consistent segmentation across different orientations.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70225","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Data Modalities and Advances in Related AI Technologies for Oral Cancer Detection","authors":"Sahil Sharma, Seema Wazarkar, Geeta Kasana","doi":"10.1049/ipr2.70223","DOIUrl":"https://doi.org/10.1049/ipr2.70223","url":null,"abstract":"<p>Oral cancer represents a significant public health burden, and late-stage detection is a major cause of ineffective treatment. Multimodal approaches from artificial intelligence have emerged as a promising means of addressing this challenge. In this paper, a comprehensive review of recent studies on oral cancer detection across varied data modalities is presented, covering technologies such as computer vision, natural language processing, acoustic analysis, the Internet of Things, machine learning, and deep learning (DL). Across the reviewed literature, unique datasets spanning imaging, histopathology, spectroscopy, and clinical text are identified and described. Reported performance metrics vary by modality: image-based DL methods achieved accuracies between 91% and 99% with area under the curve values up to 0.95, while spectroscopy-based approaches reported accuracies above 92%. 
These results highlight the diagnostic potential of varied data modalities and point to future research directions; however, small and imbalanced datasets, lack of external validation, and limited personalisation remain major concerns to be addressed.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70223","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comrade-Secure Adversarial Noise for 3D Point Cloud Classification Model","authors":"Taehwa Lee, Soojin Lee, Hyun Kwon","doi":"10.1049/ipr2.70215","DOIUrl":"https://doi.org/10.1049/ipr2.70215","url":null,"abstract":"<p>Deep neural networks (DNNs) are effective across many domains, including text, audio, and images. Recently, DNNs have been used in autonomous driving, robotics, and even drones owing to the increasing utilization of 3D data. However, 3D point clouds are vulnerable to adversarial examples, much like any other form of data. An adversarial example slightly alters the original sample or adds a small amount of noise so that it appears normal to humans yet is misclassified by the models. In this study, we propose a method for generating "comrade-secure" adversarial point cloud examples. In the proposed method, we subtly adjust the positions of certain points in the point cloud to create an adversarial example. This alteration causes the enemy model to misclassify, while the friendly model remains accurate. We use the ModelNet40 dataset for experimental evaluation and utilize PointNet++ and PointNet, representative models for 3D point cloud classification, as the friendly and enemy models, respectively. 
In the experiments, on the adversarial point cloud examples generated by the proposed method, the friendly model achieved an accuracy of 97.65%, while the attack against the enemy model succeeded at a rate of 99.55%.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70215","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Visual State Space Model for Remote Sensing Binary Change Detection","authors":"Huagang Jin, Yu Zhou","doi":"10.1049/ipr2.70214","DOIUrl":"https://doi.org/10.1049/ipr2.70214","url":null,"abstract":"<p>Transformers and convolutional neural networks (CNNs) have made significant progress on the task of remote sensing binary change detection. However, the Transformer has quadratic computational complexity, while the CNN is limited by a fixed receptive field, which may hinder their capability to learn spatial contextual features. Inspired by the remarkable performance of Mamba on natural language processing tasks, which can effectively make up for the deficiencies of the above two architectures, we tailor the structure of Mamba to the binary change detection problem. In this work, we explore the potential of visual Mamba to address binary change detection in remote sensing imagery; the resulting model is abbreviated as Mam-BCD. The entire network is designed as an encoder–decoder architecture. The encoder employs the effective visual Mamba to fully learn global spatial contextual features from input images. For the decoder, we introduce three spatio-temporal feature learning strategies, which can be organically integrated into the Mamba architecture to achieve spatio-temporal interaction between different temporal features. Comprehensive experiments are conducted on three publicly available datasets to verify the efficacy of the proposed Mam-BCD. 
Compared to the advanced CTDFormer, Mam-BCD achieves gains of 4.49%, 8.73% and 3.44% in accuracy on the SYSU-CD, LEVIR-CD+ and WHU-CD datasets, respectively.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70214","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145223793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vis-a-Vis: A Tool for Face Components Replacement","authors":"Nova Hadi Lestriandoko, Luuk J. Spreeuwers, Raymond N. J. Veldhuis","doi":"10.1049/ipr2.70212","DOIUrl":"https://doi.org/10.1049/ipr2.70212","url":null,"abstract":"<p>We propose a tool that can replace one or more specific facial components in images for face analysis. The tool can replace the texture and shape of facial components such as eyes, nose, and mouth. The source and destination of the components can be real faces or an average face computed from a dataset. A seamless blending method is applied to smooth the component boundaries after replacement. The tool is developed in the Python language and is available as open source and online via a web interface. We also provide a desktop version that can process multiple files or a dataset as input. The tool can, for instance, be used to investigate the contribution of face components to face recognition, face perception analysis, the change of identity, and fun applications. Some illustrative examples are provided.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70212","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145224041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSSNet: An Anchor-Free Rotated Object Detection Network With Dynamic Sample Selection for Remote Sensing Images","authors":"Longbao Wang, Yongheng Yu, Xiaoliang Luo, Lvchun Wang, Mu He, Yican Shen, Zhijun Zhou, Hongmin Gao","doi":"10.1049/ipr2.70224","DOIUrl":"https://doi.org/10.1049/ipr2.70224","url":null,"abstract":"<p>Object detection in remote sensing imagery requires precise localisation and identification of targets under challenging conditions. Facing the challenges of arbitrary target orientations, wide-scale variations, dense distributions, and small objects in remote sensing object detection, anchor-based methods suffer from inadequate rotated target representation using rectangular boxes. This necessitates excessive angle-specific anchors, leading to heavy computational overhead, severe sample imbalance, and slow speeds unsuitable for mobile deployment. To address these accuracy-efficiency trade-offs, we propose DSSNet: an anchor-free rotated object detection network with dynamic sample selection for remote sensing images. DSSNet replaces traditional backbones with the parameter-efficient ConvNeXt-T and utilises an FPN for accelerated multi-scale feature extraction. During prediction, it employs a shape-adaptive selection strategy combined with a contour point quality assessment strategy to dynamically refine target contour points, enabling real-time rotated object detection. The efficacy of DSSNet has been thoroughly validated through benchmark comparisons on diverse datasets. 
On the DOTA dataset, DSSNet clearly outperforms baseline methods in detection performance, achieving a mean Average Precision (mAP) of 76.97% and the fastest detection speed of 26.2 frames per second (FPS).</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70224","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145223827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Detection of Diabetic Retinopathy by Using Global Channel Attention Mechanism","authors":"Jing Qin, Xiaolong Bu","doi":"10.1049/ipr2.70220","DOIUrl":"https://doi.org/10.1049/ipr2.70220","url":null,"abstract":"<p>Diabetic retinopathy (DR), a major ocular complication of diabetes, poses a significant global health challenge. Although convolutional neural networks (CNNs) have demonstrated effectiveness in DR grading tasks, their ability to capture long-range dependencies scattered across fundus images remains limited. To address this limitation, we propose a global channel attention mechanism that incorporates the global feature extraction capability of Vision Transformer (ViT) while maintaining compatibility with CNN architectures, thereby enhancing their ability to model long-range dependencies. Experimental results show that our model achieves test accuracies of 88.49% and 77.33% on the augmented APTOS 2019 and Messidor-2 datasets, respectively, validating the efficacy of the proposed mechanism.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70220","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145224516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and Flexible Omnidirectional Depth Estimation With Multiple 360-Degree Cameras","authors":"Ming Li, Xueqian Jin, Xuejiao Hu, Jinghao Cao, Sidan Du, Yang Li","doi":"10.1049/ipr2.70217","DOIUrl":"https://doi.org/10.1049/ipr2.70217","url":null,"abstract":"<p>Omnidirectional depth estimation has received much attention from researchers in recent years. However, challenges arise due to camera soiling and variations in camera layouts, affecting the robustness and flexibility of the algorithm. In this paper, we use the geometric constraints and redundant information of multiple 360° cameras to achieve robust and flexible multi-view omnidirectional depth estimation. We implement two algorithms: the two-stage algorithm obtains initial depth maps by pairwise stereo matching of multiple cameras and fuses the multiple depth maps for the final depth estimation, while the one-stage algorithm adopts spherical sweeping based on hypothetical depths to construct a uniform spherical matching cost from the multi-camera images and obtain the depth. Additionally, a generalized epipolar equirectangular projection is introduced to simplify the spherical epipolar constraints. To overcome panorama distortion, a spherical feature extractor is implemented. Furthermore, a synthetic 360° dataset of outdoor road scenes is presented, which takes soiled camera lenses and glare into consideration and is more consistent with the real-world environment. Experiments show that our two algorithms achieve state-of-the-art performance, accurately predicting depth maps even when provided with soiled panorama inputs. 
The flexibility of the algorithms is experimentally validated in terms of camera layouts and numbers.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70217","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145224173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}