{"title":"Subspace-Guided Feature Reconstruction for Unsupervised Anomaly Localization","authors":"Katsuya Hotta, Chao Zhang, Yoshihiro Hagihara, Takuya Akashi","doi":"10.1049/ipr2.70157","DOIUrl":"10.1049/ipr2.70157","url":null,"abstract":"<p>Unsupervised anomaly localization aims to identify anomalous regions that deviate from normal sample patterns. Most recent methods perform feature matching or reconstruction for the target sample with pre-trained deep neural networks. However, they still struggle to address challenging anomalies because the deep embeddings stored in the memory bank can be less powerful and informative. Specifically, prior methods often overly rely on the finite resources stored in the memory bank, which leads to low robustness to unseen targets. In this paper, we propose a novel subspace-guided feature reconstruction framework to pursue adaptive feature approximation for anomaly localization. It first learns to construct low-dimensional subspaces from the given nominal samples, and then learns to reconstruct the given deep target embedding by linearly combining the subspace basis vectors using the self-expressive model. Our core is that, despite the limited resources in the memory bank, the out-of-bank features can be alternatively “mimicked” to adaptively model the target. Moreover, we propose a sampling method that leverages the sparsity of subspaces and allows the feature reconstruction to depend only on a small resource subset, contributing to less memory overhead. Extensive experiments on three benchmark datasets demonstrate that our approach generally achieves state-of-the-art anomaly localization performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70157","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144635091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy-YOLO Model for Rail Anomaly Detection: Robustness Under Limited Sample and Interference Conditions","authors":"Liyuan Yang, Ming Yang, Ghazali Osman, Safawi Abdul Rahman, Muhammad Firdaus Mustapha","doi":"10.1049/ipr2.70156","DOIUrl":"10.1049/ipr2.70156","url":null,"abstract":"<p>Accurate detection of surface anomalies in railway tracks is critical for ensuring train operation safety and enabling intelligent railway management. However, the scarcity and pronounced imbalance of anomaly samples significantly constrain model training and generalisation. Moreover, complex environmental factors such as illumination variability, sensor noise, and motion blur pose additional challenges to model robustness in real-world applications. This study presents a Fuzzy-YOLO model tailored for limited sample datasets. Built upon YOLOv11, Fuzzy-YOLO incorporates a fuzzy-non-maximum suppression (NMS) mechanism and integrates a lightweight fuzzy residual neural network (RFNN-Res) module based on fuzzy logic for anomaly classification. The final anomaly type is determined via a weighted voting strategy. Experimental evaluations demonstrate that Fuzzy-YOLO achieves a mean average precision (mAP) of 98.90%, exhibiting notably enhanced stability compared to YOLOv11 under conditions of varying illumination, noise, and motion-induced blur.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70156","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144624264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Survey of Advancement in Lip Reading Models: Techniques and Future Directions","authors":"Sampada Deshpande, Kalyani Shirsath, Amey Pashte, Pratham Loya, Sandip Shingade, Vijay Sambhe","doi":"10.1049/ipr2.70095","DOIUrl":"10.1049/ipr2.70095","url":null,"abstract":"<p>Lip reading models improve information processing and decision-making by quickly and accurately comprehending enormous amounts of text. This study dives into the important role that lip reading plays in making communication more inclusive, especially for individuals with hearing impairments. From 2020 to 2024, the researchers carefully examine the progress made in lip-reading algorithms. They take a close look at the methods, innovations and principles used to decode spoken content from videos, specifically using visual speech recognition techniques. The study also emphasises the use of datasets like LRW, LRS2 and LRS3, which are crucial for this exploration. This paper offers valuable insights into recent advancements and highlights the importance of diverse datasets in improving lip-reading models. Its findings aim to guide future research efforts in making communication more accessible for people with hearing impairments.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70095","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144615041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepFake Detection: Evaluating the Performance of EfficientNetV2-B2 on Real vs. Fake Image Classification","authors":"Surbhi Bhatia Khan, Muskan Gupta, Bakkiyanathan Gopinathan, Mahesh Thyluru RamaKrishna, Mo Saraee, Arwa Mashat, Ahlam Almusharraf","doi":"10.1049/ipr2.70152","DOIUrl":"10.1049/ipr2.70152","url":null,"abstract":"<p>The surge in digitally altered images has necessitated advanced solutions for reliable image verification, impacting sectors from media to cybersecurity. This work provides an effective method of real vs. deepfake image distinction through utilization of the EfficientNetV2-B2 model, the latest in convolutional neural networks known for its accuracy and effectiveness. The research utilized a big dataset of 100,000 images equally divided between deepfake and real classes to create a balanced sample. The methodology involved preprocessing images to a fixed size, utilizing augmentation techniques to enhance model robustness, and employing a systematic training schedule along with accuracy parameter optimization. Significantly, the research utilized an automated learning rate adjustment mechanism to optimize training performance, contributing to a complex model calibration. Outcome of the experiment design was showing 99.89% classification accuracy and an equally impressive F1 score, which is a measure of the efficiency of the model in identifying deepfakes. The results provided in-depth analysis with some misclassifications, providing recommendations for potential image processing and model training improvements. The outcome points to the suitability of applying EfficientNetV2-B2 where there is a requirement for high accuracy in image authentication.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70152","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144615043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High-Speed Dynamic Measurement Method for Checked Luggage Dimensions","authors":"Yuzhou Chen, Bin Zhang, Hongqing Song, Mingqian Du","doi":"10.1049/ipr2.70148","DOIUrl":"10.1049/ipr2.70148","url":null,"abstract":"<p>Ensuring compliance with stringent luggage size regulations is critical for operational efficiency and cost control in modern airports. However, conventional measurement methods often face a trade-off between speed and accuracy in the dynamic environment of check-in counters. To address these limitations, we propose a real-time luggage dimension and orientation measurement system based on a single RGB-D camera and the YOLOv8 object detection model. As luggage travels at 0.75 m/s along a conveyor, the system first detects and classifies each item, then combines two-dimensional image analysis with three-dimensional point cloud processing to compute length, width, height, and deflection angle. Trained on 7000 annotated images and validated on 100 physical samples, our method achieves average dimensional errors below 4% and angular deviations within 3°, with a mean processing time of 40 ms per item. Comparative experiments demonstrate that, under similar computational constraints, the proposed approach outperforms traditional techniques in both accuracy and robustness, thereby offering a reliable solution for enhancing real-time luggage assessment at airport check-in terminals.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70148","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144615042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Attention Augmentation-Based Transformer Network for Unsupervised Medical Image Registration","authors":"Chuanhui Li, Hao Wang, Hangyu Bai, Xin Sun, Tao Zhang","doi":"10.1049/ipr2.70154","DOIUrl":"10.1049/ipr2.70154","url":null,"abstract":"<p>Transformer-based models have achieved significant success in medical image registration in recent years. Since the self-attention operation has quadratic complexity, it usually causes huge computational overhead for these methods. So, how to provide higher quality registration while being efficient in terms of parameters and computational cost is a research hotspot. For this goal, we propose A<sup>2</sup>TNet, an attention augmentation-based transformer network, wherein the attention augmentation is achieved via combining the spatial attention and channel attention together. Meanwhile, a shifted window mechanism is introduced to further reduce the calculation complexity of the proposed attention module. Experiments carried out on two different brain MRI datasets, LPBA and Mindboggle, demonstrate that A<sup>2</sup>TNet can improve registration accuracy while effectively controlling complexity compared to existing deep learning registration models.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70154","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144615044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DF-3DNet: A Lightweight Approach Based on Deep Learning for 3D Telecommunication Tower Asset Classification","authors":"Amzar Omairi, Zool Hilmi Ismail, Gianmarco Goycochea Casas","doi":"10.1049/ipr2.70149","DOIUrl":"10.1049/ipr2.70149","url":null,"abstract":"<p>The transition from 4G to 5G communication systems and the phase-out of 3G equipment have increased the demand for efficient telecommunication tower inspection and maintenance. Traditional manual methods are time-consuming and risky, prompting the adoption of unmanned aerial vehicles (UAVs) equipped with LiDAR sensors. This research introduces a framework for telecommunication tower asset inspection, utilising a lightweight, deep learning-based 3D classifier called DF-3DNet. The process involves raw 3D point cloud data collection using DJI's Zenmuse L1 LiDAR, optimal flight planning, data pre-processing, augmentation, and classification. The study focuses on two key asset classes—radio frequency (RF) panels and microwave (MW) dishes—which are prevalent in telecommunication towers. DF-3DNet, an enhanced version of PointNet, incorporates advanced data augmentation methods and class balance compensation to optimise performance, particularly when working with limited datasets. The model achieved classification accuracies of 0.6613 on ScanObjectNN, 0.8171 on ModelNet40, and 0.869 on the telecommunication tower dataset, demonstrating its effectiveness in handling noisy, small-scale data. By streamlining inspection workflows and leveraging AI-driven classification, this framework significantly reduces costs, time, and risks associated with traditional methods, paving the way for scalable, real-time telecommunication tower asset management.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70149","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144598524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Fetal Plane Classification Accuracy With Data Augmentation Using Diffusion Models","authors":"Yueying Tian, Elif Ucurum, Xudong Han, Rupert Young, Chris Chatwin, Philip Birch","doi":"10.1049/ipr2.70151","DOIUrl":"10.1049/ipr2.70151","url":null,"abstract":"<p>Ultrasound imaging is widely used in medical diagnosis, especially for fetal health assessment. However, the availability of high-quality annotated ultrasound images is limited, which restricts the training of machine learning models. In this paper, we investigate the use of diffusion models to generate synthetic ultrasound images to improve the performance on fetal plane classification. We train different classifiers first on synthetic images and then fine-tune them with real images. Extensive experimental results demonstrate that incorporating generated images into training pipelines leads to better classification accuracy than training with real images alone. The findings suggest that generating synthetic data using diffusion models can be a valuable tool in overcoming the challenges of data scarcity in ultrasound medical imaging.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70151","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144574127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systematic Review on Cell Nucleus Instance Segmentation","authors":"Yulin Chen, Qian Huang, Meng Geng, Zhijian Wang, Yi Han","doi":"10.1049/ipr2.70129","DOIUrl":"10.1049/ipr2.70129","url":null,"abstract":"<p>Cell nucleus instance segmentation plays a pivotal role in medical research and clinical diagnosis by providing insights into cell morphology, disease diagnosis, and treatment evaluation. Despite significant efforts from researchers in this field, there remains a lack of a comprehensive and systematic review that consolidates the latest advancements and challenges in this area. In this survey, we offer a thorough overview of existing approaches to nucleus instance segmentation, exploring both traditional and deep learning-based methods. Traditional methods include watershed, thresholding, active contour model, and clustering algorithms, while deep learning methods include one-stage methods and two-stage methods. For these methods, we examine their principles, procedural steps, strengths, and limitations, offering guidance on selecting appropriate techniques for different types of data. Furthermore, we comprehensively investigate the formidable challenges encountered in the field, including ethical implications, robustness under varying imaging conditions, computational constraints, and the scarcity of annotated data. Finally, we outline promising future directions for research, such as privacy-preserving and fair AI systems, domain generalization and adaptation, efficient and lightweight model design, learning from limited annotations, as well as advancing multimodal segmentation models.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70129","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144574128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Image Denoising: A Combination Method Using Multiscale Contextual Fusion and Recursive Learning","authors":"Sonia Rehman, Muhammad Habib, Aftab Farrukh, Aarif Alutaybi","doi":"10.1049/ipr2.70143","DOIUrl":"10.1049/ipr2.70143","url":null,"abstract":"<p>The exponential growth of imaging technology has led to a surge in visual content creation, necessitating advanced image denoising algorithms. Conventional methods, which frequently rely on predefined rules and filters, are inadequate for managing intricate noise patterns while maintaining image features. In order to tackle the issue of real-world image denoising, we investigate and integrate a new novel technique named recursive context fusion network (RCFNet) employing a deep convolutional neural network, demonstrating superior performance compared to current state-of-the-art approaches. RCFNet consists of a coarse feature extraction module and a reconstruction unit, where the former provides a broad contextual understanding and the latter refines the denoising output by preserving spatial and contextual details. Deep CNN learns features instead of using conventional methods, allowing us to improve and refine images. Dual attention units (DUs), in conjunction with the multi-scale resizing Block (MSRB) and selective kernel feature fusion (SKFF), are incorporated into the network to ensure efficient and reliable feature extraction. To demonstrate the advantages and challenges of combining many configurations into a single pipeline, we take a more detailed look at the results. By leveraging the complementary properties of these networks and computational models, we prefer to contribute to the creation of techniques that enhance image restoration while preserving crucial information, therefore encouraging further research and applications in image processing and artificial intelligence. The RCFNet achieves a high structural similarity index (SSIM) of 0.98 and a peak signal-to-noise ratio (PSNR) of 43.4 dB, outperforming many state-of-the-art methods on two benchmark datasets (DND and SIDD) and demonstrating its superior real-world image denoising ability.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70143","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}