{"title":"A Face Quality Assessment System for Unattended Face Recognition: Design and Implementation","authors":"Dunli Hu, Xin Bi, Wei Zhao, Xiaoping Zhang, Xingchen Duan","doi":"10.1049/ipr2.70042","DOIUrl":"https://doi.org/10.1049/ipr2.70042","url":null,"abstract":"<p>This paper presents a face quality assessment approach that selects the highest-quality face image using a two-stage process from video streaming. In high-traffic environments, traditional face recognition methods can cause crowd congestion, emphasizing the need for unconscious face recognition, which requires no active cooperation from individuals. Due to the nature of unconscious face recognition, it is necessary to capture high-quality face images. In this paper, the FSA-Net head pose estimation network is enhanced to FSA-Shared_Nadam by replacing the Adam optimizer with Nadam and improving stage fusion. In the first stage, FSA-Shared_Nadam estimates head pose angles, MediaPipe detects facial landmarks to calculate eye distance and aspect ratios, and sharpness is calculated using the Laplacian operator. Images are considered valid if they meet the criteria. A model trains a face quality scoring formula, learning how different head pose angles affect face recognition accuracy. In the second stage, face images are clustered, and the formula is applied to select the highest-scoring face within each cluster. The approach was tested across multiple datasets, and a simulated security checkpoint scenario was created for practical testing. The results demonstrate the effectiveness of the FSA-Shared_Nadam head pose estimation algorithm and the proposed face quality assessment approach.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70042","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143689649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Shadow Scenarios Tennis Ball Detection by an Improved RTMdet-Light Model","authors":"Yukun Zhu, Yanxia Peng, Cong Yu","doi":"10.1049/ipr2.70054","DOIUrl":"https://doi.org/10.1049/ipr2.70054","url":null,"abstract":"<p>The real-time and rapid recording of sport sensor data related to tennis ball trajectories facilitates the analysis of this information and the development of intelligent training regimes. However, there are three essential challenges in the task of tennis ball recognition using sport vision sensors: the small size of the ball, its high speed, and the complex match scenarios. As a result, this paper considers a lightweight object detection model named improved RTMDet-light to deal with these challenges. Specifically, it has compatible capacities in the backbone and neck, constructed by a basic building block that consists of large-kernel depth-wise convolutions. Furthermore, GhosNet and ShuffleNet are used to replace the CSPLayers which reduce the parameters of our model. The lightweight model proposed addresses the inherent challenges of detecting small objects and muti scenarios in the match. After training, the proposed model performed better on four scenarios with different shades of tennis ball match, with results visualized through heatmaps and performance metrics tabulated for detailed analysis. The recall, FLOPs and number of parameters of the improved RTMDet-light are 71.4%, 12.543G, and 4.874M, respectively. The results demonstrate robustness and effectiveness of our model in accurate tennis ball detecting across various scales. In conclusion, our model for real-time detection in tennis ball detection offers a lightweight and faster solution for sport sensors.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70054","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143689324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Behaviour Recognition Method Based on SME-Net","authors":"Ruimin Li, Yajuan Jia, Dan Yao, Fuquan Pan","doi":"10.1049/ipr2.70053","DOIUrl":"https://doi.org/10.1049/ipr2.70053","url":null,"abstract":"<p>Spatiotemporal, motion and channel information are pivotal in video-based behaviour recognition. Traditional 2D CNNs demonstrate low computational complexity but fail to capture temporal dynamics effectively. Conversely, 3D CNNs excel in recognising temporal patterns but at the cost of significantly higher computational demands. To address these challenges, we propose a generic and effective SME module composed of three parallel sub-modules, namely Spatio-Temporal Excitation (STE), Motion Excitation (ME) and Efficient Channel Excitation (ECE). Specifically, the STE module enhances the spatiotemporal representation using a single-channel 3D convolution, enabling the model to focus on both temporal and spatial features. The ME module emphasises motion-sensitive channels by calculating feature map differences at adjacent time steps, guiding the model toward motion-centric regions. The ECE module efficiently captures cross-channel interactions without dimensionality reduction, ensuring robust performance while significantly reducing model complexity. Pre-trained on the ImageNet dataset, the proposed method achieved Top-1 accuracy of 49.0% on the Something-Something V1 (Sth-Sth V1) dataset and 40.8% on the Diving48 dataset. Extensive ablation studies and comparative experiments further demonstrate that the proposed method strikes an optimal balance between recognition accuracy and computational efficiency.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70053","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143689453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Few-Shot Object Detection Method Based on Faster R-CNN","authors":"YangJie Wei, Shangwei Long, Yutong Wang","doi":"10.1049/ipr2.70038","DOIUrl":"https://doi.org/10.1049/ipr2.70038","url":null,"abstract":"<p>Uneven distribution of object features and insufficient feature learning significantly affect the accuracy and generalizability of existing detection methods. This paper proposes an improved two-stage few-shot object detection method that builds upon the faster region-based convolutional neural network framework to enhance its performance in detecting objects with limited training data. First, a modified data augmentation method for optical images is introduced, and a Gaussian optimization module of sample feature distribution is constructed to enhance the model's generalizability. Second, a parameter-less 3D space attention module without additional parameters, is added to enhance the space features of a sample, where a neuron linear separability measurement and feature optimization module based on mathematical operations are used to adjust the feature distribution and reduce data distribution bias. Finally, a class feature vector extractor based on meta-learning is provided to reconstruct the feature map by overlaying a class feature vector from the target domain onto the query image. This process improves accuracy and generalization performance, and multiple experiments on the PASCAL VOC dataset show that the proposed method has higher detection accuracy and stronger generalizability than other methods. Especially, the experiment using practical images under complicated environments indicates its potential effectiveness in real-world scenarios.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70038","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143688919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Uncertainty-Awared for Semantic Segmentation of Remote Sensing Images","authors":"Xiangfeng Qiu, Zhilin Zhang, Xin Luo, Xiang Zhang, Youcheng Yang, Yundong Wu, Jinhe Su","doi":"10.1049/ipr2.70045","DOIUrl":"https://doi.org/10.1049/ipr2.70045","url":null,"abstract":"<p>Remote sensing image segmentation is crucial for applications ranging from urban planning to environmental monitoring. However, traditional approaches struggle with the unique challenges of aerial imagery, including complex boundary delineation and intricate spatial relationships. To address these limitations, we introduce the semantic uncertainty-aware segmentation (SUAS) method, an innovative plug-and-play solution designed specifically for remote sensing image analysis. SUAS builds upon the rotated multi-scale interaction network (RMSIN) architecture and introduces the prompt refinement and uncertainty adjustment module (PRUAM). This novel component transforms original textual prompts into semantic uncertainty-aware descriptions, particularly focusing on the ambiguous boundaries prevalent in remote sensing imagery. By incorporating semantic uncertainty, SUAS directly tackles the inherent complexities in boundary delineation, enabling more refined segmentations. Experimental results demonstrate SUAS's effectiveness, showing improvements over existing methods across multiple metrics. SUAS achieves consistent enhancements in mean intersection-over-union (mIoU) and precision at various thresholds, with notable performance in handling objects with irregular and complex boundaries—a persistent challenge in aerial imagery analysis. The results indicate that SUAS's plug-and-play design, which leverages semantic uncertainty to guide the segmentation task, contributes to improved boundary delineation accuracy in remote sensing image analysis.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70045","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143688921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tri-Plane Dynamic Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis","authors":"Xueping Wang, Xueni Guo, Jun Xu, Yuchen Wu, Feihu Yan, Guangzhe Zhao","doi":"10.1049/ipr2.70044","DOIUrl":"https://doi.org/10.1049/ipr2.70044","url":null,"abstract":"<p>Neural radiation field (NeRF) has been widely used in the field of talking portrait synthesis. However, the inadequate utilisation of audio information and spatial position leads to the inability to generate images with high audio-lip consistency and realism. This paper proposes a novel tri-plane dynamic neural radiation field (Tri-NeRF) that employs an implicit radiation field to study the impacts of audio on facial movements. Specifically, Tri-NeRF propose tri-plane offset network (TPO-Net) to offset spatial positions in three 2D planes guided by audio. This allows for sufficient learning of audio features from image features in a low dimensional state to generate more accurate lip movements. In order to better preserve facial texture details, we innovatively propose a new gated attention fusion module (GAF) to dynamically fuse features based on strong and weak correlation of cross-modal features. Extensive experiments have demonstrated that Tri-NeRF can generate talking portraits with audio-lip consistency and realism.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70044","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143646103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Channel Deep Pulse-Coupled Net: A Novel Bearing Fault Diagnosis Framework","authors":"Yanxi Wu, Yalin Yang, Zhuoran Yang, Zhizhuo Yu, Jing Lian, Bin Li, Jizhao Liu, Kaiyuan Yang","doi":"10.1049/ipr2.70033","DOIUrl":"https://doi.org/10.1049/ipr2.70033","url":null,"abstract":"<p>Bearings are a critical part of various industrial equipment. Existing bearing fault detection methods face challenges such as complicated data preprocessing, difficulty in analysing time series data, and inability to learn multi-dimensional features, resulting in insufficient accuracy. To address these issues, this study proposes a novel bearing fault diagnosis model called multi-channel deep pulse-coupled net (MC-DPCN) inspired by the mechanisms of image processing in the primary visual cortex of the brain. Initially, the data are transformed into greyscale spectrograms, allowing the model to handle time series data effectively. The method introduces a convolutional coupling mechanism between multiple channels, enabling the framework can learn the features on all channels well. This study conducted experiments using the bearing fault dataset from Case Western Reserve University. On this dataset, a 6-channel (adjustable to specific tasks) MC-DPCN was utilized to analyse one normal class and three fault classes. Compared to state-of-the-art bearing fault diagnosis methods, our model demonstrates one of the highest diagnostic accuracies. This method achieved an accuracy of 99.96% in normal vs. fault discrimination and 99.89% in fault type diagnosis (average result of ten-fold cross-validation).</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70033","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143638675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frame Extraction Person Retrieval Framework Based on Improved YOLOv8s and the Stage-Wise Clustering Person Re-Identification","authors":"Jianjun Zhuang, Nan Wang, Yuchen Zhuang, Yong Hao","doi":"10.1049/ipr2.70046","DOIUrl":"https://doi.org/10.1049/ipr2.70046","url":null,"abstract":"<p>Person re-identification (Re-ID), a crucial research area in smart city security, faces challenges due to person posture changes, object occlusion and other factors, making it difficult for existing methods to accurately retrieving target person in video surveillance. To resolve this problem, we propose a person retrieval framework that integrates YOLOv8s and person Re-ID. Improved YOLOv8s is employed to extract person categories from the video on a frame-by-frame basis, and when combined with the stage-wise clustering person Re-ID network (SCPN), it enables collaborative person retrieval across multiple cameras. Notably, a feature precision (FP) module is added in the YOLOv8s network to form FP-YOLOv8s, and SCPN incorporates innovative enhancements including the stage-wise learning rate scheduler, centralized clustering loss and adaptive representation joint attention module into the person Re-ID baseline model. Comprehensive experiments on COCO, Market-1501 and DukeMTMC-ReID datasets demonstrate that our proposed framework outperforms several other leading methods. Given the scarcity of image-video person Re-ID datasets, we also provide an extended image-video person (EIVP) dataset, which contains 102 videos and 814 bounding boxes of 57 identities captured by 8 cameras. The video reasoning detection score of this framework reaches 78.8% on this dataset, indicating a 3.2% increase compared to conventional models.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70046","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143632796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retinal Fundus Image Enhancement With Detail Highlighting and Brightness Equalizing Based on Image Decomposition","authors":"Zhiyi Wu, Lucy J. Kessler, Xiang Chen, Yiguo Pan, Xiaoxia Yang, Ling Zhao, Jufeng Zhao, Gerd U. Auffarth","doi":"10.1049/ipr2.70041","DOIUrl":"https://doi.org/10.1049/ipr2.70041","url":null,"abstract":"<p>High-quality retinal fundus images are widely used by ophthalmologists for the detection and diagnosis of eye diseases, diabetes, and hypertension. However, in retinal fundus imaging, the reduction in image quality, characterized by poor local contrast and non-uniform brightness, is inevitable. Image enhancement becomes an essential and practical strategy to address these issues. In this paper, we propose a retinal fundus image enhancement method that emphasizes details and equalizes brightness, based on image decomposition. First, the original image is decomposed into three layers using an edge-preserving filter: a base layer, a detail layer, and a noise layer. Second, an adaptive local power-law approach is applied to the base layer for brightness equalization, while detail enhancement is achieved for the detail layer through saliency analysis and blue channel removal. Finally, the base and detail layers are combined, excluding the noise layer, to synthesize the final image. The proposed method is evaluated and compared with both classical and recent approaches using two widely adopted datasets. According to the experimental results, both subjective and objective assessments demonstrate that the proposed method effectively enhances retinal fundus images by highlighting details, equalizing brightness, and suppressing noise and artifacts, all without causing color distortion.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70041","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143632797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Frequency Enhancement Network for Blind Image Deblurring","authors":"YaWen Xiang, Heng Zhou, Xi Zhang, ChengYang Li, ZhongBo Li, YongQiang Xie","doi":"10.1049/ipr2.70036","DOIUrl":"https://doi.org/10.1049/ipr2.70036","url":null,"abstract":"<p>Image deblurring is a fundamental preprocessing technique aimed at recovering clear and detailed images from blurry inputs. However, existing methods often struggle to effectively integrate multi-scale feature extraction with frequency enhancement, limiting their ability to reconstruct fine textures, especially in the presence of non-uniform blur. To address these challenges, we propose a multi-scale frequency enhancement network (MFENet) for blind image deblurring. MFENet introduces a multi-scale feature extraction module (MS-FE) based on depth-wise separable convolutions to capture rich multi-scale spatial and channel information. Furthermore, the proposed method employs a frequency enhanced blur perception module (FEBP) that utilizes wavelet transforms to extract high-frequency details and multi-strip pooling to perceive non-uniform blur. Experimental results on the GoPro and HIDE datasets demonstrate that our method achieves superior deblurring performance in both visual quality and objective evaluation metrics. Notably, in downstream object detection tasks, our blind image deblurring algorithm significantly improves detection accuracy, further validating its effectiveness and robustness in practical applications.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70036","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143632621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}