{"title":"Efficient method for detecting targets from remote sensing images based on global attention mechanism","authors":"Zijun Gao, Jingwen Su, Bo Li, Jue Wang, Zhankui Song","doi":"10.1049/ipr2.70012","DOIUrl":"https://doi.org/10.1049/ipr2.70012","url":null,"abstract":"<p>Remote sensing image target detection provides an effective and accurate data analysis tool for many application areas. Due to complex backgrounds, large differences in target scales, and missed detection of small targets, remote sensing image target detection is challenging. To enhance the model's understanding of the global information of remote sensing images, this paper proposes the GFA module. This module establishes global contextual connections across remote sensing images, providing rich context that helps interpret the complex scenes and backgrounds in which targets are located, without being limited to local information. Additionally, it focuses on channel information for enhanced target feature extraction. To alleviate the serious foreground–background sample imbalance present in single-stage target detection models, the loss function is reconstructed based on focal loss by redefining the balance factor <i>α</i> and focus factor <i>γ</i>, so that they can be dynamically adjusted during network training. Meanwhile, EIoU is used to further enhance the bounding box regression capability. Affine transformations are also applied to augment the dataset, helping the model adapt to real-world conditions. The proposed method is experimentally validated on the publicly available HRRSD dataset. In comparison with YOLO v5, the mAP of the detection results improved by 2.7%. Compared with YOLO v8 and YOLO v10, the mAP improved by 3.2% and 3.3%, respectively. The model achieves an FPS of 40.1, striking a balance between speed and accuracy. 
Further, experiments are conducted using the NWPU VHR-10 dataset and the RSOD dataset, both of which demonstrate that the proposed method outperforms other target detection methods and improves remote sensing target detection performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70012","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
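The focal-loss reconstruction described in the abstract above can be illustrated with a minimal sketch. The paper's dynamic adjustment of the balance factor α and focus factor γ during training is not specified in the abstract, so the sketch uses the standard fixed-parameter focal loss (Lin et al.) as a baseline; the function `focal_loss` and its argument names are illustrative, not the authors' implementation.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard focal loss for binary classification.

    p : predicted probability of the positive class, y : 0/1 label.
    alpha balances foreground vs. background; gamma down-weights easy
    examples.  The paper adjusts alpha and gamma dynamically during
    training; that schedule is not given in the abstract, so fixed
    values are used here.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)          # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
```

Easy examples (p_t close to 1) are down-weighted by the (1 − p_t)^γ factor, which is what mitigates the foreground–background imbalance the abstract refers to.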
{"title":"Lightweight Accelerated Unfolding Network With Collaborative Attention for Snapshot Spectral Compressive Imaging","authors":"Mengjie Qin, Yuchao Feng","doi":"10.1049/ipr2.70024","DOIUrl":"https://doi.org/10.1049/ipr2.70024","url":null,"abstract":"<p>In coded aperture snapshot spectral imaging (CASSI) systems, deep unfolding networks (DUNs) have made significant strides in recovering 3D hyperspectral images (HSIs) from a single 2D measurement. However, the inherent nonlinearity and ill-posed nature of HSI reconstruction continue to challenge existing methods in terms of accuracy and stability. To address these challenges, we propose a lightweight collaborative attention-enhanced accelerated unfolding network (CA<sup>2</sup>UN), which integrates a DUN framework with a streamlined prior extractor. Our integrated approach introduces a generically accelerated half-quadratic splitting algorithm (A-HQS) for degradation estimation, overcoming the limitations of first-order optimization and enabling effective long-range dependency modeling. Within the prior extractor, we introduce cross-convergence attention, facilitating iterative information exchange between local and non-local Transformers to capture holistic features and enhance inductive capacity. Notably, the concept of collaborative cross-convergence is embedded throughout all submodules, ensuring effective information flow. The proposed CA<sup>2</sup>UN not only accelerates the convergence of spectral reconstruction, but also fully exploits compressed spatial-spectral information. 
Numerical and visual comparisons on both synthetic and real datasets demonstrate the superior performance of this approach. The source code is available at https://github.com/Mengjie-s/CA2UN.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70024","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
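For context on the accelerated half-quadratic splitting (A-HQS) named in the record above: the abstract does not detail the acceleration or the learned prior, so the following is a minimal sketch of plain half-quadratic splitting on a toy sparse-recovery problem. The function names, the ℓ1 prior, and the closed-form x-step are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def hqs(y, A, lam=0.1, mu=1.0, iters=100):
    """Half-quadratic splitting for  min_x ||y - A x||^2 + lam * ||x||_1.

    The variable split x = z decouples data fidelity from the prior:
      x-step: (A^T A + mu I) x = A^T y + mu z   (quadratic, closed form)
      z-step: z = soft_threshold(x, lam / mu)   (proximal step on the prior)
    This is the plain textbook iteration; the paper's A-HQS adds
    acceleration and replaces the analytic prior with a learned one.
    """
    n = A.shape[1]
    z = np.zeros(n)
    Aty = A.T @ y
    M = np.linalg.inv(A.T @ A + mu * np.eye(n))  # precompute x-step solve
    for _ in range(iters):
        x = M @ (Aty + mu * z)
        z = soft_threshold(x, lam / mu)
    return z
```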
{"title":"Robust Text Watermarking Based on Modifying the Stroke Components of Chinese Characters","authors":"Hai Chen, Yanli Chen, Zhicheng Dong, Yongrong Wang, Asad Malik, Hanzhou Wu","doi":"10.1049/ipr2.70025","DOIUrl":"https://doi.org/10.1049/ipr2.70025","url":null,"abstract":"<p>Traditional codebooks used for tracing information leakage in text documents often suffer from limitations in embedding capacity, robustness, and efficiency due to their manual generation process. This paper proposes a robust text watermarking method based on the stroke components of Chinese characters. In the proposed approach, Chinese character strokes are divided into several distinct components, with only specific ones being selectively modified to generate new glyphs, thus forming a unique codebook. The watermark signals are embedded by substituting the carrier glyph with the newly generated one, and the signals are extracted using a template matching method. Experimental results demonstrate that, compared to traditional manually designed codebooks, the proposed method significantly reduces human labor and computational overhead while maintaining high visual quality. 
Moreover, it exhibits superior robustness and adaptability across various challenging scenarios, including digital noise attacks, print-scanning attacks, and print-camera capture, making it a highly effective solution for protecting textual information.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70025","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
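The watermark extraction step in the record above relies on template matching. The paper's exact matching pipeline is not given in the abstract; the generic normalized cross-correlation matcher, the textbook form of template matching, can be sketched as follows (function names are illustrative).

```python
import numpy as np

def match_template(image, template):
    """Locate a template in a grayscale image by normalized cross-correlation.

    Returns the (row, col) of the best-matching top-left corner.  The paper
    matches candidate glyph templates against rendered text; this sketch is
    the generic NCC formulation, not the authors' pipeline.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw]
            wz = w - w.mean()
            denom = np.sqrt((wz ** 2).sum()) * t_norm
            if denom == 0:          # flat window: correlation undefined
                continue
            score = (wz * t).sum() / denom
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos
```

NCC is invariant to brightness and contrast shifts, which is one reason template matching holds up under the print-scan and print-camera attacks the abstract mentions.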
{"title":"Lightweight Multi-Stage Holistic Attention-Based Network for Image Super-Resolution","authors":"Aatiqa Bint E Ghazali, Ahsan Fiaz, Muhammad Islam","doi":"10.1049/ipr2.70013","DOIUrl":"https://doi.org/10.1049/ipr2.70013","url":null,"abstract":"<p>High-resolution images are crucial for many applications, but factors such as environmental conditions can reduce image quality. Super-resolution (SR) techniques address this by generating high-resolution images from low-resolution inputs. While deep learning SR models have made significant progress, they can be computationally expensive and struggle to differentiate between various image scales. Lightweight SR methods, suitable for resource-constrained devices, often compromise image quality. This study introduces a multi-stage holistic attention-based network, using Gaussian Laplacian pyramids to decompose images and apply holistic attention modules at each level. This approach reduces parameters and computational costs while maintaining image quality, achieving a PSNR score of 28 and SSIM of 0.91 with only 29,000 parameters. The model demonstrates the potential for efficient and high-quality image reconstruction. Future work will focus on improving quality while minimizing costs and exploring other advanced techniques. The code will be made available upon request.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143497181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
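The Gaussian–Laplacian pyramid decomposition underlying the network above can be sketched in a few lines. This is the generic pyramid construction, not the authors' model; a box-filter blur stands in for proper Gaussian filtering so the sketch stays self-contained, and dimensions are assumed divisible by 2^levels.

```python
import numpy as np

def blur_downsample(img):
    """2x2 box-filter blur plus factor-2 downsampling (a simple stand-in
    for the Gaussian filtering used in a proper Gaussian pyramid)."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    """Nearest-neighbour upsampling back to `shape`."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Each level stores the detail lost by blur+downsample; the final
    entry is the low-resolution residual."""
    pyr, cur = [], img
    for _ in range(levels):
        small = blur_downsample(cur)
        pyr.append(cur - upsample(small, cur.shape))
        cur = small
    pyr.append(cur)
    return pyr

def reconstruct(pyr):
    """Invert the decomposition exactly: upsample and add back details."""
    cur = pyr[-1]
    for detail in reversed(pyr[:-1]):
        cur = upsample(cur, detail.shape) + detail
    return cur
```

Because each level records exactly what downsampling discards, reconstruction is lossless; an SR network can then attend to each band separately, which is the economy the abstract claims.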
{"title":"FMR-YOLO: An improved YOLOv8 algorithm for steel surface defect detection","authors":"Yongjing Ni, Qi Wu, Xiuqing Zhang","doi":"10.1049/ipr2.70009","DOIUrl":"https://doi.org/10.1049/ipr2.70009","url":null,"abstract":"<p>To address the insufficient feature extraction capability for steel surface defects in industrial production, as well as issues such as low detection speed and poor accuracy caused by large model parameters, a metal surface defect detection algorithm named FMR-YOLO, based on an improved YOLOv8n, is proposed. The algorithm incorporates a fast, lightweight feature extraction structure that reduces the model's parameters and computation while preserving spatial information, thus improving target detection performance. A multi-scale feature fusion module is introduced, enabling the extraction of more comprehensive and richer features compared to traditional single-scale methods, to better support defect detection tasks. Additionally, a receptive field attention structure, Receptive Field Attention Neck, is designed in the Neck part to expand the model's receptive field and reduce computational complexity, significantly improving detection accuracy for small defects. This allows the model to effectively capture both global and local features in complex industrial scenarios. The effectiveness of the improved FMR-YOLO algorithm is validated on two industrial surface defect datasets: GC10-DET and NEU-DET. 
Experimental results show that the mAP@0.5 detection accuracy has increased by 4.5% and 5.1% on the GC10-DET and NEU-DET datasets, respectively, with a parameter size of merely 2.7 M.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70009","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143497311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAN-Based Super-Resolution With Enhanced Multi-Scale Laplacian Pyramid and Frequency Domain Loss","authors":"Hao Chen, Xi Lu, Jixining Zhu","doi":"10.1049/ipr2.70028","DOIUrl":"https://doi.org/10.1049/ipr2.70028","url":null,"abstract":"<p>Super-resolution techniques play an important role in the fields of image processing and computer vision. However, existing super-resolution methods based on generative adversarial networks still exhibit significant shortcomings in recovering high-frequency details and effectively utilising multi-scale information. To address these issues, this paper proposes an improved generative adversarial network. Specifically, an enhanced multi-scale Laplacian pyramid structure is designed to capture and process image details at different scales. Then, convolutional operations are added to each layer of the pyramid to further improve the recovery of multi-scale details. Additionally, a frequency domain loss is introduced, where the generated and real images are transformed into the frequency domain using Fourier transforms for comparison. This method enhances the reconstruction of high-frequency details. 
The experiments are validated on four publicly available datasets and the results show that the proposed network significantly outperforms existing methods in both reconstruction quality and visual performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70028","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
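The frequency-domain loss described in the GAN super-resolution record above, where generated and real images are transformed with Fourier transforms and compared, can be sketched as follows. The exact norm is not stated in the abstract, so this assumes an L1 penalty on the complex spectrum difference, a common choice in the literature.

```python
import numpy as np

def frequency_domain_loss(fake, real):
    """Mean L1 distance between the 2-D Fourier spectra of two images.

    Comparing in the frequency domain penalizes missing high-frequency
    detail that pixel-wise losses tend to blur away.  The abstract does
    not state the exact formulation; the L1 norm of the complex spectrum
    difference used here is one common variant.
    """
    F_fake = np.fft.fft2(fake)
    F_real = np.fft.fft2(real)
    return np.mean(np.abs(F_fake - F_real))
```

In training, this term would be weighted and added to the usual adversarial and pixel losses; its gradient pushes the generator to match the real image's high-frequency energy rather than only its pixel averages.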
{"title":"Calibration Method for Ultra-Wide FOV Fisheye Cameras Based on Improved Camera Model and SE(3) Image Pre-Correction","authors":"Rui Xing, Fenghua He, Yu Yao","doi":"10.1049/ipr2.70021","DOIUrl":"https://doi.org/10.1049/ipr2.70021","url":null,"abstract":"<p>The severe radial distortion of ultra-wide field of view (FOV) fisheye cameras results in poor model fitting and challenges in calibration board detection. In this paper, a novel calibration method for ultra-wide FOV fisheye cameras is proposed based on an improved camera model and SE(3) image pre-correction. Initially, a method to extend the maximum fitting FOV of the camera model to over 180 degrees is proposed. Subsequently, a calibration board detection approach is proposed using SE(3) image pre-correction. Specifically, image pre-correction is incorporated into the camera calibration process, utilizing SE(3) to define the pre-correction plane. Calibration boards are detected within the pre-corrected images, enhancing the reliability, accuracy and speed of board detection in distorted images, consequently increasing the control point's maximum FOV. Lastly, the improved camera model and SE(3) image pre-correction are integrated into a feedback-based camera calibration system for ultra-wide FOV fisheye cameras. Operating with real-time or offline video streams as input, this system autonomously selects calibration key frames, and optimizes camera parameters and calibration board poses in real time. 
Simulation and real-world experiments verify the effectiveness of the proposed method, leading to a 62% increase in the achievable maximum FOV.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70021","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143475453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
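For context on why ultra-wide FOV fisheye calibration needs angle-based camera models: the pinhole projection r = f·tan θ diverges as θ approaches 90°, while the equidistant fisheye model r = f·θ stays finite. The sketch below shows only the textbook equidistant model; the paper's improved model, which extends fitting beyond 180°, is not specified in the abstract.

```python
import numpy as np

def project_equidistant(point_3d, f, cx, cy):
    """Project a 3-D camera-frame point with the equidistant fisheye model.

    r = f * theta, where theta is the angle from the optical axis.  Unlike
    the pinhole model (r = f * tan(theta)), the radius stays finite as
    theta approaches 90 degrees, which is why fisheye calibration relies
    on angle-based models.  Textbook model only, not the paper's.
    """
    x, y, z = point_3d
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    phi = np.arctan2(y, x)                  # azimuth in the image plane
    r = f * theta
    return cx + r * np.cos(phi), cy + r * np.sin(phi)
```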
{"title":"A New Genetic Algorithm-Based Network for Text Localization in Degraded Social Media Images","authors":"Shivakumara Palaiahnakote, Chandrahas Pavan Kumar, Pranjal Aggarwal, Shubham Sharma, Pasupuleti Chandana, Mahadveppa Basavanna, Umapada Pal","doi":"10.1049/ipr2.70030","DOIUrl":"https://doi.org/10.1049/ipr2.70030","url":null,"abstract":"<p>This paper presents a novel model for understanding social image content through text localization. For text localization, we explore maximally stable extremal regions (MSER) for detecting components, which works by clustering pixels with similar properties. The output of component detection includes several non-text components due to the degradations of social media images. To select the best components among many, we explore the genetic algorithm by convolving different kernels with components, which results in a feature matrix that is further fed to EfficientNet for choosing actual text components. Therefore, the proposed model is called the genetic algorithm-based network for text localization in degraded social media images (TLDSMI). For evaluating text localization, we consider images from standard natural-scene datasets that have been uploaded to and downloaded from different social media platforms, namely WhatsApp, Telegram, and Instagram. The effectiveness of our method is shown by testing on original and degraded standard datasets. For example, for degraded images of different complexities, including degradations caused by social media platforms, the proposed method performs well in almost all situations. 
In addition, the proposed model achieves the best F1-scores of 0.76, 0.77, 0.70, and 0.78 for the degraded images of CUTE, ICDAR 2013, Total-Text, and CTW1500, respectively, compared to the state-of-the-art methods.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70030","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143475451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of Overlay Target's Centre Positioning Algorithms Using Customizable Shape Fitting for High-Precision Wafer Bonding","authors":"Rui Wang, Yixian Zhu, Sen Lu, Kaiming Yang, Yu Zhu","doi":"10.1049/ipr2.70020","DOIUrl":"https://doi.org/10.1049/ipr2.70020","url":null,"abstract":"<p>Wafer bonding is a critical process in 3D integration, and overlay (OVL) metrology is essential for its success. Accurately positioning the centre of OVL targets is fundamental for effective metrology. However, the identification and localization of target centres become challenging due to complex shapes and unexpected features, such as rounded corners, that can arise during manufacturing. An algorithm is proposed to tackle this challenge by employing customizable shape fitting. This method begins with the extraction of sub-pixel edge points, followed by applying a Hough transform to group and smooth these points, thereby enhancing contour quality. By parameterizing the target shape based on specific points, the algorithm integrates sub-pixel traversal techniques with an optimization objective, achieving sub-pixel accuracy in centre positioning. Simulation results indicate that the algorithm can achieve a positioning accuracy of ±0.03 pixels and demonstrates robustness against noise and blur. Finally, the proposed algorithm was used to test the OVL target pair arrays fabricated by electron beam etching, confirming an accuracy of ±0.04 pixels (±6.9 nm). 
These results validate the algorithm's capability to meet high precision requirements for OVL target centre positioning in wafer applications.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70020","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143475452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
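The overlay-target record above achieves sub-pixel centre positioning through customizable shape fitting over sub-pixel edge points; that pipeline is not reproducible from the abstract alone. As a simpler illustration of sub-pixel localization, a moment-based (intensity-weighted centroid) estimator is sketched below; it is a standard baseline, not the authors' algorithm, and `gaussian_spot` is only a synthetic test target.

```python
import numpy as np

def subpixel_centroid(img):
    """Intensity-weighted centroid of a blob image, giving a sub-pixel
    (row, col) centre estimate from first-order image moments."""
    rows, cols = np.indices(img.shape)
    total = img.sum()
    return (rows * img).sum() / total, (cols * img).sum() / total

def gaussian_spot(shape, centre, sigma=1.5):
    """Synthetic Gaussian spot with a known sub-pixel centre, for testing."""
    r, c = np.indices(shape)
    return np.exp(-((r - centre[0]) ** 2 + (c - centre[1]) ** 2) / (2 * sigma ** 2))
```

A centroid already resolves well below one pixel on a clean symmetric spot; shape fitting, as in the paper, is what keeps that accuracy under noise, blur, and non-ideal target geometry.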
{"title":"RBS-YOLO: A Lightweight YOLOv5-Based Surface Defect Detection Model for Castings","authors":"KeZhu Wu, ShaoMing Sun, YiNing Sun, CunYi Wang, YiFan Wei","doi":"10.1049/ipr2.70018","DOIUrl":"https://doi.org/10.1049/ipr2.70018","url":null,"abstract":"<p>To ensure precise and rapid identification of casting surface defects and to support the subsequent realisation of high-precision grinding, this study introduces a method for detecting casting surface defects using a lightweight YOLOv5 framework. The enhanced model integrates the ShuffleNetV2 high-efficiency CNN architecture into the YOLOv5 foundation, substantially reducing network parameters to achieve a lightweight model. Additionally, the Convolutional Block Attention Module (CBAM) is incorporated to enhance the model's capability to detect defects. The ReLU activation function replaces the SiLU function in the convolutional layer, decreasing the computational load and boosting efficiency. Subsequently, the optimised model is quantised and implemented on the RV1126 embedded development board, successfully performing image inference. To validate the effectiveness of the proposed method, a dataset of casting surface defects was designed and constructed. The optimised model has a file size of 7.6 MB, representing 55.4% of the original model, with about 50.6% of the original model's parameters. The onboard inference speed of the improved model is 50 ms per image, which is 9.1% faster than the traditional YOLOv5 model. 
These results offer valuable insights for future casting surface defect detection technologies.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70018","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143475454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}