IET Image Processing: Latest Articles

RSINS-GS: Reconstruction From Single Image With Noise-Added Strategy and 3D-GS
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-12 | DOI: 10.1049/ipr2.70082
Authors: Shengyi Qian, Lan Cheng, Pengyue Li, Xinying Xu
Abstract: Multi-input reconstruction methods such as 3D-GS and NeRF excel in fidelity, yet they impose stringent requirements on the sequentiality of the input images. In contrast, single-view reconstruction methods are designed to extract certain features of the image even under limited input conditions. However, the majority of current single-view methods demand considerable graphics card performance for rendering at high resolutions, and attaining high-fidelity image reconstruction at lower resolutions remains a formidable challenge. To enhance the fidelity of reconstructed images while respecting the constraints of graphics card performance, we propose a novel pipeline based on novel-view synthesis (NVS), super-resolution (SR) and 3D-GS, named RSINS-GS. First, we introduce a divide-and-conquer strategy tailored to produce pixel-reinforced novel sequential views, rendering the reconstruction result without overburdening the graphics card while maintaining optimal performance and visual fidelity. Furthermore, to enhance the fidelity of reconstructed images in both qualitative and quantitative terms, we integrate 2D prior images with their corresponding geometric structural complements. Additionally, we introduce an innovative, generalised noise-added strategy to refine the overall reconstruction process. Extensive experimental evaluations on the NeRF-synthetic and Google scanned datasets show that our method achieves high-quality results.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70082
Citations: 0
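The abstract does not spell out how the generalised noise-added strategy works. As a purely illustrative sketch of the general idea of perturbing synthesized views before they enter 3D-GS optimisation, one might jitter both the pixels and the camera pose; the function name and noise scales below are hypothetical, not the paper's method:

```python
import torch

def add_view_noise(image, pose, sigma_px=0.01, sigma_pose=0.001):
    """Hypothetical augmentation: jitter a synthesized view and its camera
    pose before 3D-GS optimisation. `image` is a (3, H, W) tensor in [0, 1];
    `pose` is a 4x4 camera-to-world matrix."""
    noisy_image = (image + sigma_px * torch.randn_like(image)).clamp(0.0, 1.0)
    noisy_pose = pose.clone()
    noisy_pose[:3, 3] += sigma_pose * torch.randn(3)  # translation jitter only
    return noisy_image, noisy_pose
```
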
Enhancing Image Decomposition With Large Separable Kernel Attention in Generative Adversarial Networks
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-12 | DOI: 10.1049/ipr2.70089
Authors: Mingzhan Zhao, Ziyun Su, Xiaoyi Du
Abstract: The original generative adversarial network (GAN) model may struggle to adequately capture global information in images, particularly during complex decomposition tasks, leading to limitations in image clarity, detail retention and overall consistency. To address this challenge, we propose the large separable kernel attention generative adversarial network (LSKA-GAN) model, building upon the blind image decomposition network (BIDeN). The LSKA module enhances BIDeN's ability to capture global information, thereby improving the quality and clarity of generated images. Experimental results demonstrate that LSKA-GAN achieves clear improvements in hybrid image decomposition. Compared to BIDeN, LSKA-GAN gains 1.39 dB in peak signal-to-noise ratio (PSNR) and 0.04 in structural similarity index (SSIM). These improvements enable LSKA-GAN to generate clearer images with more complete details, marking a notable advance in image decomposition technology.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70089
Citations: 0
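For readers unfamiliar with large separable kernel attention, here is a minimal PyTorch sketch of an LSKA-style block, assuming the common design in which a large depth-wise kernel is factorised into cascaded horizontal and vertical 1D convolutions plus dilated counterparts; it is not the paper's exact module:

```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Sketch of a large-separable-kernel-attention block: the large
    depth-wise kernel is factorised into cascaded horizontal/vertical 1D
    depth-wise convolutions (plus dilated ones), and a pointwise conv
    produces an attention map that re-weights the input."""
    def __init__(self, channels, k=7, dilation=3):
        super().__init__()
        pad, dpad = k // 2, (k // 2) * dilation
        self.dw_h = nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels)
        self.dwd_h = nn.Conv2d(channels, channels, (1, k), padding=(0, dpad),
                               dilation=dilation, groups=channels)
        self.dwd_v = nn.Conv2d(channels, channels, (k, 1), padding=(dpad, 0),
                               dilation=dilation, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dwd_v(self.dwd_h(self.dw_v(self.dw_h(x)))))
        return x * attn
```

For example, `LSKA(64)(torch.randn(1, 64, 128, 128))` returns a re-weighted feature map of the same shape; the separable factorisation keeps the cost linear in kernel size instead of quadratic.
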
An Unsupervised Image Enhancement Method Based on Adaptation Region Divisions
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-08 | DOI: 10.1049/ipr2.70043
Authors: Kaijun Zhou, Weiyi Yuan, Yemei Qin
Abstract: This paper presents a novel image enhancement method that integrates traditional image processing techniques with deep learning frameworks. Initially, images are transformed from the red, green and blue (RGB) color space to the Lab color space, and the luminance component (L) is extracted to quantify texture. Subsequently, texture complexity is assessed using features derived from the gray-level co-occurrence matrix (GLCM), including contrast, correlation, homogeneity and energy. These features are weighted to compute an overall texture complexity score, which facilitates the segmentation of the image into distinct regions. Regions characterized by simple textures are aggregated into larger segments, whereas regions with complex textures are subdivided into smaller segments. Following segmentation, histogram equalization is applied, along with noise reduction and image enhancement via a convolutional autoencoder model. The model extracts relevant features and reduces dimensionality in the encoder phase, and reconstructs the image through the decoder. This methodology effectively preserves semantic information and enhances image clarity. Experiments are conducted using the ExDark dataset, which comprises twelve categories, and the enhancement results are quantitatively evaluated using image quality metrics such as peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), learned perceptual image patch similarity (LPIPS) and neural image quality evaluator (NIQE). Experimental results demonstrate that the proposed method significantly surpasses existing enhancement techniques in terms of image quality and visual perception, thereby affirming its efficacy in improving the visual quality and detail of low-light images. The implementation code will be made publicly available at: https://github.com/Winnie0320/Image-Enhancement-Method.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70043
Citations: 0
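The texture-complexity scoring described above can be prototyped directly with scikit-image. Below is a minimal sketch; the paper does not publish its feature weights, so the values here are illustrative placeholders:

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.feature import graycomatrix, graycoprops

def texture_complexity(rgb, weights=(0.4, 0.2, 0.2, 0.2)):
    """Score the texture complexity of an RGB image (or region) from GLCM
    statistics on the Lab luminance channel. The weights are illustrative;
    the paper's exact values are not given in the abstract."""
    L = rgb2lab(rgb)[..., 0]                                   # L in [0, 100]
    gray = np.clip(L / 100.0 * 255.0, 0, 255).astype(np.uint8)
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    feats = [graycoprops(glcm, p).mean()
             for p in ("contrast", "correlation", "homogeneity", "energy")]
    return float(np.dot(weights, feats))
```

Regions can then be merged or subdivided by thresholding this score, with high-complexity regions split into smaller segments as the abstract describes.
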
Point Cloud Registration Based on Multiple Neighborhood Feature Difference
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-05 | DOI: 10.1049/ipr2.70097
Authors: Haixia Wang, Teng Wang, Zhiguo Zhang, Xiao Lu, Qiaoqiao Sun, Shibin Song, Jun Nie
Abstract: Dense point cloud registration is a critical problem in computer vision and 3D reconstruction, with widespread applications in scenarios such as robotic navigation, autonomous driving and 3D measurement. However, dense point cloud registration faces significant challenges, including high computational complexity and prolonged processing times. To address these issues, this paper proposes a point cloud registration method based on multiple neighborhood feature difference (MNFD) that employs a coarse-to-fine strategy to effectively enhance both registration efficiency and accuracy. The proposed method consists of two stages: coarse registration and fine registration. In the coarse registration stage, a novel feature point extraction approach based on MNFD is introduced, capable of identifying highly stable and distinctive feature points in the point cloud. These feature points are then utilized in combination with the fast point feature histogram (FPFH) algorithm to achieve an initial alignment between the target and template point clouds. In the fine registration stage, the results from the coarse alignment are refined using algorithms such as iterative closest point (ICP) to ensure both efficiency and precision during the registration process. Experiments conducted on publicly available datasets demonstrate the superiority of the proposed method compared to existing approaches.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70097
Citations: 0
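The coarse-to-fine pipeline (feature-based coarse alignment refined by ICP) maps onto standard Open3D primitives. The sketch below uses generic FPFH + RANSAC for the coarse stage; the paper's MNFD feature-point selection is not reproduced here:

```python
import open3d as o3d

def coarse_to_fine_register(source, target, voxel=0.05):
    """Generic coarse-to-fine registration: FPFH + RANSAC for the coarse
    stage, point-to-point ICP for refinement (Open3D >= 0.15 API)."""
    def preprocess(pcd):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
        return down, fpfh

    src, src_fpfh = preprocess(source)
    tgt, tgt_fpfh = preprocess(target)
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src, tgt, src_fpfh, tgt_fpfh, True, voxel * 1.5,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
        [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(voxel * 1.5)],
        o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
    fine = o3d.pipelines.registration.registration_icp(
        source, target, voxel * 0.4, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return fine.transformation
```
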
Realistic Object Reconstruction Under Different Depths Through Light Field Imaging for Virtual Reality
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-04 | DOI: 10.1049/ipr2.70099
Authors: Ali Khan, Md. Moinul Hossain, Alexandra Covaci, Konstantinos Sirlantzis, Qi Qi
Abstract: Virtual reality (VR) immerses users in digital environments and is used in various applications. VR content is created using either computer-generated or conventional imaging. However, conventional imaging captures only 2D spatial information, which limits the realism of VR content. Advanced technologies such as light field (LF) imaging can overcome this limitation by capturing both 2D spatial and 2D angular information in 4D LF images. This paper proposes a depth reconstruction model based on LF imaging to aid in creating realistic VR content. Comprehensive calibrations are performed, including adjustments for camera parameters, depth calibration and field of view (FOV) estimation. Aberration corrections, such as distortion and vignetting correction, are conducted to enhance the reconstruction quality. To achieve realistic scene reconstruction, experiments were conducted on a scenario with multiple objects positioned at three different depths. Quality assessments were carried out to evaluate the reconstruction quality across these varying depths. The results demonstrate that depth reconstruction quality improves with the proposed method, and that the model reduces LF image size and processing time. The depth images reconstructed by the proposed model have the potential to generate realistic VR content and can also facilitate the integration of refocusing capabilities within VR environments.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70099
Citations: 0
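As a rough illustration of the aberration corrections mentioned above, the sketch below undistorts one light-field sub-view with calibrated intrinsics and then compensates vignetting with a flat-field gain map; the gain map and distortion coefficients are assumed calibration inputs, not values from the paper:

```python
import cv2
import numpy as np

def correct_subview(img, K, dist_coeffs, vignette_gain):
    """Undistort one light-field sub-view using calibrated intrinsics `K` and
    lens distortion coefficients, then apply flat-field vignetting
    compensation. `vignette_gain` is an assumed (H, W) gain map measured from
    a uniformly lit white calibration frame; `img` is (H, W, 3) uint8."""
    undistorted = cv2.undistort(img, K, dist_coeffs)
    corrected = undistorted.astype(np.float32) * vignette_gain[..., None]
    return np.clip(corrected, 0, 255).astype(np.uint8)
```
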
EMD-DAAN: A Wasserstein Distance-Based Dynamic Adversarial Domain Adaptation Network Model for Breast Ultrasound Image Classification
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-04 | DOI: 10.1049/ipr2.70096
Authors: Ying Wu, Hao Huang, Bo Xu
Abstract: Breast cancer is commonly diagnosed through ultrasound imaging as a primary method in clinical practice. However, the lack of large annotated datasets of breast ultrasound images, along with issues such as inconsistent marginal and conditional distributions across different datasets, poses significant challenges to both manual and AI-assisted diagnosis. To address these issues, this paper proposes a dynamic adversarial domain adaptation model based on the Wasserstein distance (EMD-DAAN). The EMD-DAAN model enhances the existing dynamic adversarial domain adaptation framework by incorporating an adaptive layer, further aligning the feature distributions of the source and target domain datasets. The Wasserstein distance is employed to optimize this adaptive layer, minimizing the distributional discrepancy between the feature spaces of the two domains by constructing the least-cost transport path. This approach improves the model's cross-domain generalization ability and robustness to noise interference. Through dual feature alignment via the adaptive layer and adversarial learning, the model's classification performance on breast ultrasound images is significantly enhanced. Experimental results demonstrate that the EMD-DAAN model achieves an accuracy of 82.75% on breast ultrasound images, substantially outperforming typical adversarial domain adaptation models such as DAAN.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70096
Citations: 0
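The paper's exact adaptive-layer loss is not given in the abstract. As one common stand-in for a Wasserstein-based feature alignment term, here is a minimal sliced-Wasserstein sketch in PyTorch (it assumes equal source and target batch sizes and is not the paper's formulation):

```python
import torch

def sliced_wasserstein(src_feats, tgt_feats, n_proj=64):
    """W1 distance averaged over random 1D projections of the source and
    target feature batches, both of shape (n, d). Sorting the projections
    gives the optimal 1D transport plan, so each slice is an exact W1."""
    assert src_feats.shape == tgt_feats.shape
    d = src_feats.size(1)
    proj = torch.randn(d, n_proj, device=src_feats.device)
    proj = proj / proj.norm(dim=0, keepdim=True)        # unit directions
    src_sorted = (src_feats @ proj).sort(dim=0).values  # sorted 1D projections
    tgt_sorted = (tgt_feats @ proj).sort(dim=0).values
    return (src_sorted - tgt_sorted).abs().mean()
```

Such a term would be added to the classification and adversarial losses so that minimising it pulls the two feature distributions together along the least-cost transport path.
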
Bidirectionally Guided Multi-Scale Feature Decoding Network for High-Resolution Salient Object Detection
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-02 | DOI: 10.1049/ipr2.70094
Authors: Jiangping Tang, Shuyao Guo, Xiaofei Zhou, Liuxin Bao, Jiyong Zhang, Tong Qiao
Abstract: With the advancement of modern camera technology, the resolution and quality of images have improved significantly. High-resolution images provide more detailed and clearer information, but they may also introduce more noise, which interferes with the detection and localization of salient objects. To address this issue, existing high-resolution salient object detection methods either design complex network structures or adopt multi-modal fusion. However, these approaches often consume significant computing and storage resources, leading to redundant irrelevant features and loss of critical details. In this paper, we propose a bidirectionally guided multi-scale feature decoding network for high-resolution salient object detection. The model incorporates a bidirectional guidance method to exploit the complementarity between encoding and decoding features, thereby achieving a comprehensive combination and enhancement of features. Additionally, in the decoder, multi-scale encoding features are obtained and utilized sequentially to enhance feature learning and improve the accuracy of salient object detection. Specifically, our model consists of an encoder, a guided multi-scale feature enhancement (GMFE) module, a guided feature fusion (GFF) module, and a multi-scale feature decoder (MFD) module. First, multi-scale encoding features are extracted by the encoder. These features are then fed into the GMFE module, which enhances them under the guidance of the saliency map derived from the decoding features of the previous layer. Subsequently, in the GFF module, the enhanced encoding features are fused with the decoding features from the previous layer. Finally, in the MFD module, the bidirectionally guided multi-scale encoding features are integrated to generate an accurate saliency map. Experiments on two high-resolution and two low-resolution datasets demonstrate that our model outperforms existing methods on high-resolution datasets while maintaining competitive performance on low-resolution datasets, underscoring its effectiveness across varying image qualities.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70094
Citations: 0
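The GMFE and GFF modules are described only at a high level. The hypothetical sketch below illustrates the general pattern of saliency-guided encoder-decoder fusion, where a coarse saliency map from the previous decoder stage gates the encoder features before fusion with upsampled decoder features; it assumes matching channel counts and is not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedFusion(nn.Module):
    """Hypothetical saliency-guided fusion: the previous decoder stage's
    coarse saliency map gates the encoder features, which are then fused
    with the upsampled decoder features. Channel counts assumed equal."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, enc_feat, dec_feat, prev_saliency):
        gate = torch.sigmoid(F.interpolate(prev_saliency, size=enc_feat.shape[-2:],
                                           mode="bilinear", align_corners=False))
        enhanced = enc_feat * (1 + gate)               # residual-style re-weighting
        dec_up = F.interpolate(dec_feat, size=enc_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([enhanced, dec_up], dim=1))
```
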
Underwater Image Quality Evaluation: A Comprehensive Review
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-01 | DOI: 10.1049/ipr2.70068
Authors: Mengjiao Shen, Miao Yang, Jinyang Zhong, Hantao Liu, Can Pan
Abstract: Underwater image quality evaluation (UIQE) is crucial for improving image processing techniques and optimizing the design of imaging systems to obtain object information more accurately. However, existing UIQE methods are designed based on limited images or consider only a few natural scene statistics (NSS) metrics, lacking consideration for generalization across various underwater imaging applications. In this paper, an in-depth review of existing UIQE methods based on evaluation operations is provided, emphasizing the bias present when evaluating UIQE methods with individual metrics. To address this, a novel metric called quadrilateral datum evaluation (QDE) is designed for UIQE methods. It comprehensively considers robustness across different datasets, as well as correlation and ranking consistency with mean opinion scores (MOS). This is the first solution to measure a UIQE method from an all-encompassing visual perspective. Under QDE, UIQE methods characterized by greater feature strength and small imbalance demonstrate good consistency and robustness across multiple aspects, providing a basis for the design of UIQE methods.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70068
Citations: 0
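Correlation and ranking consistency with MOS are conventionally measured with PLCC, SROCC and KRCC, as in the minimal sketch below; the QDE metric additionally folds in cross-dataset robustness, which is not reproduced here:

```python
from scipy.stats import kendalltau, pearsonr, spearmanr

def mos_agreement(scores, mos):
    """Standard agreement statistics between an objective quality score and
    mean opinion scores: linear correlation (PLCC), rank correlation (SROCC)
    and pairwise ranking consistency (KRCC)."""
    return {"PLCC": pearsonr(scores, mos)[0],
            "SROCC": spearmanr(scores, mos)[0],
            "KRCC": kendalltau(scores, mos)[0]}
```
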
The Patch-Based Multi-Task Multi-Scale Reborn Network for Global Gaze Following in 360-Degree Images
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-04-30 | DOI: 10.1049/ipr2.70078
Authors: Jingzhao Dai, Yang Li, Sidan Du
Abstract: In this paper, we propose a global gaze following method using the patch-based multi-task multi-scale reborn network (MMRGaze360), specifically designed for panorama images. Unlike existing approaches that rely on spherical networks or process only local regions, our architecture thoroughly accounts for the distortions introduced by the sphere-to-plane projection, enabling gaze following across entire 360-degree images. MMRGaze360 incorporates field-of-view (360-FoV) and sight line (360-Gaze) generators to model gaze behaviours and scene information in 360-degree images. A multi-task multi-scale module is introduced to capture features from multiple patches centred around the estimated points along the 360-Gaze, using multi-scale attention maps. These features, along with the 360-FoV, are fused to produce a final heatmap. Additionally, we employ multi-layer perceptrons and convolutional networks with the reborn mechanism to enhance information usage and feature representation. Moreover, we establish a novel dataset, SRGaze360, which covers more conditions of the sphere-to-plane distortion. Experimental results on the GazeFollow360 and SRGaze360 datasets demonstrate the superiority of our method over previous works, validating that our approach effectively addresses the limitations of 2D gaze following in handling out-of-frame gaze positions and distortions in 360-degree images.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70078
Citations: 0
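The sphere-to-plane distortion that MMRGaze360 must account for comes from the equirectangular projection. A small sketch of the pixel-to-viewing-direction mapping makes the relationship concrete:

```python
import numpy as np

def equirect_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit 3D viewing direction.
    The horizontal stretching near the poles (where cos(lat) is small) is
    the distortion that 360-degree gaze following has to handle."""
    lon = (u / width - 0.5) * 2.0 * np.pi   # longitude in [-pi, pi]
    lat = (0.5 - v / height) * np.pi        # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])
```

Because gaze is a direction on this sphere, a sight line that leaves one side of the panorama re-enters on the other, which is why 2D gaze-following methods fail on out-of-frame gaze positions.
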
Aluminum Surface Defect Detection Method Based on DAS-YOLO Network
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-04-29 | DOI: 10.1049/ipr2.70090
Authors: Jun Tie, Jiating Ma, Lu Zheng, Chengao Zhu, Mian Wu, HaiJiao Wang, ChongWei Ruan, Shuangyang Li
Abstract: To address the accuracy limitations of current methods in detecting aluminum surface defects, particularly those with small sizes and high variation, an aluminum surface defect detection algorithm named DAS-YOLO, based on an improved YOLOv8n, is proposed. The C2f module in YOLOv8's backbone is enhanced by incorporating DCNv2, which improves the model's ability to handle irregular shapes and geometric transformations during feature extraction. An auxiliary training head (Aux Head) is added to capture multi-scale and multi-level features, significantly boosting small defect detection. Additionally, the traditional CIoU loss function is replaced with the Wise-SIoU loss, accelerating convergence and enhancing both detection and regression accuracy. Experimental results on the Alibaba Tianchi aluminum surface defect dataset show that DAS-YOLO achieves a mean average precision (mAP) of 85.3%. Compared to YOLOv8n, mAP50 improves by 3%, while precision and recall increase by 1.1% and 4.6%, respectively. Furthermore, to validate the model's performance on small defects and its generalization ability, it achieves a detection accuracy of 94.8% on the PCB dataset, with an mAP increase of 3.1% compared to YOLOv8n. These results demonstrate that DAS-YOLO significantly enhances detection accuracy while maintaining speed and exhibits outstanding performance in small defect detection.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70090
Citations: 0
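For reference, the baseline CIoU loss that DAS-YOLO replaces is well defined and sketched below; Wise-SIoU additionally re-weights the penalty with angle and shape costs, which is not reproduced here. Boxes are (x1, y1, x2, y2):

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Baseline CIoU loss: 1 - IoU plus a normalised centre-distance penalty
    and an aspect-ratio consistency term (Zheng et al.)."""
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    inter = ((torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0) *
             (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0))
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter + eps
    iou = inter / union
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)        # enclosing box width
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)        # enclosing box height
    rho2 = (((px1 + px2) - (tx1 + tx2)) ** 2 +
            ((py1 + py2) - (ty1 + ty2)) ** 2) / 4.0       # squared centre distance
    v = (4.0 / math.pi ** 2) * (torch.atan((tx2 - tx1) / (ty2 - ty1 + eps)) -
                                torch.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / (cw ** 2 + ch ** 2 + eps) + alpha * v
```
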