Digital Signal Processing: Latest Articles

IPD-YOLO: Person detection in infrared images from UAV perspective based on improved YOLO11
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-11 | DOI: 10.1016/j.dsp.2025.105469 | Vol. 168, Article 105469
Mengyang Li, Nan Yan
Abstract: The integration of UAV technology and deep learning object detection algorithms for human target detection has emerged as a prominent area of research and application. However, practical implementation faces significant challenges under low-light conditions at night. To address this issue, this paper presents a solution based on an infrared image sensor mounted on a UAV. The proposed method employs IPD-YOLO, an improved deep learning object detection algorithm derived from YOLO11, to detect humans in drone-captured infrared images. First, the detection layer is reconfigured to better accommodate small-target detection from aerial perspectives. Second, the MASRCNet feature extraction module is introduced to enhance the model's capability to extract and fuse high- and low-dimensional features along with contextual information through a star-shaped operation structure and residual context anchors. Third, the LQEHead detection head is designed, incorporating a localization quality estimator to assess the quality of detection boxes and refine the classification branch. Finally, a novel NWD-Inner CIoU loss function is proposed, combining normalized Wasserstein distance with an inner auxiliary box mechanism to improve the localization accuracy of small targets. Ablation experiments demonstrate that each improvement contributes effectively to overall performance: adjusting the detection layer increases mAP@50 by 4.6 percentage points and mAP@50:95 by 2.9 percentage points; incorporating MASRCNet further improves mAP@50 by 0.6 percentage points and mAP@50:95 by 0.1 percentage points; with LQEHead, mAP@75 reaches 0.495 and mAP@50:95 increases to 0.496; and the NWD-Inner CIoU loss function boosts mAP@50 to 0.915, mAP@75 to 0.500, and mAP@50:95 to 0.501. Compared with mainstream YOLO variants (YOLOv5n, YOLOv8n, YOLOv10n, and YOLO11n), IPD-YOLO achieves improvements of 4.7, 7.4, 6.3, and 6.4 percentage points respectively on mAP@50, and 6.7, 5.3, 4.9, and 4.4 percentage points on mAP@50:95. Furthermore, IPD-YOLO outperforms advanced models including G-YOLO, LMANet, YOFIR, and YOLO-TSL, with average improvements of 3.5, 2.3, 2.8, and 3.7 percentage points on mAP@50, and 5.3, 2.1, 4.4, and 5.0 percentage points on mAP@50:95 respectively. Compared with RT-DETR, IPD-YOLO maintains high detection accuracy while significantly reducing model parameters and computational cost, enhancing its feasibility for real-world deployment. These results validate the superior performance of IPD-YOLO in human detection tasks using UAV-based infrared imagery.
Citations: 0
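The normalized Wasserstein distance that the loss builds on has a simple closed form when each box (cx, cy, w, h) is modeled as a 2-D Gaussian. A minimal sketch of that metric (the `nwd` helper and the constant `c` are illustrative choices, not the paper's implementation):

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes (cx, cy, w, h).

    Each box is modeled as a 2-D Gaussian; the squared 2-Wasserstein
    distance between the Gaussians has the closed form below, and the
    exponential maps it to a (0, 1] similarity.  c is a dataset-dependent
    scale constant (12.8 is a common choice, not the paper's value).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_sq = (cxa - cxb) ** 2 + (cya - cyb) ** 2 \
          + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)

# Identical boxes give similarity 1; a shift of a tiny box degrades
# NWD smoothly, whereas IoU would already have dropped to 0.
print(nwd((10, 10, 4, 4), (10, 10, 4, 4)))  # 1.0
print(nwd((10, 10, 4, 4), (15, 10, 4, 4)))
```

This smoothness for non-overlapping small boxes is why Wasserstein-style terms help small-target localization.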
Reversible data hiding based on pixel value similarity ordering and ordered collection
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-11 | DOI: 10.1016/j.dsp.2025.105483 | Vol. 167, Article 105483
Hongjie He, Ningxiong Mao, Fan Chen, Yaolin Yang, Yuan Yuan
Abstract: In pixel value ordering (PVO)-based reversible data hiding (RDH), a smoother pixel sequence enhances embedding capacity and visual quality. Existing global PVO-based RDH methods use pixel complexity values for secondary ordering, which inaccurately reflect pixel value magnitude and reduce sequence smoothness. This study proposes a pixel value similarity (PVS) ordering method to improve secondary pixel ordering. A value feature set is constructed for each pixel, and pixel value similarity is calculated to place pixels with the closest PVS adjacently. Additionally, a pixel ordered collection (POC) strategy organizes pixels into subsequences to increase expanded prediction errors, boosting embedding capacity. Experimental results demonstrate that PVS ordering yields smoother pixel sequences, with lower standard deviation (SD) and sum of absolute differences (SAD) than complexity-based methods. The proposed PVS and POC strategies enhance marked-image quality, achieving an average peak signal-to-noise ratio (PSNR) of 61.38 dB at an embedding capacity of 20,000 bits on the Kodak image dataset.
Citations: 0
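For context, the classic PVO embedding that the PVS/POC orderings build on expands the prediction error between a block's largest and second-largest pixels. A minimal sketch of that baseline scheme, not of the paper's new orderings (function names are ours):

```python
def pvo_embed_max(block, bit):
    """Embed one bit into a block's maximum pixel via PVO-style
    prediction-error expansion: the max is predicted by the second
    largest, error 1 carries a payload bit, larger errors are shifted."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    e = block[i_max] - block[i_2nd]
    out = list(block)
    if e == 1:            # expandable: carry the payload bit
        out[i_max] += bit
    elif e > 1:           # not expandable: shift to keep errors separable
        out[i_max] += 1
    return out            # e == 0: ambiguous maximum, left unchanged

def pvo_extract_max(block):
    """Recover (bit, original_block); bit is None if nothing was embedded."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    e = block[i_max] - block[i_2nd]
    out = list(block)
    if e == 1:
        return 0, out
    if e == 2:
        out[i_max] -= 1
        return 1, out
    if e > 2:             # undo the shift
        out[i_max] -= 1
    return None, out
```

The scheme is reversible because embedding and shifting map the error ranges {1} and {2, 3, ...} onto disjoint marked ranges, so extraction can invert every case.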
Enhanced dynamic temporal feature extraction with static expression insights for dynamic facial expression recognition
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-11 | DOI: 10.1016/j.dsp.2025.105470 | Vol. 168, Article 105470
Tingting Han, Shuwei Dou, Wenxia Zhang, Ruqian Liu
Abstract: Dynamic Facial Expression Recognition (DFER) is a critical task in computer vision, involving the recognition and analysis of changes in facial expressions across video sequences. Extracting the temporal features of facial emotions in videos is one of the main challenges facing DFER. This paper proposes a model named RTT, built on IR50, a Transformer, and a Time Feature Enhancement Module (TFEM), which enhances dynamic temporal feature extraction with static expression insights for DFER. Specifically, IR50 focuses on extracting static facial features from each video frame, while the Transformer works with TFEM to extract temporal features from the sequence. TFEM is placed after the Transformer to explore deeper temporal information and consists of two main components: a Feature Mapping Network (FMN) and a Temporal Dependency Network (TDN). FMN enhances temporal information through feature interaction and feature weighting, while TDN encodes temporal dependencies in sequences to improve sensitivity to complex dynamic expressions. Finally, a feature representation carrying both facial emotional and temporal features is formed for DFER. The model surpasses current state-of-the-art (SOTA) techniques on two widely recognized DFER benchmark datasets, DFEW and FERV39K. On DFEW, it achieves 71.24% unweighted average recall (UAR) and 86.81% weighted average recall (WAR); on FERV39K, it reaches 48.59% UAR and 60.42% WAR. These results indicate that the approach outperforms existing SOTA methods on the DFER task, suggesting the effectiveness of the RTT model.
Citations: 0
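At a high level, temporal feature weighting lets informative frames dominate the clip-level representation. A toy stand-in using softmax attention pooling over per-frame features (this is a generic sketch of the idea, not the paper's TFEM/FMN/TDN; the query vector `w` would be learned):

```python
import numpy as np

def temporal_attention_pool(frames, w):
    """Attention-weighted pooling over per-frame features of shape (T, D).
    Frames that score higher against the query w dominate the pooled
    clip feature; the output is a convex combination of the frames."""
    scores = frames @ w                       # (T,) relevance per frame
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ frames                   # (D,) clip-level feature
```

With a query aligned to the first feature dimension, frames expressing that dimension receive nearly all of the weight, which is the behavior temporal weighting aims for.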
Random sampling analysis in the linear canonical transform domain
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-09 | DOI: 10.1016/j.dsp.2025.105453 | Vol. 167, Article 105453
Yina Zhang, Feng Zhang
Abstract: Random sampling is a specific class of nonuniform sampling that serves as an effective alias-free signal acquisition technique in analog-to-digital conversion systems. This paper first proposes linear canonical spectrum estimators of deterministic signals derived from two simple random sampling methods. The proposed spectrum estimators are proven to be unbiased, and their variances are derived to compare their accuracy. The paper further analyzes how sampling jitter and observation errors affect the performance of the linear canonical spectrum estimators: sampling jitter biases the estimators, and this bias can be effectively compensated using a newly defined linear canonical characteristic function. Furthermore, the linear canonical spectrum of two types of stratified randomly sampled signals is analyzed. All analytical results are validated through numerical simulations using chirp signals.
Citations: 0
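The unbiasedness claim can be checked numerically in the classical Fourier special case of the linear canonical transform: averaging kernel-weighted samples taken at uniformly random times gives an alias-free, unbiased estimate of the spectrum integral. A sketch under those assumptions (the chirp, window, and sample counts are arbitrary test choices, and the Fourier kernel stands in for the full linear canonical kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1.0                                        # observation window [0, T)

def f(t):
    return np.cos(2 * np.pi * 40 * t ** 2)     # chirp test signal

def spectrum_estimate(omega, n):
    """Monte-Carlo estimate of X(omega) = integral_0^T f(t) e^{-j omega t} dt
    from n uniformly random sample times.  Its expectation equals X(omega),
    so the estimator is unbiased regardless of signal bandwidth."""
    t = rng.uniform(0.0, T, n)
    return (T / n) * np.sum(f(t) * np.exp(-1j * omega * t))

# Dense trapezoidal reference for comparison
omega = 2 * np.pi * 40
tt = np.linspace(0.0, T, 20001)
vals = f(tt) * np.exp(-1j * omega * tt)
ref = np.sum((vals[:-1] + vals[1:]) / 2) * (tt[1] - tt[0])
est = spectrum_estimate(omega, 200_000)
print(abs(est - ref))    # estimation error shrinks roughly like 1/sqrt(n)
```

The residual error here is pure estimator variance, which is exactly the quantity the paper derives and compares across its two sampling schemes.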
DAMMD-Net: A lightweight and enhanced deep segmentation network for skin lesion detection
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-09 | DOI: 10.1016/j.dsp.2025.105477 | Vol. 167, Article 105477
Hasan Polat
Abstract: Early and accurate diagnosis of skin cancer is critical to improving survival rates, and dermoscopy is one of the most important imaging techniques for this purpose. However, manual examination of dermoscopic images is laborious, time-consuming, and error-prone due to variations in the color, shape, location, texture, and size of skin lesions. Developing automatic segmentation models is therefore crucial for assisting physicians in both qualitative and quantitative assessments. Although numerous deep learning-based segmentation models have produced satisfactory results in skin lesion detection, their backbone architectures still face intrinsic limitations and extrinsic challenges. Motivated by this, the paper proposes a lightweight and enhanced segmentation network (DAMMD-Net) based on the DeepLabV3+ model, with an attention mechanism (AAC) and a modified decoder to improve segmentation performance. The AAC serves as a local feature enhancement tool that suppresses interference from irrelevant information in healthy skin regions. The modified decoder module enhances the network's ability to capture spatial details and integrate contextual information by leveraging multi-level feature maps from the encoder. The proposed segmentation pipeline was evaluated on two well-known benchmark datasets, ISIC2018 and PH2. DAMMD-Net achieved an average Dice similarity coefficient (DSC) of 0.887 on ISIC2018 and 0.929 on PH2, outperforming the backbone network. Overall, DAMMD-Net not only achieved satisfactory performance compared to existing models but also demonstrated significant potential for clinical practice owing to its lightweight architecture.
Citations: 0
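The Dice similarity coefficient reported above is twice the overlap between prediction and ground truth divided by their total foreground area. A minimal implementation for binary masks (the `eps` smoothing term is a common convention to avoid division by zero, not taken from the paper):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|), the metric DAMMD-Net reports
    (0.887 on ISIC2018, 0.929 on PH2)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

Unlike pixel accuracy, Dice is insensitive to the large healthy-skin background, which is why it is the standard metric for lesion segmentation.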
A multi-level composite attention-guided network for indoor visual localization
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-08 | DOI: 10.1016/j.dsp.2025.105458 | Vol. 167, Article 105458
Xiaogang Song, Hailong Yang, Junjie Tang, Xiaochang Li, Xiaofeng Lu, Xinhong Hei
Abstract: Accurate and robust camera pose estimation is essential for autonomous navigation and path planning in unmanned systems. To improve localization accuracy in complex indoor scenes and mitigate information loss during feature extraction, a multi-level composite attention-guided scene coordinate regression method is proposed. The model predicts the mapping between 2D pixel points and 3D scene coordinates from a single RGB image. First, a Multi-level Feature Fusion Module (MFF) employs global pooling and parallel branches to consolidate multi-level features, enhancing discrimination in repetitive structures and low-texture regions. Next, an Embedded Attention Module (EAM) dynamically fuses multi-level features through parallel channel and spatial attention mechanisms, preserving edge details and suppressing noise. Finally, a differentiable random sample consensus algorithm achieves robust fitting of pose parameters. Evaluation on common indoor public datasets demonstrates that the proposed method significantly improves localization performance, and extensive ablation studies confirm the effectiveness of the Embedded Attention Module and Multi-level Feature Fusion Module in enhancing localization accuracy.
Citations: 0
Classification bias and regression bias adjustment for long-tailed traffic sign detection and recognition 长尾交通标志检测与识别的分类偏差与回归偏差调整
IF 2.9 3区 工程技术
Digital Signal Processing Pub Date : 2025-07-08 DOI: 10.1016/j.dsp.2025.105467
Jiajie Li, Weiguo Huang, Guifu Du, Qiaoyue Li
{"title":"Classification bias and regression bias adjustment for long-tailed traffic sign detection and recognition","authors":"Jiajie Li,&nbsp;Weiguo Huang,&nbsp;Guifu Du,&nbsp;Qiaoyue Li","doi":"10.1016/j.dsp.2025.105467","DOIUrl":"10.1016/j.dsp.2025.105467","url":null,"abstract":"<div><div>Traffic sign detection and recognition (TSDR), as a pivotal technology in Intelligent Transportation System, has attracted growing focus and widespread application in recent times. However, in practical applications, the complexity and variability of road conditions lead to a pronounced long-tailed distribution in traffic signs, i.e., a few classes account for a large proportion of instances while most classes contain only a few instances. This long-tailed distribution leads to significant bias during training. In this paper, we identify that such bias issues exist not only in the classification branch but also in the regression branch. Therefore, we first propose the Classification Bias Adjustment (CA) module to address classification bias. This module combines margin adjustment and gradient adjustment strategies based on the mean classification scores to alleviate classification bias. Meanwhile, we propose the Regression Bias Adjustment (RA) module to address regression bias. This module re-weights the regression loss for each class in accordance with the mean Intersection over Union (IoU) to alleviate regression bias. 
Through comprehensive experiments on the TT100K and GTSDB datasets, it has been validated that our proposed approach has greater effectiveness than the existing state-of-the-art methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105467"},"PeriodicalIF":2.9,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144654518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Generating invisible adversarial watermarks based on block-matching embedding algorithm 基于块匹配嵌入算法的不可见对抗水印生成
IF 2.9 3区 工程技术
Digital Signal Processing Pub Date : 2025-07-08 DOI: 10.1016/j.dsp.2025.105476
Chenxiao Wang , Zihao Zeng , Xiaoyue Hu , Yong Chen
{"title":"Generating invisible adversarial watermarks based on block-matching embedding algorithm","authors":"Chenxiao Wang ,&nbsp;Zihao Zeng ,&nbsp;Xiaoyue Hu ,&nbsp;Yong Chen","doi":"10.1016/j.dsp.2025.105476","DOIUrl":"10.1016/j.dsp.2025.105476","url":null,"abstract":"<div><div>Adversarial attack methods against Deep Neural Network (DNN) models have received extensive attention and research. Adversarial attack methods mean adding subtle perturbations to the original image to mislead the recognition ability of the DNN model. How to improve the adversarial attack performance and protect the visual effect of the perturbation image is still the main challenge in this field. Based on an image block-matching embedding algorithm, this paper proposes a novel adversarial method of embedding invisible watermarks for generating adversarial examples for deceptive DNN models. Firstly, utilizing up-sampling techniques to increase the embedding capacity of the original image while ensuring the visual quality of the watermark image. Secondly, the watermark image is embedded into the original image in a chunked manner. The cosine similarity is utilized for block-matching and combined with invertible color transformation to embed the invisible watermark. Finally, the Simple Black-box Adversarial Attack(SimBA) is used to add adversarial perturbation to the watermark image to generate the invisible adversarial watermark. The inverse operation of this method ensures the reconstruction of the original watermark information. The experimental results show that the proposed method achieves an average attack success rate of 98.33% in different neural network models (VGG19, resnet101, SqueezeNet, ShuffleNet, ConvNext, and MaxViT), with an attack success rate of 99.05% in the ShuffleNet model, demonstrating the superiority of the proposed method over existing techniques. 
In addition, the generated invisible adversarial watermark performs well in terms of visual effects and robustness, providing additional concealment and effectively reducing the risk of detecting adversarial attacks.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105476"},"PeriodicalIF":2.9,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144611739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Efficient spatial-temporal feature aggregation for multivariate time series forecasting with STCA 基于STCA的多变量时间序列预测的高效时空特征聚合
IF 2.9 3区 工程技术
Digital Signal Processing Pub Date : 2025-07-07 DOI: 10.1016/j.dsp.2025.105460
LiGuo Deng , WenDan Sha
{"title":"Efficient spatial-temporal feature aggregation for multivariate time series forecasting with STCA","authors":"LiGuo Deng ,&nbsp;WenDan Sha","doi":"10.1016/j.dsp.2025.105460","DOIUrl":"10.1016/j.dsp.2025.105460","url":null,"abstract":"<div><div>Multivariate time series (MTS) prediction plays a crucial role in many practical applications. Although spatio-temporal graph neural networks (STGNNs) have demonstrated excellent performance in MTS prediction due to the advantages of graph convolutional networks and time series modeling, their high computational complexity limits their applicability in resource constrained environments. To improve prediction accuracy while maintaining model simplicity and computational efficiency, inspired by Spatial-temporal identity (STID), this paper introduces a novel MTS prediction framework—Spatial-Temporal Channel Aggregation (STCA). This framework consists of two modules: the Channel Point Aggregation Fusion module (CPAF) enhances the capture of local spatial information and efficiently models temporal dependencies through depthwise separable convolutions and pointwise convolutions. the Selective Attention(SelAttn) module employs a self-attention mechanism to uncover complex dependencies among features. 
Experimental results show that STCA outperforms existing methods on multiple benchmark datasets, achieving higher prediction accuracy while significantly reducing training time.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105460"},"PeriodicalIF":2.9,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
An improved DeepLabV3+ network-based deep learning segmentation method for thermal image water-shorelines 基于DeepLabV3+网络的热图像水岸线深度学习分割方法
IF 2.9 3区 工程技术
Digital Signal Processing Pub Date : 2025-07-06 DOI: 10.1016/j.dsp.2025.105461
Jiaxin Wang, Xinxu Liu, Jianxu Wang, Ming Yang
{"title":"An improved DeepLabV3+ network-based deep learning segmentation method for thermal image water-shorelines","authors":"Jiaxin Wang,&nbsp;Xinxu Liu,&nbsp;Jianxu Wang,&nbsp;Ming Yang","doi":"10.1016/j.dsp.2025.105461","DOIUrl":"10.1016/j.dsp.2025.105461","url":null,"abstract":"<div><div>The water-shorelines segmentation of thermal image is essential to the visual perception technologies and applications of unmanned surface craft. However, the traditional semantic segmentation algorithms have the problems of limited accuracy and low efficiency, which significantly restricts the segmentation performance. Although the segmentation accuracy of convolutional neural network (CNN) is greatly improved compared with these segmentation algorithms, the effect of same model for different regions is obviously different due to the uneven distribution of water-shoreline scene categories in different regions. Therefore, this study proposes an improved DeepLabV3+ network-based segmentation method for the water-shorelines by adding a SE channel attention mechanism and replacing its original backbone network. To validate the performance of the proposed method, an appropriate data set and several assessment indexes were also established. The experiments compared with several conventional algorithms shown that the obstacle interaction degree and mIoU of the proposed method can highly reach to 72.03 % and 90.17 %, which improved 4.81 % and 1.55 % compared with the DeepLabV3+ network model. 
Even for the limited sample images, it can also more accurate segmentation for small obstacles, and clearer extract for the water-shoreline feature information.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105461"},"PeriodicalIF":2.9,"publicationDate":"2025-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信