Digital Signal Processing: Latest Articles

IPD-YOLO: Person detection in infrared images from UAV perspective based on improved YOLO11
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-11 | DOI: 10.1016/j.dsp.2025.105469 | Vol. 168, Article 105469
Mengyang Li, Nan Yan
Abstract: The integration of UAV technology and deep learning object detection algorithms for human target detection has emerged as a prominent area of research and application. However, practical implementation faces significant challenges under low-light conditions at night. To address this issue, this paper presents a solution based on an infrared image sensor mounted on a UAV. The proposed method employs IPD-YOLO, an improved deep learning object detection algorithm derived from YOLO11, to detect humans in drone-captured infrared images. First, the detection layer is reconfigured to better accommodate small-target detection from aerial perspectives. Second, the MASRCNet feature extraction module is introduced to enhance the model's capability to extract and fuse high- and low-dimensional features along with contextual information through a star-shaped operation structure and residual context anchors. Third, the LQEHead detection head is designed, incorporating a localization quality estimator to assess the quality of detection boxes and refine the classification branch. Finally, a novel NWD-Inner CIoU loss function is proposed, combining normalized Wasserstein distance with an inner auxiliary box mechanism to improve the localization accuracy of small targets. Ablation experiments demonstrate that each improvement contributes effectively to overall performance: adjusting the detection layer increases mAP@50 by 4.6 percentage points and mAP@50:95 by 2.9 percentage points; incorporating MASRCNet further improves mAP@50 by 0.6 percentage points and mAP@50:95 by 0.1 percentage points; with LQEHead, mAP@75 reaches 0.495 and mAP@50:95 increases to 0.496; and the NWD-Inner CIoU loss function boosts mAP@50 to 0.915, mAP@75 to 0.500, and mAP@50:95 to 0.501. Compared with mainstream YOLO variants (YOLOv5n, YOLOv8n, YOLOv10n, and YOLO11n), IPD-YOLO achieves improvements of 4.7, 7.4, 6.3, and 6.4 percentage points respectively on mAP@50, and 6.7, 5.3, 4.9, and 4.4 percentage points on mAP@50:95. Furthermore, IPD-YOLO outperforms advanced models including G-YOLO, LMANet, YOFIR, and YOLO-TSL, with average improvements of 3.5, 2.3, 2.8, and 3.7 percentage points on mAP@50, and 5.3, 2.1, 4.4, and 5.0 percentage points on mAP@50:95 respectively. Compared with RT-DETR, IPD-YOLO maintains high detection accuracy while significantly reducing model parameters and computational cost, enhancing its feasibility for real-world deployment. These results validate the superior performance of IPD-YOLO in human detection tasks using UAV-based infrared imagery.
Citations: 0
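The normalized Wasserstein distance that the loss builds on has a simple closed form when each box (cx, cy, w, h) is modeled as a 2-D Gaussian. A minimal sketch of that metric (the `nwd` helper and the constant `c` are illustrative choices, not the paper's implementation):

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes (cx, cy, w, h).

    Each box is modeled as a 2-D Gaussian; the squared 2-Wasserstein
    distance between the Gaussians has the closed form below, and the
    exponential maps it to a (0, 1] similarity.  c is a dataset-dependent
    scale constant (12.8 is a common choice, not the paper's value).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_sq = (cxa - cxb) ** 2 + (cya - cyb) ** 2 \
          + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)

# Identical boxes give similarity 1; a shift of a tiny box degrades
# NWD smoothly, whereas IoU would already have dropped to 0.
print(nwd((10, 10, 4, 4), (10, 10, 4, 4)))  # 1.0
print(nwd((10, 10, 4, 4), (15, 10, 4, 4)))
```

This smoothness for non-overlapping small boxes is why Wasserstein-style terms help small-target localization.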
Reversible data hiding based on pixel value similarity ordering and ordered collection
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-11 | DOI: 10.1016/j.dsp.2025.105483 | Vol. 167, Article 105483
Hongjie He, Ningxiong Mao, Fan Chen, Yaolin Yang, Yuan Yuan
Abstract: In pixel value ordering (PVO)-based reversible data hiding (RDH), a smoother pixel sequence enhances embedding capacity and visual quality. Existing global PVO-based RDH methods use pixel complexity values for secondary ordering, which inaccurately reflect pixel value magnitude and reduce sequence smoothness. This study proposes a pixel value similarity (PVS) ordering method to improve secondary pixel ordering. A value feature set is constructed for each pixel, and pixel value similarity is calculated to place pixels with the closest PVS adjacently. Additionally, a pixel ordered collection (POC) strategy organizes pixels into subsequences to increase expanded prediction errors, boosting embedding capacity. Experimental results demonstrate that PVS ordering yields smoother pixel sequences, with lower standard deviation (SD) and sum of absolute differences (SAD) than complexity-based methods. The proposed PVS and POC strategies enhance marked-image quality, achieving an average peak signal-to-noise ratio (PSNR) of 61.38 dB at an embedding capacity of 20,000 bits on the Kodak image dataset.
Citations: 0
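For context, the classic PVO embedding that the PVS/POC orderings build on expands the prediction error between a block's largest and second-largest pixels. A minimal sketch of that baseline scheme, not of the paper's new orderings (function names are ours):

```python
def pvo_embed_max(block, bit):
    """Embed one bit into a block's maximum pixel via PVO-style
    prediction-error expansion: the max is predicted by the second
    largest, error 1 carries a payload bit, larger errors are shifted."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    e = block[i_max] - block[i_2nd]
    out = list(block)
    if e == 1:            # expandable: carry the payload bit
        out[i_max] += bit
    elif e > 1:           # not expandable: shift to keep errors separable
        out[i_max] += 1
    return out            # e == 0: ambiguous maximum, left unchanged

def pvo_extract_max(block):
    """Recover (bit, original_block); bit is None if nothing was embedded."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    e = block[i_max] - block[i_2nd]
    out = list(block)
    if e == 1:
        return 0, out
    if e == 2:
        out[i_max] -= 1
        return 1, out
    if e > 2:             # undo the shift
        out[i_max] -= 1
    return None, out
```

The scheme is reversible because embedding and shifting map the error ranges {1} and {2, 3, ...} onto disjoint marked ranges, so extraction can invert every case.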
Enhanced dynamic temporal feature extraction with static expression insights for dynamic facial expression recognition
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-11 | DOI: 10.1016/j.dsp.2025.105470 | Vol. 168, Article 105470
Tingting Han, Shuwei Dou, Wenxia Zhang, Ruqian Liu
Abstract: Dynamic Facial Expression Recognition (DFER) is a critical task in computer vision, involving the recognition and analysis of changes in facial expressions across video sequences. Extracting the temporal features of facial emotions in videos is one of the main challenges facing DFER. This paper proposes a model named RTT, built on IR50, a Transformer, and a Time Feature Enhancement Module (TFEM), which enhances dynamic temporal feature extraction with static expression insights for DFER. Specifically, IR50 focuses on extracting static facial features from each video frame, while the Transformer works with TFEM to extract temporal features from the sequence. TFEM is placed after the Transformer to explore deeper temporal information and consists of two main components: a Feature Mapping Network (FMN) and a Temporal Dependency Network (TDN). FMN enhances temporal information through feature interaction and feature weighting, while TDN encodes temporal dependencies in sequences to improve sensitivity to complex dynamic expressions. Finally, a feature representation carrying both facial emotional and temporal features is formed for DFER. The model surpasses current state-of-the-art (SOTA) techniques on two widely recognized DFER benchmark datasets, DFEW and FERV39K. On DFEW, it achieves 71.24% unweighted average recall (UAR) and 86.81% weighted average recall (WAR); on FERV39K, it reaches 48.59% UAR and 60.42% WAR. These results indicate that the approach outperforms existing SOTA methods on the DFER task, suggesting the effectiveness of the RTT model.
Citations: 0
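At a high level, temporal feature weighting lets informative frames dominate the clip-level representation. A toy stand-in using softmax attention pooling over per-frame features (this is a generic sketch of the idea, not the paper's TFEM/FMN/TDN; the query vector `w` would be learned):

```python
import numpy as np

def temporal_attention_pool(frames, w):
    """Attention-weighted pooling over per-frame features of shape (T, D).
    Frames that score higher against the query w dominate the pooled
    clip feature; the output is a convex combination of the frames."""
    scores = frames @ w                       # (T,) relevance per frame
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ frames                   # (D,) clip-level feature
```

With a query aligned to the first feature dimension, frames expressing that dimension receive nearly all of the weight, which is the behavior temporal weighting aims for.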
Random sampling analysis in the linear canonical transform domain
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-09 | DOI: 10.1016/j.dsp.2025.105453 | Vol. 167, Article 105453
Yina Zhang, Feng Zhang
Abstract: Random sampling is a specific class of nonuniform sampling that serves as an effective alias-free signal acquisition technique in analog-to-digital conversion systems. This paper first proposes linear canonical spectrum estimators of deterministic signals derived from two simple random sampling methods. The proposed spectrum estimators are proven to be unbiased, and their variances are derived to compare their accuracy. The paper further analyzes how sampling jitter and observation errors affect the performance of the linear canonical spectrum estimators: sampling jitter biases the estimators, and this bias can be effectively compensated using a newly defined linear canonical characteristic function. Furthermore, the linear canonical spectrum of two types of stratified randomly sampled signals is analyzed. All analytical results are validated through numerical simulations using chirp signals.
Citations: 0
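The unbiasedness claim can be checked numerically in the classical Fourier special case of the linear canonical transform: averaging kernel-weighted samples taken at uniformly random times gives an alias-free, unbiased estimate of the spectrum integral. A sketch under those assumptions (the chirp, window, and sample counts are arbitrary test choices, and the Fourier kernel stands in for the full linear canonical kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1.0                                        # observation window [0, T)

def f(t):
    return np.cos(2 * np.pi * 40 * t ** 2)     # chirp test signal

def spectrum_estimate(omega, n):
    """Monte-Carlo estimate of X(omega) = integral_0^T f(t) e^{-j omega t} dt
    from n uniformly random sample times.  Its expectation equals X(omega),
    so the estimator is unbiased regardless of signal bandwidth."""
    t = rng.uniform(0.0, T, n)
    return (T / n) * np.sum(f(t) * np.exp(-1j * omega * t))

# Dense trapezoidal reference for comparison
omega = 2 * np.pi * 40
tt = np.linspace(0.0, T, 20001)
vals = f(tt) * np.exp(-1j * omega * tt)
ref = np.sum((vals[:-1] + vals[1:]) / 2) * (tt[1] - tt[0])
est = spectrum_estimate(omega, 200_000)
print(abs(est - ref))    # estimation error shrinks roughly like 1/sqrt(n)
```

The residual error here is pure estimator variance, which is exactly the quantity the paper derives and compares across its two sampling schemes.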
DAMMD-Net: A lightweight and enhanced deep segmentation network for skin lesion detection
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-09 | DOI: 10.1016/j.dsp.2025.105477 | Vol. 167, Article 105477
Hasan Polat
Abstract: Early and accurate diagnosis of skin cancer is critical to improving survival rates, and dermoscopy is one of the most important imaging techniques for this purpose. However, manual examination of dermoscopic images is laborious, time-consuming, and error-prone due to variations in the color, shape, location, texture, and size of skin lesions. Developing automatic segmentation models is therefore crucial for assisting physicians in both qualitative and quantitative assessments. Although numerous deep learning-based segmentation models have produced satisfactory results in skin lesion detection, their backbone architectures still face intrinsic limitations and extrinsic challenges. Motivated by this, the paper proposes a lightweight and enhanced segmentation network (DAMMD-Net) based on the DeepLabV3+ model, with an attention mechanism (AAC) and a modified decoder to improve segmentation performance. The AAC serves as a local feature enhancement tool that suppresses interference from irrelevant information in healthy skin regions. The modified decoder module enhances the network's ability to capture spatial details and integrate contextual information by leveraging multi-level feature maps from the encoder. The proposed segmentation pipeline was evaluated on two well-known benchmark datasets, ISIC2018 and PH2. DAMMD-Net achieved an average Dice similarity coefficient (DSC) of 0.887 on ISIC2018 and 0.929 on PH2, outperforming the backbone network. Overall, DAMMD-Net not only achieved satisfactory performance compared to existing models but also demonstrated significant potential for clinical practice owing to its lightweight architecture.
Citations: 0
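The Dice similarity coefficient reported above is twice the overlap between prediction and ground truth divided by their total foreground area. A minimal implementation for binary masks (the `eps` smoothing term is a common convention to avoid division by zero, not taken from the paper):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|), the metric DAMMD-Net reports
    (0.887 on ISIC2018, 0.929 on PH2)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

Unlike pixel accuracy, Dice is insensitive to the large healthy-skin background, which is why it is the standard metric for lesion segmentation.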
A multi-level composite attention-guided network for indoor visual localization
IF 2.9 | CAS Tier 3 | Engineering & Technology
Digital Signal Processing | Pub Date: 2025-07-08 | DOI: 10.1016/j.dsp.2025.105458 | Vol. 167, Article 105458
Xiaogang Song, Hailong Yang, Junjie Tang, Xiaochang Li, Xiaofeng Lu, Xinhong Hei
Abstract: Accurate and robust camera pose estimation is essential for autonomous navigation and path planning in unmanned systems. To improve localization accuracy in complex indoor scenes and mitigate information loss during feature extraction, a multi-level composite attention-guided scene coordinate regression method is proposed. The model predicts the mapping between 2D pixel points and 3D scene coordinates from a single RGB image. First, a Multi-level Feature Fusion Module (MFF) employs global pooling and parallel branches to consolidate multi-level features, enhancing discrimination in repetitive structures and low-texture regions. Next, an Embedded Attention Module (EAM) dynamically fuses multi-level features through parallel channel and spatial attention mechanisms, preserving edge details and suppressing noise. Finally, a differentiable random sample consensus algorithm achieves robust fitting of pose parameters. Evaluation on common indoor public datasets demonstrates that the proposed method significantly improves localization performance, and extensive ablation studies confirm the effectiveness of the Embedded Attention Module and Multi-level Feature Fusion Module in enhancing localization accuracy.
Citations: 0
Classification bias and regression bias adjustment for long-tailed traffic sign detection and recognition 长尾交通标志检测与识别的分类偏差与回归偏差调整
IF 2.9 3区 工程技术
Digital Signal Processing Pub Date : 2025-07-08 DOI: 10.1016/j.dsp.2025.105467
Jiajie Li, Weiguo Huang, Guifu Du, Qiaoyue Li
{"title":"Classification bias and regression bias adjustment for long-tailed traffic sign detection and recognition","authors":"Jiajie Li,&nbsp;Weiguo Huang,&nbsp;Guifu Du,&nbsp;Qiaoyue Li","doi":"10.1016/j.dsp.2025.105467","DOIUrl":"10.1016/j.dsp.2025.105467","url":null,"abstract":"<div><div>Traffic sign detection and recognition (TSDR), as a pivotal technology in Intelligent Transportation System, has attracted growing focus and widespread application in recent times. However, in practical applications, the complexity and variability of road conditions lead to a pronounced long-tailed distribution in traffic signs, i.e., a few classes account for a large proportion of instances while most classes contain only a few instances. This long-tailed distribution leads to significant bias during training. In this paper, we identify that such bias issues exist not only in the classification branch but also in the regression branch. Therefore, we first propose the Classification Bias Adjustment (CA) module to address classification bias. This module combines margin adjustment and gradient adjustment strategies based on the mean classification scores to alleviate classification bias. Meanwhile, we propose the Regression Bias Adjustment (RA) module to address regression bias. This module re-weights the regression loss for each class in accordance with the mean Intersection over Union (IoU) to alleviate regression bias. 
Through comprehensive experiments on the TT100K and GTSDB datasets, it has been validated that our proposed approach has greater effectiveness than the existing state-of-the-art methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105467"},"PeriodicalIF":2.9,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144654518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Generating invisible adversarial watermarks based on block-matching embedding algorithm 基于块匹配嵌入算法的不可见对抗水印生成
IF 2.9 3区 工程技术
Digital Signal Processing Pub Date : 2025-07-08 DOI: 10.1016/j.dsp.2025.105476
Chenxiao Wang , Zihao Zeng , Xiaoyue Hu , Yong Chen
{"title":"Generating invisible adversarial watermarks based on block-matching embedding algorithm","authors":"Chenxiao Wang ,&nbsp;Zihao Zeng ,&nbsp;Xiaoyue Hu ,&nbsp;Yong Chen","doi":"10.1016/j.dsp.2025.105476","DOIUrl":"10.1016/j.dsp.2025.105476","url":null,"abstract":"<div><div>Adversarial attack methods against Deep Neural Network (DNN) models have received extensive attention and research. Adversarial attack methods mean adding subtle perturbations to the original image to mislead the recognition ability of the DNN model. How to improve the adversarial attack performance and protect the visual effect of the perturbation image is still the main challenge in this field. Based on an image block-matching embedding algorithm, this paper proposes a novel adversarial method of embedding invisible watermarks for generating adversarial examples for deceptive DNN models. Firstly, utilizing up-sampling techniques to increase the embedding capacity of the original image while ensuring the visual quality of the watermark image. Secondly, the watermark image is embedded into the original image in a chunked manner. The cosine similarity is utilized for block-matching and combined with invertible color transformation to embed the invisible watermark. Finally, the Simple Black-box Adversarial Attack(SimBA) is used to add adversarial perturbation to the watermark image to generate the invisible adversarial watermark. The inverse operation of this method ensures the reconstruction of the original watermark information. The experimental results show that the proposed method achieves an average attack success rate of 98.33% in different neural network models (VGG19, resnet101, SqueezeNet, ShuffleNet, ConvNext, and MaxViT), with an attack success rate of 99.05% in the ShuffleNet model, demonstrating the superiority of the proposed method over existing techniques. 
In addition, the generated invisible adversarial watermark performs well in terms of visual effects and robustness, providing additional concealment and effectively reducing the risk of detecting adversarial attacks.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105476"},"PeriodicalIF":2.9,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144611739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Efficient spatial-temporal feature aggregation for multivariate time series forecasting with STCA 基于STCA的多变量时间序列预测的高效时空特征聚合
IF 2.9 3区 工程技术
Digital Signal Processing Pub Date : 2025-07-07 DOI: 10.1016/j.dsp.2025.105460
LiGuo Deng , WenDan Sha
{"title":"Efficient spatial-temporal feature aggregation for multivariate time series forecasting with STCA","authors":"LiGuo Deng ,&nbsp;WenDan Sha","doi":"10.1016/j.dsp.2025.105460","DOIUrl":"10.1016/j.dsp.2025.105460","url":null,"abstract":"<div><div>Multivariate time series (MTS) prediction plays a crucial role in many practical applications. Although spatio-temporal graph neural networks (STGNNs) have demonstrated excellent performance in MTS prediction due to the advantages of graph convolutional networks and time series modeling, their high computational complexity limits their applicability in resource constrained environments. To improve prediction accuracy while maintaining model simplicity and computational efficiency, inspired by Spatial-temporal identity (STID), this paper introduces a novel MTS prediction framework—Spatial-Temporal Channel Aggregation (STCA). This framework consists of two modules: the Channel Point Aggregation Fusion module (CPAF) enhances the capture of local spatial information and efficiently models temporal dependencies through depthwise separable convolutions and pointwise convolutions. the Selective Attention(SelAttn) module employs a self-attention mechanism to uncover complex dependencies among features. 
Experimental results show that STCA outperforms existing methods on multiple benchmark datasets, achieving higher prediction accuracy while significantly reducing training time.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105460"},"PeriodicalIF":2.9,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
An improved DeepLabV3+ network-based deep learning segmentation method for thermal image water-shorelines 基于DeepLabV3+网络的热图像水岸线深度学习分割方法
IF 2.9 3区 工程技术
Digital Signal Processing Pub Date : 2025-07-06 DOI: 10.1016/j.dsp.2025.105461
Jiaxin Wang, Xinxu Liu, Jianxu Wang, Ming Yang
{"title":"An improved DeepLabV3+ network-based deep learning segmentation method for thermal image water-shorelines","authors":"Jiaxin Wang,&nbsp;Xinxu Liu,&nbsp;Jianxu Wang,&nbsp;Ming Yang","doi":"10.1016/j.dsp.2025.105461","DOIUrl":"10.1016/j.dsp.2025.105461","url":null,"abstract":"<div><div>The water-shorelines segmentation of thermal image is essential to the visual perception technologies and applications of unmanned surface craft. However, the traditional semantic segmentation algorithms have the problems of limited accuracy and low efficiency, which significantly restricts the segmentation performance. Although the segmentation accuracy of convolutional neural network (CNN) is greatly improved compared with these segmentation algorithms, the effect of same model for different regions is obviously different due to the uneven distribution of water-shoreline scene categories in different regions. Therefore, this study proposes an improved DeepLabV3+ network-based segmentation method for the water-shorelines by adding a SE channel attention mechanism and replacing its original backbone network. To validate the performance of the proposed method, an appropriate data set and several assessment indexes were also established. The experiments compared with several conventional algorithms shown that the obstacle interaction degree and mIoU of the proposed method can highly reach to 72.03 % and 90.17 %, which improved 4.81 % and 1.55 % compared with the DeepLabV3+ network model. 
Even for the limited sample images, it can also more accurate segmentation for small obstacles, and clearer extract for the water-shoreline feature information.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105461"},"PeriodicalIF":2.9,"publicationDate":"2025-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信