Digital Signal Processing: Latest Articles

Adaptive polarimetric persymmetric detection for distributed subspace targets in lognormal texture clutter
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-15, DOI: 10.1016/j.dsp.2024.104872
Lichao Liu, Qiang Guo, Yuhang Tian, Mykola Kaliuzhnyi, Vladimir Tuz
Abstract: This paper investigates adaptive polarimetric persymmetric detection of distributed subspace targets in compound Gaussian clutter whose texture follows a lognormal distribution. Based on the two-step Generalized Likelihood Ratio Test (2S GLRT), the two-step maximum a posteriori Generalized Likelihood Ratio Test (2S MAP GLRT), the two-step Rao (2S Rao) test, and the two-step Wald (2S Wald) test, four polarimetric persymmetric detectors are proposed. First, the target echo is modeled as a distributed subspace signal and, assuming a known clutter texture and polarization speckle covariance matrix (PSCM), the corresponding test statistics are derived. Then, the lognormal texture is estimated by maximum a posteriori (MAP) estimation. As is conventional, a set of secondary data sharing the same PSCM as the cells under test (CUTs) is assumed to be available for estimating the PSCM, and the estimate exploits the PSCM's inherent persymmetric property. Finally, the estimates are substituted into the test statistics to obtain fully adaptive polarimetric persymmetric detectors. Numerical experiments on simulated data and measured sea clutter data show that the four proposed detectors have a constant false alarm rate (CFAR) property with respect to the PSCM and satisfactory detection performance for distributed subspace targets.
Volume 156, Article 104872
Citations: 0
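The abstract above says the PSCM estimate exploits persymmetry. As a minimal sketch of that general idea (not the authors' estimator): a Hermitian matrix R is persymmetric when J R* J = R, where J is the exchange (anti-identity) matrix and * is the element-wise complex conjugate, and a common way to impose this on a sample covariance S is to average S with J S* J. The function names and the toy snapshots below are hypothetical.

```python
def sample_covariance(snapshots):
    """Sample covariance S = (1/K) * sum_k z_k z_k^H of complex snapshot vectors."""
    n = len(snapshots[0])
    k = len(snapshots)
    cov = [[0j] * n for _ in range(n)]
    for z in snapshots:
        for i in range(n):
            for j in range(n):
                cov[i][j] += z[i] * z[j].conjugate() / k
    return cov


def persymmetrize(cov):
    """Return (S + J S* J) / 2, using (J S* J)[i][j] = conj(S[n-1-i][n-1-j])."""
    n = len(cov)
    return [
        [(cov[i][j] + cov[n - 1 - i][n - 1 - j].conjugate()) / 2 for j in range(n)]
        for i in range(n)
    ]


def is_persymmetric(mat, tol=1e-12):
    """Check the defining property J R* J = R entry by entry."""
    n = len(mat)
    return all(
        abs(mat[i][j] - mat[n - 1 - i][n - 1 - j].conjugate()) <= tol
        for i in range(n)
        for j in range(n)
    )


# Toy secondary data (hypothetical values, two snapshots of length three).
snapshots = [
    [1 + 2j, 0.5 - 1j, -1 + 0.3j],
    [0.2 - 0.4j, 1.5 + 0.1j, 0.7 + 0.7j],
]
S = sample_covariance(snapshots)
R = persymmetrize(S)
```

The averaged matrix R stays Hermitian and satisfies the persymmetry constraint by construction, which is why the structure can be enforced without any extra secondary data.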
MFFR-net: Multi-scale feature fusion and attentive recalibration network for deep neural speech enhancement
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-14, DOI: 10.1016/j.dsp.2024.104870
Nasir Saleem, Sami Bourouis
Abstract: Deep neural networks (DNNs) have been successfully applied to speech enhancement (SE), particularly in overcoming the challenges posed by nonstationary noisy backgrounds. In this context, multi-scale feature fusion and recalibration (MFFR) can improve speech enhancement performance by combining multi-scale and recalibrated features. This paper proposes a speech enhancement system that capitalizes on a large-scale pre-trained model fused with features that are attentively recalibrated using varying kernel sizes in convolutional layers. This enables the SE system to capture features across diverse scales, enhancing its overall performance. The proposed SE system uses a transferable feature extractor architecture and integrates it with multi-scale attentively recalibrated features. Using 2D convolutional layers, the convolutional encoder-decoder extracts both local and contextual features from speech signals. To capture long-term temporal dependencies, a bidirectional simple recurrent unit (BSRU) serves as a bottleneck layer between the encoder and decoder. Experiments are conducted on three publicly available datasets: Texas Instruments/Massachusetts Institute of Technology (TIMIT), LibriSpeech, and Voice Cloning Toolkit+Diverse Environments Multi-channel Acoustic Noise Database (VCTK+DEMAND). The results show that the proposed SE system performs better than several recent approaches on the Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) metrics. On the TIMIT dataset, the proposed system improves STOI by 17.3% and PESQ by 0.74 over the noisy mixture. On the LibriSpeech dataset, STOI and PESQ improve by 17.6% and 0.87, respectively.
Volume 156, Article 104870
Citations: 0
Efficient recurrent real video restoration
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-13, DOI: 10.1016/j.dsp.2024.104851
Antoni Buades, Jose-Luis Lisani
Abstract: We propose a novel method that addresses the most common limitations of real video sequences, including noise, blur, flicker, and low contrast. This method leverages the Discrete Cosine Transform (DCT) extensively for both deblurring and denoising tasks, ensuring computational efficiency. It also incorporates classical strategies for tonal stabilization and low-light enhancement. To the best of our knowledge, this is the first unified framework that tackles all these problems simultaneously. Compared to state-of-the-art learning-based methods for denoising and deblurring, our approach achieves better results while offering additional benefits such as full interpretability, reduced memory usage, and lighter computational requirements, making it well-suited for integration into mobile device processing chains.
Volume 156, Article 104851
Citations: 0
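The abstract above relies on the DCT for denoising but does not give the algorithm. As a generic illustration only (a textbook hard-thresholding scheme on a 1D signal, not the paper's method), the sketch below transforms the signal with an orthonormal DCT-II, zeroes small coefficients, and inverts the transform. The function names and the threshold value are assumptions made for this example.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(
            math.sqrt(2 / n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def dct_denoise(x, threshold):
    """Hard-threshold the DCT coefficients, then transform back."""
    kept = [c if abs(c) >= threshold else 0.0 for c in dct(x)]
    return idct(kept)


# Constant signal plus a small alternating perturbation (toy data).
noisy = [1.0 + 0.01 * (-1) ** t for t in range(8)]
clean = dct_denoise(noisy, threshold=0.1)
```

Because the transform is orthonormal, a coefficient's magnitude directly reflects its share of the signal energy, which is what makes a single threshold meaningful across frequencies.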
PV-YOLO: A lightweight pedestrian and vehicle detection model based on improved YOLOv8
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-13, DOI: 10.1016/j.dsp.2024.104857
Yuhang Liu, Zhenghua Huang, Qiong Song, Kun Bai
Abstract: With the frequent occurrence of urban traffic accidents, fast and accurate detection of pedestrian and vehicle targets has become one of the key technologies for intelligent assisted driving systems. To meet the efficiency and lightweight requirements of smart devices, this paper proposes a lightweight pedestrian and vehicle detection model based on the YOLOv8n model, named PV-YOLO. In the proposed model, receptive-field attention convolution (RFAConv) serves as the backbone network because of its target feature extraction ability, and the neck uses the bidirectional feature pyramid network (BiFPN) instead of the original path aggregation network (PANet) to simplify the feature fusion process. Moreover, a lightweight detection head is introduced to reduce the computational burden and improve the overall detection accuracy. In addition, a small target detection layer is designed to improve the accuracy for small distant targets. Finally, to reduce the computational burden further, the lightweight C2f module is used to compress the model. Experimental results on the BDD100K and KITTI datasets demonstrate that the proposed PV-YOLO achieves higher detection accuracy than YOLOv8n and other baseline methods with lower model complexity.
Volume 156, Article 104857
Citations: 0
Video foreground and background separation via Gaussian scale mixture and generalized nuclear norm based robust principal component analysis
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-12, DOI: 10.1016/j.dsp.2024.104863
Yongpeng Yang, Zhenzhen Yang, Jianlin Li
Abstract: For the past decade, robust principal component analysis (RPCA) has been the most representative formulation for video foreground and background separation, decomposing an observed matrix into sparse and low-rank matrices. However, existing RPCA methods still have several major limitations for this task: they neglect the impact of noise, approximate the sparse and low-rank functions poorly, neglect the spatial-temporal relations between pixels, and have difficulty selecting regularization parameters. All these limitations reduce their separation performance. To address the neglect of noise and the low approximation accuracy, we first design a novel RPCA method based on Gaussian scale mixture and generalized nuclear norm (GSMGNN), which integrates the Gaussian scale mixture (GSM) and the generalized nuclear norm (GNN). Specifically, the GSM better describes each foreground pixel by decomposing the foreground into a standardized Gaussian random variable and a positive hidden multiplier, while the GNN better approximates the low-rank background. In addition, we extend GSMGNN to the robust Gaussian scale mixture and generalized nuclear norm (RGSMGNN) method, which handles noise by introducing a noise term. Both methods are solved efficiently with the alternating direction method of multipliers (ADMM), which splits each problem into smaller, easier subproblems. Finally, experiments on challenging datasets show that the proposed methods are more effective than many other state-of-the-art methods.
Volume 156, Article 104863
Citations: 0
CNN intelligent diagnosis method for bearing incipient faint faults based on adaptive stochastic resonance-wave peak cross correlation sliding sampling
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-12, DOI: 10.1016/j.dsp.2024.104871
Peng Liu, Shuo Zhao, Ludi Kang, Yibing Yin
Abstract: As a representative deep learning architecture, the convolutional neural network (CNN) has been widely used in bearing fault diagnosis with good results. However, the length and segmentation of the signal fed to the CNN can have a significant impact on diagnostic accuracy. In addition, the signal-to-noise ratio of early bearing faults is usually very low, which makes it difficult for traditional CNNs to identify and classify these faults accurately. To solve this problem, this paper proposes an adaptive stochastic resonance wave peak cross-correlation sliding sampling method. First, adaptive stochastic resonance is used to reduce the noise of the original signal. The data are then divided starting from the positions of the signal wave peaks, the correlation coefficients between the divided segments are calculated, and the maximum value is used to determine the size of the division window. Finally, each segment is converted into a 2D image by the Gramian Angular Field and fed into a CNN for diagnostic classification. The method was validated on the Case Western Reserve University bearing dataset. Subsequently, three validation strategies were established on a self-built platform: mixed diagnosis of 10 different bearing states, variable speed diagnosis, and low sampling data diagnosis. The proposed method outperforms the conventional CNN by 10% on the Case Western Reserve University test set. On the variable speed test sets it is 24.67% and 31.17% higher, respectively. It is 30% higher in low sampling data diagnosis.
Volume 156, Article 104871
Citations: 0
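The abstract above converts each signal segment into a 2D image with the Gramian Angular Field before classification. A minimal sketch of the summation variant (GASF) follows, assuming the usual construction: rescale the segment to [-1, 1], map each value to an angle with arccos, and fill the image with cos(phi_i + phi_j). Whether the paper uses the summation or the difference variant is not stated, and the function name is hypothetical.

```python
import math


def gramian_angular_summation_field(series):
    """Return the GASF image of a 1D series as a list of rows."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [2 * (v - lo) / (hi - lo) - 1 for v in series]
    # Clamp to guard against rounding slightly outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


segment = [0.0, 1.0, 2.0]  # toy segment
image = gramian_angular_summation_field(segment)
```

The image is symmetric and its diagonal equals 2x^2 - 1 for each rescaled sample x, so the original segment can be read back from the diagonal up to sign.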
IGGCN: Individual-guided graph convolution network for pedestrian trajectory prediction
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-12, DOI: 10.1016/j.dsp.2024.104862
Wangxing Chen, Haifeng Sang, Jinyu Wang, Zishan Zhao
Abstract: Accurately predicting the future trajectory of pedestrians is crucial for applications such as autonomous driving and robot navigation. Graph convolution is widely used in trajectory prediction tasks due to its scalability and adaptive feature-learning capabilities. However, pedestrian trajectory prediction methods based on graph convolution have two problems: (1) previous methods struggled to adjust social interactions according to the attributes of different pedestrians, making it difficult to model accurately the relative importance between a pedestrian and the others; (2) previous methods lacked dynamic processing of pedestrian spatial-temporal interaction features and so could not capture high-level spatial-temporal interaction features effectively. Therefore, we propose an Individual-Guided Graph Convolution Network (IGGCN) for pedestrian trajectory prediction. To tackle problem 1, we design an individual-guided interaction module that adjusts pedestrian social interaction modeling according to each pedestrian's attributes, thereby describing the relative importance of pedestrians accurately. We extend the module to temporal interaction modeling to describe the relative importance of time frames accurately as well. To address problem 2, we design a deformable convolution module that dynamically processes spatial-temporal interaction features through deformable convolution kernels, facilitating the capture of high-level spatial-temporal interaction features. We evaluate our method on the ETH, UCY, and SDD datasets. Quantitative analysis shows that our method has lower prediction errors than the current state-of-the-art methods. Qualitative analysis further reveals that our method effectively eliminates the influence of irrelevant pedestrians and accurately models the spatial-temporal interactions between pedestrians.
Volume 156, Article 104862
Citations: 0
Sparse Bayesian learning based multi trajectory tracking algorithm for direction of arrival trajectory estimation
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-12, DOI: 10.1016/j.dsp.2024.104852
Sahar Barzegari Banadkoki, Mahmoud Ferdosizade Naeiny
Abstract: One application of sequential sparse signal reconstruction is multi-target Direction of Arrival (DoA) trajectory estimation. Each member of the support set corresponds to the DoA of a moving target at a given time instant, and there is a mapping between the indices of the sparse vector and DoA values in continuous angle space. The key idea of this paper is to use the dynamic information of the continuous angular space to track sparse vectors more accurately and to estimate the DoA trajectories of moving sources with time-varying acceleration within the Sparse Bayesian Learning (SBL) framework. For this purpose, the members of the estimated support set are mapped to the continuous angular space at each instant. Then, the obtained DoAs are assigned to the available DoA trajectories using the Predictive-Description-Length (PDL) algorithm. Next, the DoA of each source is predicted for the next time instant using the Kalman filter. Finally, the predicted DoAs are mapped to a sparse vector, which is used as the prior information for SBL-based sparse reconstruction. Simulation results show that the proposed algorithm, called SBL-MTT (Multi Trajectory Tracking), accurately reconstructs successive sparse vectors when estimating the DoA trajectories of moving sources.
Volume 156, Article 104852
Citations: 0
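The abstract above predicts each source's DoA for the next instant with a Kalman filter. The sketch below shows only that predict/update step for a single trajectory, under simplifying assumptions that are not taken from the paper: a constant-velocity state [angle, angular rate], a direct noisy measurement of the angle, and a diagonal process noise q·I. The paper targets time-varying acceleration, and its actual state model is not given in the abstract.

```python
def kalman_predict(x, P, dt, q):
    """Predict with F = [[1, dt], [0, 1]] and Q = q * I (assumed model)."""
    theta, rate = x
    x_pred = [theta + dt * rate, rate]
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    # P_pred = F P F^T + Q, written out for the 2x2 case.
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def kalman_update(x, P, z, r):
    """Update with measurement z of the angle only (H = [1, 0])."""
    innovation = z - x[0]
    s = P[0][0] + r                      # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
    x_new = [x[0] + k0 * innovation, x[1] + k1 * innovation]
    # P_new = (I - K H) P
    P_new = [
        [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
        [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
    ]
    return x_new, P_new


# Toy trajectory: the DoA moves at 2 degrees per step, starting from 10 degrees.
x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for t in range(1, 21):
    x, P = kalman_predict(x, P, dt=1.0, q=1e-4)
    x, P = kalman_update(x, P, z=10.0 + 2.0 * t, r=0.01)
x_next, P_next = kalman_predict(x, P, dt=1.0, q=1e-4)
```

In the pipeline the abstract describes, a predicted angle such as `x_next[0]` would then be mapped back to a grid index to form the prior for the next sparse reconstruction; that mapping is not shown here.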
An enhanced lightweight model for small-scale pedestrian detection based on YOLOv8s
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-10, DOI: 10.1016/j.dsp.2024.104866
Feifei Zhang, Lee Vien Leong, Kin Sam Yen, Yana Zhang
Abstract: Autonomous vehicle scenarios often involve occluded and distant pedestrians, leading to missed and false detections or to models that are too large to deploy. To address these issues, this study proposes a lightweight model based on YOLOv8s. The feature extraction and fusion networks were redesigned, and the detection layer was optimized for better detection. The backbone network incorporates Dual Conv and ELAN to create the EDLAN module. The EDLAN module and an optimized SPPF-LSKA improve small-scale pedestrian feature extraction in complex backgrounds while reducing the parameters and computation. In the neck network, BiFPN and VoVGSCSP enhance pedestrian features and improve detection. In addition, the WIoU loss function addresses target imbalance to enhance generalization ability and overall performance. The enhanced YOLOv8s was trained and validated on the CityPersons dataset. Compared to YOLOv8s, it improved precision, recall, F1 score, and mAP@50 by 5.2%, 7.2%, 6.8%, and 6.8%, respectively, while reducing the parameters by 68% and compressing the model size by 67%. Validation experiments were also conducted on the Caltech and BDD100K datasets, where precision increased by 3.4% and 1.1% and mAP@50 increased by 7.6% and 2.8%, respectively. The modified model reduces the model parameters and size while effectively improving detection accuracy, making it highly valuable for autonomous driving scenarios.
Volume 156, Article 104866
Citations: 0
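The abstract above reports mAP@50, which counts a predicted box as a match when its intersection over union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that matching criterion follows, assuming axis-aligned boxes given as (x1, y1, x2, y2); the function names are hypothetical, and the full mAP computation (ranking by confidence, precision-recall integration) is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, gt_box):
    """True when the prediction would count as a match at the 0.5 IoU threshold."""
    return iou(pred_box, gt_box) >= 0.5


overlap = iou((0, 0, 2, 2), (1, 0, 3, 2))  # half-overlapping boxes
```

Small, distant pedestrians occupy few pixels, so a localization error of a pixel or two can push IoU below 0.5, which is one reason small-target accuracy is reported separately in detection work.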
Shallow multiplexing and multiscale dilation convolution combined attention based oriented object detection in remote sensing images
IF 2.9, Zone 3, Engineering Technology
Digital Signal Processing, Pub Date: 2024-11-10, DOI: 10.1016/j.dsp.2024.104865
Jiangtao Wang, Jiawei Shi
Abstract: Remote sensing images are becoming increasingly important in many areas of life because of the valuable information they provide. However, detecting objects in these images remains difficult because the objects vary widely in size, scale, and orientation. Moreover, there is a growing demand for efficient and fast detection methods in practical applications. Therefore, in this paper, we propose a framework for oriented object detection in remote sensing images based on shallow multiplexing and multiscale dilation convolution combined attention. To achieve a lightweight network structure, we use ResNet18 as the backbone network. First, a shallow multiplexing module (SM) is designed to improve the use of detailed information in the shallow layers of the network. It enhances the interaction between the shallow and deep layers, resulting in a richer feature representation. Second, a multiscale dilation convolution combined attention module (MDCA) is proposed to prioritize contextual information by using convolutions with different dilation rates. This guides the network to focus more on the object information in remote sensing images. Then, a dilated encoder (DE) is employed at the feature fusion stage to enhance the semantic information of the context and produce a feature map with multiple receptive fields. Finally, the log2 loss function is applied to improve the training results. Experiments are conducted on three publicly available remote sensing image datasets, and the results demonstrate that the proposed algorithm outperforms other algorithms in detection performance on these datasets. Code is available at https://github.com/sbsfsum/SM-and-MDCA.
Volume 156, Article 104865
Citations: 0
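The abstract above gathers context with convolutions at different dilation rates. As a minimal 1D sketch of what dilation does (not the MDCA module itself): a kernel of length k with dilation d samples the input every d positions, so its receptive field spans d·(k − 1) + 1 samples while the number of weights stays at k. The function name and the toy data are assumptions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated cross-correlation, as used in convolutional layers."""
    span = dilation * (len(kernel) - 1) + 1  # receptive field of one output
    if len(signal) < span:
        return []
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # simple difference filter
# The same three weights applied at three scales.
multiscale = {d: dilated_conv1d(signal, kernel, dilation=d) for d in (1, 2, 3)}
```

Running the same kernel at several dilation rates and combining the outputs is the basic mechanism behind multiscale context aggregation; how the paper weights and fuses those branches with attention is not described in the abstract.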