{"title":"A refined methodology for small object detection: Multi-scale feature extraction and cross-stage feature fusion network","authors":"Deng Bian, Mingwei Tang, Miaogui Ling, Haowen Xu, Shixuan Lv, Qi Tang, Jie Hu","doi":"10.1016/j.dsp.2025.105297","DOIUrl":"10.1016/j.dsp.2025.105297","url":null,"abstract":"In recent years, object detection has become a prominent area in computer vision and has seen significant advances. While modern detectors can handle small objects, they still face challenges in feature extraction and fusion during detection. The limited representation of small objects hampers the model's ability to discern the discriminative information crucial for subsequent tasks. To address this issue, we introduce the Multi-Scale Feature Extraction and Cross-Stage Feature Fusion Network (MCFN) for small object detection. By obtaining abundant multi-scale feature information, MCFN can improve the effectiveness of small object detection. First, a Multi-Scale Feature Extraction module (MSFE) is introduced to capture target object features at various scales, providing high-quality feature information for subsequent processing. Second, for feature fusion, a Cross-Stage Feature Pyramid Network (CSFPN) merges feature maps across different layers, enabling the model to leverage abstract information from higher-level feature maps and detailed information from lower-level feature maps. Finally, experimental results on the VisDrone-DET2019 and constellation datasets confirm MCFN's superior performance in small object detection, outperforming mainstream detectors.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"164","pages":"Article 105297"},"PeriodicalIF":2.9,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143924140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey of multi-view stereo 3D reconstruction algorithms based on deep learning","authors":"Shengjie Feng, Xiaoqun Wu, Jian Cao","doi":"10.1016/j.dsp.2025.105291","DOIUrl":"10.1016/j.dsp.2025.105291","url":null,"abstract":"Multi-view stereo (MVS) 3D surface reconstruction, as a core problem in the fields of computer vision and graphics, aims to accurately recover the geometric structure of a scene from multi-view images. This technology bridges the gap between data capture and surface editing. It can be used in various downstream applications such as cultural heritage preservation, urban construction planning, Virtual Reality (VR), and Augmented Reality (AR). However, MVS still faces numerous challenges when dealing with complex scenes, high-frequency textures, and occlusions. Improving accuracy, robustness, and computational efficiency has become a key focus of current research. To address the aforementioned challenges, this survey systematically reviews key techniques in MVS-based 3D reconstruction, focusing on the latest advancements in deep learning methods. It provides a detailed analysis of the general MVS pipeline, including feature extraction, cost volume construction, cost volume regularization, and loss functions. Additionally, we compile commonly used datasets for MVS 3D reconstruction (with corresponding links) and evaluation metrics. In response to the performance limitations of traditional methods in complex scenes, we analyze three representative paradigms based on the evolution of network architectures: convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformer-based structures. Furthermore, to broaden research perspectives, we explore and summarize the applications of emerging Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) technologies in MVS, emphasizing a comparative analysis of their implementation processes. For experimental evaluation, this paper reviews the performance of leading MVS approaches on several public datasets and compares their effectiveness. Finally, we summarize the theoretical and practical significance of this survey, highlighting its main contributions, strengths, and limitations, and propose future research directions and suggestions for MVS 3D reconstruction.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"165","pages":"Article 105291"},"PeriodicalIF":2.9,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low complexity robust subband adaptive beamforming with frequency-angle coupling","authors":"Jie Luo, Shurui Zhang, Yubing Han, Renli Zhang, Weixing Sheng","doi":"10.1016/j.dsp.2025.105288","DOIUrl":"10.1016/j.dsp.2025.105288","url":null,"abstract":"Subband adaptive beamforming (SAB) is an important technology in wideband array signal processing, where subband focusing algorithms are widely adopted due to their moderate computational complexity and satisfactory performance. However, these algorithms typically require prior knowledge of the directions of the desired signal and interference. To improve interference suppression while retaining low complexity, three new algorithms are proposed: subband covariance matrix superposition adaptive beamforming (SCMSAB) achieves adaptive null broadening through covariance matrix stacking, single subband covariance matrix adaptive beamforming (SSCMAB) specializes in low relative bandwidth scenarios, and low-complexity robust SAB (LCRSAB) implements frequency-angle coupling on reference subband covariance matrices based on SSCMAB. Both SCMSAB and LCRSAB are applicable to wideband scenarios. LCRSAB further enhances performance through numerical optimization of reference frequencies and virtual interference parameters, ensuring accurate null steering. Simulations validate that all three algorithms significantly reduce computational complexity compared to conventional focusing methods while retaining robustness and outperforming traditional approaches in practical implementations.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"164","pages":"Article 105288"},"PeriodicalIF":2.9,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143924141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
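The covariance-superposition idea named for SCMSAB can be illustrated generically: sum the covariance matrices estimated in several subbands, then solve a single MVDR (Capon) problem on that sum, so the resulting weight vector attenuates the interferer's spatial signature at every subband frequency. This is a minimal sketch under stated assumptions, not the authors' SCMSAB algorithm: the 8-element uniform linear array with half-wavelength spacing at 1 GHz, the four subband frequencies, the 20 dB interferer at 30°, the diagonal-loading factor, and the helper names `ula_steering` and `superposed_mvdr` are all hypothetical choices for illustration.

```python
import numpy as np

C = 3e8  # propagation speed in m/s (assumed free space)

def ula_steering(n_elem, spacing, freq, theta_deg):
    """Steering vector of a uniform linear array, element 0 as phase reference."""
    n = np.arange(n_elem)
    delay = n * spacing * np.sin(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * freq * delay)

def superposed_mvdr(subband_covs, a_look, loading=1e-3):
    """Hypothetical sketch: sum subband covariances, add diagonal loading,
    and return MVDR weights w = R^-1 a / (a^H R^-1 a)."""
    R = np.sum(subband_covs, axis=0)
    n = R.shape[0]
    R = R + loading * np.trace(R).real / n * np.eye(n)
    r_inv_a = np.linalg.solve(R, a_look)
    return r_inv_a / (a_look.conj() @ r_inv_a)

# Illustrative scenario (all values are assumptions, not from the paper).
N, F0 = 8, 1e9
d = C / F0 / 2                              # half wavelength at F0
freqs = np.linspace(0.9 * F0, 1.1 * F0, 4)  # four subband centre frequencies
covs = []
for f in freqs:
    a_int = ula_steering(N, d, f, 30.0)     # interferer signature in this subband
    covs.append(100.0 * np.outer(a_int, a_int.conj()) + np.eye(N))
a_look = ula_steering(N, d, F0, 0.0)        # desired signal at broadside
w = superposed_mvdr(covs, a_look)

for f in freqs:
    gain = abs(w.conj() @ ula_steering(N, d, f, 30.0))
    print(f"{f / 1e9:.3f} GHz: interferer gain {20 * np.log10(gain):.1f} dB")
```

Because the summed matrix contains the interferer's steering vector at each subband frequency, one MVDR solve places a null over that whole set of spatial signatures while keeping unit gain toward the look direction, which is one simple reading of "null broadening". The paper's own stacking and weighting details are not reproduced here.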
{"title":"Texture and structural distortion metric based on dual-tree complex wavelet transform for DIBR-synthesized image quality assessment","authors":"Huan Zhang, Zhijun Xiong, Xu Zhang, Jiangzhong Cao, Yun Zhang","doi":"10.1016/j.dsp.2025.105293","DOIUrl":"10.1016/j.dsp.2025.105293","url":null,"abstract":"With the improvement of Depth-Image-Based Rendering (DIBR) technology, previous Image Quality Assessment (IQA) models that rely on strong bias priors of DIBR distortion may fail to locate the evolving DIBR distortion accurately, which degrades their performance. To address this issue, this paper proposes a new full-reference image quality assessment model based on the Dual-Tree Complex Wavelet Transform (DTCWT), which measures DIBR distortion across intermediate and coarse levels, as well as multiple directions, from the perspectives of structure and texture. Specifically, DTCWT is used to mimic the multi-directional and multi-scale visual characteristics of the Human Visual System (HVS). Through DTCWT, texture features, representing fine details, and structural features, denoting higher-level semantic information, are extracted from the low-level and high-level subbands, respectively. Features extracted from various directions are then weighted and aggregated, with their significance determined by the distribution of distortion directions in the assessment of texture and structural distortions. The final quality score is the product of the texture and structural distortions. Experimental results on two publicly available DIBR datasets (i.e., IRCCyN/IVC and IETR) show that the proposed method achieves better average performance than state-of-the-art methods.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"165","pages":"Article 105293"},"PeriodicalIF":2.9,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised low-light image enhancement algorithm based on texture-aware generation and dual discriminator","authors":"Rongfeng Zhou, Deqiang Cheng, Rugang Wang, Ping Li, Yuanyuan Wang, Feng Zhou","doi":"10.1016/j.dsp.2025.105296","DOIUrl":"10.1016/j.dsp.2025.105296","url":null,"abstract":"Conventional low-light image enhancement algorithms suffer from loss of detail texture, heavy noise, and the small size of paired low/normal-light image datasets. To address these problems, an unsupervised low-light image enhancement algorithm called TA-GAN (Texture-Aware GAN), based on texture-aware generation and a dual discriminator, is proposed and demonstrated for the first time. It consists of four parts: a texture-aware enhancement module, a generator module, a dual discriminator module, and a denoising module. First, in the texture-aware enhancement module, a grayscale regularization image and a texture-aware loss function are used to highlight the lighting and texture information of the image. Second, in the generator module, a pyramid pooling mechanism is used to obtain information at different scales of the image. The U-network then fuses the grayscale regularization images to gradually enhance illumination and recover detail, and its output is fed into a compound polarized attention mechanism to achieve high-quality pixel-level regression. The dual discriminator consists of a global discriminator and a local discriminator. The global discriminator assesses the overall structure and global consistency of the image, while the local discriminator focuses on local regions and evaluates the realism of the local details of the generated image. Finally, in the denoising module, feature information from similar regions of the image is used to reduce noise. On the LOL dataset, the algorithm achieves a PSNR of 23.94 dB and an SSIM of 0.87. It effectively addresses the loss of detail texture, heavy noise, and limited paired low/normal-light data that affect conventional low-light image enhancement algorithms, and it improves both subjective and objective evaluation metrics.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"165","pages":"Article 105296"},"PeriodicalIF":2.9,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144099319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local-global routing between capsules for hyperspectral image classification with noisy labels","authors":"Heng Zhou, Ping Zhong","doi":"10.1016/j.dsp.2025.105294","DOIUrl":"10.1016/j.dsp.2025.105294","url":null,"abstract":"The complexity of hyperspectral images (HSIs) can lead to annotation errors, necessitating robust learning techniques that distinguish true patterns under noisy labels. Existing methods for handling noisy labels often mistakenly remove normal samples and struggle to generalize due to the complex interaction between spatial-spectral information and noisy labels. To address these challenges, a novel perspective is proposed that exploits the intrinsic consistency of individual sample features for robust learning. This perspective is implemented in a novel local-global capsule network (LGCaps) for HSI classification. Specifically, LGCaps integrates a local-global routing algorithm and coupled capsule decoders. The local-global routing algorithm integrates feature levels hierarchically and iteratively through local aggregation, ensuring consistency between local and global capsules to capture both coarse- and fine-grained features and enhancing robustness by leveraging intrinsic sample feature consistency. Moreover, the coupled capsule decoders dynamically compress and reconstruct spectral and spatial information, collaborating with the encoder to leverage intrinsic consistency, which prevents overfitting to noisy labels and improves model robustness and performance. Extensive experiments on four HSI datasets demonstrate that LGCaps surpasses existing capsule networks and noisy-label-robust models, and is especially effective when trained with noisy labels and limited samples.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"164","pages":"Article 105294"},"PeriodicalIF":2.9,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143929097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor-based passive localization of multiple wideband emitters using PARAFAC decomposition","authors":"Qing Liu, Jian Xie, Zhaolin Zhang, Yanyun Gong, Ling Wang","doi":"10.1016/j.dsp.2025.105290","DOIUrl":"10.1016/j.dsp.2025.105290","url":null,"abstract":"Emitter localization techniques are crucial for various applications in both civilian and military surveillance. In this work, we propose a passive position determination method for localizing multiple emitters using both direction-of-arrival (DOA) and time-of-arrival (TOA) information. We adopt a joint spatio-temporal processing framework that integrates an antenna array with a bandpass filter bank to intercept and localize signals from the emitters of interest. The intercepted signal is then modeled as a low-rank third-order tensor, enabling the application of PARAllel FACtor (PARAFAC) decomposition to extract the spatio-temporal response matrix. Subsequently, a localization cost function, which is directly related to the emitter position, is formulated based on the estimated response matrix. The emitters' positions are determined through a nonlinear grid search algorithm. Numerical examples illustrate the effectiveness of the proposed method, demonstrating its superior estimation accuracy and resolution capability, particularly in scenarios involving multiple emitters.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"164","pages":"Article 105290"},"PeriodicalIF":2.9,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143929259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
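PARAFAC (also called CP) decomposition of a third-order tensor is commonly computed by alternating least squares (ALS): each factor matrix is updated by a linear least-squares solve with the other two fixed, using the Hadamard product of their Gram matrices. The NumPy sketch below shows a generic CP-ALS and is not the authors' localization pipeline. How array elements, filter-bank channels, and time samples map onto the three tensor modes is left open, and the function names `parafac_als`, `cp_reconstruct`, and `_update` are hypothetical.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def _update(X, F1, F2, spec):
    """Least-squares update of one factor F with the other two fixed.
    Solves F @ V = M, where M is the matricized-tensor-times-Khatri-Rao
    product and V is the Hadamard product of the two Gram matrices."""
    M = np.einsum(spec, X, F1.conj(), F2.conj())
    V = (F1.T @ F1.conj()) * (F2.T @ F2.conj())
    return np.linalg.solve(V.T, M.T).T

def parafac_als(X, rank, n_iter=500, tol=1e-12):
    """Generic CP-ALS for a (possibly complex) third-order tensor."""
    I, J, K = X.shape
    # Initialize each factor from the leading left singular vectors of its unfolding.
    A = np.linalg.svd(X.reshape(I, -1), full_matrices=False)[0][:, :rank]
    B = np.linalg.svd(np.moveaxis(X, 1, 0).reshape(J, -1), full_matrices=False)[0][:, :rank]
    C = np.linalg.svd(np.moveaxis(X, 2, 0).reshape(K, -1), full_matrices=False)[0][:, :rank]
    x_norm = np.linalg.norm(X)
    prev = np.inf
    for _ in range(n_iter):
        A = _update(X, B, C, "ijk,jr,kr->ir")
        B = _update(X, A, C, "ijk,ir,kr->jr")
        C = _update(X, A, B, "ijk,ir,jr->kr")
        err = np.linalg.norm(X - cp_reconstruct(A, B, C))
        if abs(prev - err) < tol * x_norm:  # stop once the fit stops changing
            break
        prev = err
    return A, B, C

# Usage: recover a synthetic rank-2 tensor (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
X = cp_reconstruct(A0, B0, C0)
A, B, C = parafac_als(X, rank=2)
rel_err = np.linalg.norm(X - cp_reconstruct(A, B, C)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The factors are recovered only up to column permutation and scaling, which is the usual PARAFAC ambiguity. ALS is not guaranteed to find the global optimum in general, so practical code typically adds restarts or uses a library routine such as TensorLy's `parafac`.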
{"title":"Multi-frequency based underdetermined direction-of-arrival estimation for irregularly positioned sparse array","authors":"Renjie Li, Dezhi Tian, Zhennan Liang, Shaoqiang Chang, Quanhua Liu","doi":"10.1016/j.dsp.2025.105295","DOIUrl":"10.1016/j.dsp.2025.105295","url":null,"abstract":"With the increasing complexity of modern electromagnetic environments, it is increasingly common for the number of sources to exceed the number of physical sensors, which makes underdetermined direction-of-arrival (DOA) estimation a key problem. Sparse arrays can increase the degrees of freedom (DOFs) through difference coarray equivalence. However, most related methods impose strict restrictions on sparse array configurations and are difficult to apply in engineering practice. Therefore, in this paper, we study an underdetermined DOA estimation method based on irregularly positioned sparse arrays. First, we extend the virtual array elements using multiple frequencies. Then, to address the DOA ambiguity caused by holes in the virtual array, we exploit the difference in the positions of the virtual array elements generated at different frequencies and achieve unambiguous DOA estimation by multiplying the spatial spectra of the different virtual arrays. In addition, we discuss the theoretical constraints on the random layout of sparse array elements and give a design solution that can exploit all of the DOFs. Simulation results demonstrate the effectiveness of the method.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"165","pages":"Article 105295"},"PeriodicalIF":2.9,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
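The benefit of multiple frequencies can already be seen from the difference coarray. If sensor positions are expressed in half-wavelengths at a reference frequency f0, operating at frequency f scales every coarray lag by f/f0, so a lag missing (a "hole") at one frequency may appear at another. This is a minimal sketch that only counts virtual positions. It does not implement the paper's spatial-spectrum multiplication, and the array [0, 1, 5, 7], the frequency ratio 1.5, and the helper names `difference_coarray` and `multi_frequency_coarray` are hypothetical.

```python
import numpy as np

def difference_coarray(positions):
    """Sorted set of all pairwise differences p_m - p_n (the difference coarray)."""
    p = np.asarray(positions, dtype=float)
    return np.unique(p[:, None] - p[None, :])

def multi_frequency_coarray(positions, freq_ratios):
    """Union of the difference coarrays obtained when positions (in units of
    half-wavelength at f0) are scaled by each ratio f/f0."""
    p = np.asarray(positions, dtype=float)
    lags = [difference_coarray(p * r) for r in freq_ratios]
    # Round to suppress floating-point noise before merging duplicate lags.
    return np.unique(np.round(np.concatenate(lags), 9))

positions = [0, 1, 5, 7]  # hypothetical irregular sparse array, units of lambda0 / 2
single = difference_coarray(positions)
multi = multi_frequency_coarray(positions, [1.0, 1.5])

missing = sorted(set(range(8)) - set(single[single >= 0].tolist()))
print("missing integer lags at f0:", missing)
print("non-negative lags with f = 1.5 f0 added:", multi[multi >= 0])
```

Lags such as 1.5 do not lie on the f0 grid, so the merged virtual array is itself irregular. The paper resolves the resulting ambiguity by multiplying spatial spectra computed for the different virtual arrays, which this sketch does not attempt.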
{"title":"A target tracking method by fusing hybrid domain saliency techniques","authors":"Zhongming Liao, Zhaosheng Xu, Huanhuan Hou, Azlan Ismail","doi":"10.1016/j.dsp.2025.105292","DOIUrl":"10.1016/j.dsp.2025.105292","url":null,"abstract":"Significant progress has been made in target-tracking research, but existing algorithms continue to encounter several challenges. Inaccurate target localization often results from errors in the response map, while the limited discriminative power of template branches contributes to frequent tracking failures. These issues undermine the accuracy and robustness of current tracking methods. To address these challenges, we propose a target tracking algorithm, HST-TT, based on a hybrid domain saliency technique combined with a dynamic weighting strategy. A saliency map extraction module employing dynamic weighting strategies across frequency, spatial, and temporal domains has been designed. This module is integrated into the search template branch of the tracking network to enhance its discriminative power and enable the precise extraction of the target's saliency map. In addition, a lightweight convolutional neural network feature extraction module has been developed, integrating compression and excitation attention mechanisms to extract effective multi-resolution features, generate a response map, and improve target-background discrimination by enhancing relevant channel features while suppressing redundant ones. HST-TT is thoroughly evaluated on seven benchmark datasets: OTB2015, UAV123, LaSOT, VOT2018, VOT2019, GOT-10k, and TrackingNet. Comparative experimental results show that HST-TT outperforms state-of-the-art tracking algorithms across key performance metrics, achieving notable improvements in target localization accuracy and tracking robustness.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"164","pages":"Article 105292"},"PeriodicalIF":2.9,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143916419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A gradient-based algorithm for joint beamforming and reflection design in RIS-assisted mobile communications","authors":"Ciro André Pitz, Marcos Vinicius Matsuo, Rui Seara","doi":"10.1016/j.dsp.2025.105298","DOIUrl":"10.1016/j.dsp.2025.105298","url":null,"abstract":"Reconfigurable intelligent surfaces (RISs) have emerged as a transformative technology for shaping the propagation environment in next-generation mobile communication systems, creating the need for efficient optimization strategies. In this context, this paper introduces a gradient-based algorithm for jointly optimizing beamforming and reflection design in the uplink of RIS-assisted multi-user (MU) multiple-input single-output (MISO) systems. The proposed approach addresses the sum-rate capacity maximization problem while significantly reducing computational complexity. To this end, the optimization problem is reformulated using eigenvalue minimization, which reduces computational demands without compromising sum-rate performance. The proposed algorithm incorporates both steepest descent and the damped quasi-Newton Broyden–Fletcher–Goldfarb–Shanno (BFGS) approximation, enabling flexible trade-offs between computational efficiency and convergence rate. Simulation results confirm the effectiveness of the proposed algorithm in enhancing the sum-rate capacity.","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"164","pages":"Article 105298"},"PeriodicalIF":2.9,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143929096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}