{"title":"Efficient RGBT Tracking via Multi-Path Mamba Fusion Network","authors":"Fanghua Hong;Wanyu Wang;Andong Lu;Lei Liu;Qunjing Wang","doi":"10.1109/LSP.2025.3563123","DOIUrl":"https://doi.org/10.1109/LSP.2025.3563123","url":null,"abstract":"RGBT tracking aims to fully exploit the complementary advantages of the visible and infrared modalities to achieve robust tracking, so the design of the multimodal fusion network is crucial. However, existing methods typically adopt CNNs or Transformer networks to construct the fusion network, which makes it difficult to balance performance and efficiency. To overcome this issue, we introduce an innovative visual state space (VSS) model, represented by Mamba, for RGBT tracking. In particular, we design a novel multi-path Mamba fusion network that achieves robust multimodal fusion capability while maintaining linear overhead. First, we design a multi-path Mamba layer to sufficiently fuse the two modalities from both global and local perspectives. Second, to alleviate the inadequate VSS modeling in the channel dimension, we introduce a simple yet effective channel swapping layer. Extensive experiments conducted on four public RGBT tracking datasets demonstrate that our method surpasses existing state-of-the-art trackers. Notably, our fusion method achieves higher tracking performance than the well-known Transformer-based fusion approach (TBSI), while achieving 92.8% and 80.5% reductions in parameter count and computational cost, respectively.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1790-1794"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143924953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
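The "channel swapping layer" in the record above is named but not specified in the abstract. Below is a minimal sketch of one plausible realization, assuming two modality feature maps of shape (C, H, W) and a fixed fraction of leading channels exchanged between them; the function name, shapes, and swap rule are all assumptions, not the authors' implementation.

```python
import numpy as np

def channel_swap(feat_rgb, feat_tir, swap_ratio=0.5):
    """Exchange a fraction of channels between two modality feature maps.

    feat_rgb, feat_tir: arrays of shape (C, H, W).
    Returns both maps with the first int(C * swap_ratio) channels
    exchanged, forcing cross-modal interaction along the channel axis.
    """
    assert feat_rgb.shape == feat_tir.shape
    k = int(feat_rgb.shape[0] * swap_ratio)  # number of channels to exchange
    out_rgb, out_tir = feat_rgb.copy(), feat_tir.copy()
    # tuple assignment: both right-hand slices are read before writing
    out_rgb[:k], out_tir[:k] = feat_tir[:k].copy(), feat_rgb[:k].copy()
    return out_rgb, out_tir
```

Such a layer is parameter-free, which is consistent with the letter's emphasis on linear overhead.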
{"title":"TF-CorrNet: Leveraging Spatial Correlation for Continuous Speech Separation","authors":"Ui-Hyeop Shin;Bon Hyeok Ku;Hyung-Min Park","doi":"10.1109/LSP.2025.3562819","DOIUrl":"https://doi.org/10.1109/LSP.2025.3562819","url":null,"abstract":"In general, multi-channel source separation has utilized inter-microphone phase differences (IPDs) concatenated with magnitude information in the time-frequency domain, or real and imaginary components stacked along the channel axis. However, the spatial information of a sound source is fundamentally contained in the “differences” between microphones, specifically in the correlation between them, while the power of each microphone also provides valuable information about the source spectrum, which is why the magnitude is also included. Therefore, we propose a network that directly leverages a correlation input with phase transform (PHAT)-<inline-formula><tex-math>$\beta$</tex-math></inline-formula> to estimate the separation filter. In addition, the proposed TF-CorrNet processes features alternately across the time and frequency axes as a dual-path strategy in terms of spatial information. Furthermore, we add a spectral module to model source-related direct time-frequency patterns for improved speech separation. Experimental results demonstrate that the proposed TF-CorrNet effectively separates speech, achieving high performance with low computational cost on the LibriCSS dataset.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1875-1879"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143929827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
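The abstract above does not give the exact PHAT-β formulation. The sketch below assumes the common GCC-PHAT-style weighting, in which each microphone pair's cross-power spectrum is compressed by the β-th power of its magnitude; the function name, epsilon, and array shapes are illustrative assumptions.

```python
import numpy as np

def phat_beta_correlation(stft, beta=0.8, eps=1e-8):
    """Inter-microphone correlation features with PHAT-beta weighting.

    stft: complex array of shape (M, F, T) — M microphones,
          F frequency bins, T frames.
    Returns one feature map per microphone pair, shape (M*(M-1)/2, F, T),
    each a cross-power spectrum divided by |cross|**beta (beta=1 keeps
    only phase; beta=0 keeps the raw cross-spectrum).
    """
    M = stft.shape[0]
    feats = []
    for i in range(M):
        for j in range(i + 1, M):
            cross = stft[i] * np.conj(stft[j])            # cross-power spectrum
            feats.append(cross / (np.abs(cross) ** beta + eps))
    return np.stack(feats)
```

In practice the real and imaginary parts of this output would be stacked along the channel axis before entering the network, alongside per-microphone magnitudes for the spectral cue mentioned in the abstract.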
{"title":"Performance Prediction of Hybrid Integration Detector for Radar Moderately Fluctuating Rayleigh Targets","authors":"Hongying Zheng;Qilei Zhang;Yongsheng Zhang","doi":"10.1109/LSP.2025.3562829","DOIUrl":"https://doi.org/10.1109/LSP.2025.3562829","url":null,"abstract":"In this letter, we address the performance prediction of the hybrid integration detector for radar moderately fluctuating Rayleigh targets in thermal noise. First, the moderately fluctuating Rayleigh target model is defined as a generalization of the well-known Swerling I and Swerling II models using an exponential correlation function. Based on this, an exact closed-form expression of the detection probability for the hybrid integration detector is derived. In the limiting cases (correlation coefficient of 1 or 0), the derived expression reduces to the classical formulas, confirming its validity. Finally, numerical examples are presented to verify the effectiveness of the derived theoretical model and to analyze the optimal hybrid integration detector.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1920-1924"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decentralized Smoothing ADMM for Quantile Regression With Non-Convex Sparse Penalties","authors":"Reza Mirzaeifard;Diyako Ghaderyan;Stefan Werner","doi":"10.1109/LSP.2025.3562828","DOIUrl":"https://doi.org/10.1109/LSP.2025.3562828","url":null,"abstract":"In the rapidly evolving internet-of-things (IoT) ecosystem, effective data analysis techniques are crucial for handling distributed data generated by sensors. Addressing the limitations of existing methods, such as the sub-gradient approach, which fails to distinguish effectively between active and non-active coefficients, this paper introduces the decentralized smoothing alternating direction method of multipliers (DSAD) for penalized quantile regression. Our method leverages non-convex sparse penalties such as the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD), improving the identification and retention of significant predictors. DSAD incorporates a total variation norm within a smoothing ADMM framework, achieving consensus among distributed nodes and ensuring uniform model performance across disparate data sources. This approach overcomes the convergence challenges traditionally associated with non-convex penalties in decentralized settings. We present a convergence proof and extensive simulation results to validate the effectiveness of DSAD, demonstrating its superiority in achieving reliable convergence and enhanced estimation accuracy compared with prior methods.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1915-1919"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LYT-NET: Lightweight YUV Transformer-Based Network for Low-Light Image Enhancement","authors":"Alexandru Brateanu;Raul Balmez;Adrian Avram;Ciprian Orhei;Cosmin Ancuti","doi":"10.1109/LSP.2025.3563125","DOIUrl":"https://doi.org/10.1109/LSP.2025.3563125","url":null,"abstract":"This letter introduces LYT-Net, a novel lightweight transformer-based model for low-light image enhancement. LYT-Net consists of several layers and detachable blocks, including our novel blocks—Channel-Wise Denoiser (<bold>CWD</bold>) and Multi-Stage Squeeze & Excite Fusion (<bold>MSEF</bold>)—along with the traditional Transformer block, Multi-Headed Self-Attention (<bold>MHSA</bold>). In our method, we adopt a dual-path approach, treating the chrominance channels <inline-formula><tex-math>$U$</tex-math></inline-formula> and <inline-formula><tex-math>$V$</tex-math></inline-formula> and the luminance channel <inline-formula><tex-math>$Y$</tex-math></inline-formula> as separate entities to help the model better handle illumination adjustment and corruption restoration. Our comprehensive evaluation on established LLIE datasets demonstrates that, despite its low complexity, our model outperforms recent LLIE methods.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"2065-2069"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Effective Yet Fast Early Stopping Metric for Deep Image Prior in Image Denoising","authors":"Xiaohui Cheng;Shaoping Xu;Wuyong Tao","doi":"10.1109/LSP.2025.3562948","DOIUrl":"https://doi.org/10.1109/LSP.2025.3562948","url":null,"abstract":"The deep image prior (DIP) and its variants can address image denoising in an unsupervised manner using only a noisy image as training data. In practice, however, they are limited by overfitting in highly overparameterized models and by the fixed iteration count used for early stopping, which fails to adapt to varying noise levels and image content, degrading denoising effectiveness. In this work, we propose an effective yet fast early stopping metric (ESM) to overcome these limitations when applying DIP models to synthetic or real noisy images. Specifically, our ESM measures the image quality of the output images generated by the DIP network: we split the output image from each iteration into two sub-images and use the distance between them as an ESM of image quality. When the ESM stops decreasing over several iterations, we stop training, ensuring near-optimal performance without needing the ground-truth image, reducing computational cost and making ESM suitable for denoising real noisy images.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1925-1929"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
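The ESM record above splits each DIP output into two sub-images and tracks their distance, but the abstract specifies neither the split nor the distance. One possible reading, sketched below, uses an even/odd-column split with mean absolute difference and a simple patience rule; every detail here is an assumption for illustration.

```python
import numpy as np

def split_distance(img):
    """Distance between two interleaved sub-images of a DIP output.

    Assumed split: even vs. odd columns of a (H, W) image. Residual noise
    decorrelates neighboring columns, so this distance tends to fall while
    the network fits structure and to rise once it starts fitting noise.
    """
    a, b = img[:, 0::2], img[:, 1::2]
    w = min(a.shape[1], b.shape[1])  # align widths when W is odd
    return float(np.abs(a[:, :w] - b[:, :w]).mean())

def should_stop(history, patience=5):
    """Stop when the metric has not improved for `patience` iterations.

    history: list of split_distance values, one per training iteration.
    """
    if len(history) <= patience:
        return False
    return min(history[-patience:]) >= min(history[:-patience])
```

A training loop would append `split_distance(output)` each iteration and break as soon as `should_stop(history)` returns True, with no ground-truth image involved.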
{"title":"SELL: A Method for Low-Light Image Enhancement by Predicting Semantic Priors","authors":"Quanquan Xiao;Haiyan Jin;Haonan Su;Ruixia Yan","doi":"10.1109/LSP.2025.3562822","DOIUrl":"https://doi.org/10.1109/LSP.2025.3562822","url":null,"abstract":"In recent years, low-light image enhancement techniques have made significant progress in generating plausible visual details. However, current methods do not fully exploit the semantic priors of visual elements in low-light environments. As a result, images generated by these methods often suffer from degraded visual quality and may even be distorted. To address this problem, we propose a method that guides low-light image enhancement by predicting semantic priors. Specifically, we train a semantic prior predictor under standard lighting conditions, which learns to predict semantic prior features for low-light images through knowledge distillation on high-quality standard images. Subsequently, we utilize a semantic-aware module that enables the model to adaptively integrate these learned semantic priors, ensuring semantic consistency of the enhanced images. Experiments show that the method outperforms several current state-of-the-art methods in terms of visual performance on the LOL-v2 and SICE benchmark datasets.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1785-1789"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143925033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mask-Guided Cross-Modality Fusion Network for Visible-Infrared Vehicle Detection","authors":"Lingyun Tian;Qiang Shen;Zilong Deng;Yang Gao;Simiao Wang","doi":"10.1109/LSP.2025.3562816","DOIUrl":"https://doi.org/10.1109/LSP.2025.3562816","url":null,"abstract":"Drone-based vehicle detection is crucial for intelligent traffic management. However, current methods relying solely on a single visible or infrared modality struggle with precision and robustness, especially in adverse weather conditions. Effectively integrating cross-modal information to enhance vehicle detection still poses significant challenges. In this letter, we propose a mask-guided cross-modality fusion method, called MCMF, for robust and accurate visible-infrared vehicle detection. First, we construct a framework consisting of three branches: two dedicated to the visible and infrared modalities, respectively, and a third tailored to the fused multi-modal features. Second, we introduce a Location-Sensitive Masked AutoEncoder (LMAE) for intermediate-level feature fusion. Specifically, our LMAE utilizes masks to cover intermediate-level features of one modality based on the prediction hierarchy of the other, and then distills cross-modality guidance information through regularization constraints. Through a self-learning paradigm, this strategy effectively preserves the useful information from both modalities while eliminating the redundant information from each. Finally, the fused features are fed into an uncertainty-based detection head to generate vehicle bounding-box predictions. When evaluated on the DroneVehicle dataset, our MCMF reaches 71.42% mAP, outperforming an established baseline method by 7.42%. Ablation studies further demonstrate the effectiveness of our LMAE for visible-infrared fusion.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1815-1819"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143925250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting of Mutual-Structure Denoising: A Plug-and-Play Solution for Compressive Sampling MRI Reconstruction With Theoretical Guarantees","authors":"Qingyong Zhu;Majun Shi;Zhuo-Xu Cui;Hongwu Zeng;Dong Liang","doi":"10.1109/LSP.2025.3562820","DOIUrl":"https://doi.org/10.1109/LSP.2025.3562820","url":null,"abstract":"The field of accelerated magnetic resonance imaging (AMRI) has garnered significant attention, focusing on reconstructing the target image from compressively sampled k-space data, an ill-posed linear inverse problem. In this study, we exploit the multiparameterization of MRI to propose a new plug-and-play prior (P<inline-formula><tex-math>$^{3}$</tex-math></inline-formula>) for enhancing reconstruction quality. We begin by introducing a mutual-structure guided P<inline-formula><tex-math>$^{3}$</tex-math></inline-formula> (MS-GP<inline-formula><tex-math>$^{3}$</tex-math></inline-formula>) framework, based on jointly penalized least-squares regression (JPLSR), to selectively transfer common priors from a reference image to the target one, thereby minimizing errors caused by indiscriminate structural replication. Furthermore, we establish a self-sharpening weighting (SSW) scheme that effectively differentiates between sharp and smooth image components, yielding a boosted variant of MS-GP<inline-formula><tex-math>$^{3}$</tex-math></inline-formula> (BMS-GP<inline-formula><tex-math>$^{3}$</tex-math></inline-formula>) that further improves artifact suppression and detail restoration. Finally, embedding BMS-GP<inline-formula><tex-math>$^{3}$</tex-math></inline-formula> in half-quadratic splitting (HQS) iterations yields an advanced AMRI algorithm, dubbed BMS-GP<inline-formula><tex-math>$^{3}$</tex-math></inline-formula>-HQS, which not only outperforms state-of-the-art (SOTA) methods but also provides robust theoretical guarantees, including ensured convergence and resilience to noise.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1880-1884"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143929840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Token-Level Augmentation in Vision Transformer for Semi-Supervised Semantic Segmentation","authors":"Dengke Zhang;Quan Tang;Fagui Liu;Haiqing Mei;C. L. Philip Chen","doi":"10.1109/LSP.2025.3562821","DOIUrl":"https://doi.org/10.1109/LSP.2025.3562821","url":null,"abstract":"Semi-supervised semantic segmentation has witnessed remarkable advances in recent years. However, existing algorithms are based on convolutional neural networks, and directly applying them to Vision Transformers poses certain limitations due to conceptual disparities. To this end, we propose TokenSwap, a data augmentation technique designed explicitly for semi-supervised semantic segmentation with Vision Transformers. TokenSwap aligns well with the global attention mechanism by mixing images at the token level, enhancing the learning of contextual information among image patches and the utilization of unlabeled data. We further incorporate image augmentation and feature augmentation to promote augmentation diversity. Moreover, to enhance consistency regularization, we propose a dual-branch framework in which each branch applies image and feature augmentation to the input image. We conduct extensive experiments across multiple benchmark datasets, including Pascal VOC 2012, Cityscapes, and COCO. Results suggest that the proposed method outperforms state-of-the-art algorithms with notable accuracy improvements, especially under limited fine annotations.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1885-1889"},"PeriodicalIF":3.2,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143929738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
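The TokenSwap record above says images are mixed at the token level but gives no mixing rule. The sketch below assumes a random per-token exchange between the patch-token sequences of two images; the function name, the swap fraction, and the mask-based label mixing are all assumptions, not the authors' formulation.

```python
import numpy as np

def token_swap(tokens_a, tokens_b, swap_frac=0.5, seed=None):
    """Mix two token sequences by exchanging a random subset of tokens.

    tokens_a, tokens_b: (N, D) patch-token arrays from two images.
    Returns the two mixed sequences plus the boolean swap mask; a
    semi-supervised pipeline could reuse the mask to mix the
    corresponding patch-wise pseudo-labels in the same way.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(tokens_a.shape[0]) < swap_frac  # True -> exchanged
    out_a, out_b = tokens_a.copy(), tokens_b.copy()
    out_a[mask], out_b[mask] = tokens_b[mask], tokens_a[mask]
    return out_a, out_b, mask
```

Because the exchange happens after patch embedding, global self-attention still sees a full-length sequence, which is the property that makes token-level mixing a natural fit for Vision Transformers.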