{"title":"FM-RTDETR: Small Object Detection Algorithm Based on Enhanced Feature Fusion With Mamba","authors":"Yuchuan Yang;Jiahui Dai;Yong Wang;Yafei Chen","doi":"10.1109/LSP.2025.3553426","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553426","url":null,"abstract":"Traditional real-time object detection networks deployed in autonomous aerial vehicles (AAVs) struggle to extract features from small objects in complex backgrounds with occlusions and overlapping objects. To address this challenge, we propose FM-RTDETR, a real-time object detection algorithm optimized for small object detection. We redesign the encoder of RT-DETRv2 by integrating the Feature Aggregation and Diffusion Network (FADN), improving the algorithm's ability to capture contextual information. Subsequently, we introduce the Parallel Atrous Mamba Feature Fusion Module (PAMFFM), which combines shallow and deep semantic information to better capture small object features. Furthermore, we propose the Cross-stage Enhanced Feature Fusion Module (CEFFM), merging features for small objects to provide richer and more detailed information. Finally, we propose STIoU Loss, which incorporates a penalty term to adjust the scaling of the loss function, improving detection granularity for small objects. FM-RTDETR achieves AP<inline-formula><tex-math>$_{50}$</tex-math></inline-formula> scores of 54.0% and 56.3% on the VisDrone2019-DET and AI-TOD datasets, respectively. Compared with other state-of-the-art methods, our method shows great potential in small object detection from drones.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1570-1574"},"PeriodicalIF":3.2,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning-Based Approach for Identification and Compensation of Nonlinear Distortions in Parametric Array Loudspeakers","authors":"Mengtong Li;Tao Zhuang;Kai Chen;Jia-Xin Zhong;Jing Lu","doi":"10.1109/LSP.2025.3553434","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553434","url":null,"abstract":"Compared to traditional electrodynamic loudspeakers, the parametric array loudspeaker (PAL) offers exceptional directivity for audio applications but suffers from significant nonlinear distortions due to its inherently intricate demodulation process. Volterra filter-based approaches have been widely used to reduce these distortions, but their effectiveness is limited by the capability of the inverse filter. Specifically, a <inline-formula><tex-math>$p$</tex-math></inline-formula>th-order inverse filter can only compensate for nonlinearities up to the <inline-formula><tex-math>$p$</tex-math></inline-formula>th order, while the higher-order nonlinearities it introduces continue to generate lower-order harmonics. In contrast, this paper is the first to apply modern deep learning methods to nonlinear identification and compensation for PAL systems. Specifically, the WaveNet neural network, recognized for its success in audio nonlinear system modeling, is utilized to identify and compensate for distortions in a double sideband amplitude modulation-based PAL system. Experimental measurements from 250 Hz to 8 kHz demonstrate that our proposed approach significantly reduces both total harmonic distortion and intermodulation distortion of audio sound generated by PALs, bringing them down on average to 3.11% and 0.93%, respectively. This performance is notably superior to results obtained using the current state-of-the-art Volterra filter-based methods. Our work opens new possibilities for improving the sound reproduction performance of PALs.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1455-1459"},"PeriodicalIF":3.2,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vector Quantization Based Query-Efficient Attack via Direct Preference Optimization","authors":"Ruijie Yang;Yuanfang Guo;Chao Zhou;Guohao Li;Yunhong Wang","doi":"10.1109/LSP.2025.3553791","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553791","url":null,"abstract":"This work studies black-box adversarial attacks against deep neural networks, where the attacker only has access to the query feedback from the target model. The current state-of-the-art (SOTA) query-efficient attacks usually combine transfer-based and query-based methods by utilizing the gradient or initializations of surrogate models. However, these strategies typically incur significant computational costs and require a large number of queries during the attack process. In this paper, we propose a novel query-efficient method for generating black-box adversarial perturbations, named Vector Quantization based Query-efficient Adversarial Perturbation generation (VQQAP). Specifically, we propose a Nucleus Sampling based Discretization Module (NSDM) to create diverse adversarial examples in the discrete latent space. To directly optimize the latent vector, we formulate the optimization problem as a direct preference optimization (DPO) problem, and iteratively solve this problem based on the target model feedback. Experimental evaluations demonstrate the effectiveness and efficiency of our method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1550-1554"},"PeriodicalIF":3.2,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143845542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Branch Network for No-Reference Super-Resolution Image Quality Assessment","authors":"Tong Tang;Fan Yang;Xinyu Lin;Weisheng Li","doi":"10.1109/LSP.2025.3553432","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553432","url":null,"abstract":"No-reference super-resolution image quality assessment (SR-IQA) has become a critical technique for optimizing SR algorithms; the key challenge is how to comprehensively learn the visually relevant features of SR images. Existing methods ignore context information and feature correlation. To tackle this problem, this letter proposes a dual-branch network for no-reference super-resolution image quality assessment (DBSRNet). First, a dual-branch feature extraction module is designed, in which a residual network and a receptive field block net are combined to learn multi-scale local features, while stacked vision transformer blocks are used to learn global features. Then, correlations between the dual-branch features are learned and fused using a self-attention structure, and the final predicted score is obtained by an adaptive feature pooling strategy. Finally, experimental results show that DBSRNet significantly outperforms state-of-the-art methods in terms of prediction accuracy on all SR-IQA datasets.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1366-1370"},"PeriodicalIF":3.2,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143726419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AdaMoT: Adaptive Motion-Aware Transformer for Efficient Visual Tracking","authors":"Yongjun Wang;Xiaohui Hao","doi":"10.1109/LSP.2025.3553429","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553429","url":null,"abstract":"Visual object tracking utilizing adaptive computation presents challenges stemming from the complexities of modeling intricate motion patterns and achieving computational efficiency. While recent transformer-based trackers have shown promising results, they struggle to effectively capture varying motion dynamics and often waste computation on less informative regions, leading to degraded performance under fast motion and occlusion. In this letter, we present AdaMoT, an innovative motion-aware transformer framework featuring three lightweight modules that integrate adaptive attention and motion estimation: a Lightweight Adaptive Motion Estimation (LAME) module that guides transformer attention through motion pattern modeling, a Saliency-based Hard Attention Sampling (SHAS) module that reduces computation by 60% through focusing on motion-critical regions, and an Adaptive ViT Attention Head Adjustment (AVAHA) module that dynamically allocates attention heads based on motion complexity. Our framework uniquely integrates motion estimation with transformer attention through a shared feature space, achieving robust tracking with minimal overhead. Comprehensive testing indicates that AdaMoT attains superior performance on various demanding benchmarks (75.1% AO on GOT-10k, 84.9% AUC on TrackingNet, 72.9% AUC on LaSOT) while maintaining real-time speed (32.1 FPS) with only a 4% increase in FLOPs.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1450-1454"},"PeriodicalIF":3.2,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143808974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Proximal Newton Adaptive Importance Sampler","authors":"Víctor Elvira;Émilie Chouzenoux;O. Deniz Akyildiz","doi":"10.1109/LSP.2025.3553790","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553790","url":null,"abstract":"Adaptive importance sampling (AIS) algorithms are a rising methodology in signal processing, statistics, and machine learning. An effective adaptation of the proposals is key for the success of AIS. Recent works have shown that gradient information about the involved target density can greatly boost performance, but its applicability is restricted to differentiable targets. In this letter, we propose a proximal Newton adaptive importance sampler for the estimation of expectations with respect to non-smooth target distributions. We implement a scaled Newton proximal gradient method to adapt the proposal distributions, enabling efficient and optimized moves even when the target distribution lacks differentiability. We show the good performance of the algorithm in two scenarios: one with convex constraints and another with non-smooth sparse priors.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1545-1549"},"PeriodicalIF":3.2,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143845566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modified Price Nonlinear Frequency Modulated Waveform for Improved Performance","authors":"Geon U Kim;Jeong Phill Kim","doi":"10.1109/LSP.2025.3553437","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553437","url":null,"abstract":"A modified Price-formula nonlinear frequency curve for pulse compression with an ultra-low sidelobe level (SLL) and efficient use of the frequency spectrum is proposed. Although the frequency curve of the original Price formula is known to give a relatively low remaining SLL, this level is not suitable for applications such as meteorological radar, which needs to detect tiny targets in a dense and widespread cluttered environment. This paper introduces new parameters for obtaining an ultra-low SLL. After a parameter study, hybrid optimization based on the genetic algorithm and the Nelder-Mead search was used to find the optimal solution. With this method, even without amplitude windowing, an SLL lower than <inline-formula><tex-math>$-100$</tex-math></inline-formula> dB could be achieved with a time-bandwidth product of 200. In addition, another design with a one-sided occupied spectrum (<inline-formula><tex-math>$ f_{-90\\text{dB}}$</tex-math></inline-formula>) of only 32.85 MHz and a Doppler tolerance of <inline-formula><tex-math>$-0.24$</tex-math></inline-formula> dB was made.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1406-1410"},"PeriodicalIF":3.2,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SequenceOut: Boosting CNNs by Freezing Layers","authors":"Shitala Prasad;Rakesh Paul;Mayur Kamat","doi":"10.1109/LSP.2025.3553430","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553430","url":null,"abstract":"Convolutional neural networks (CNNs) are a powerful tool for various computer vision tasks, demonstrating exceptional performance in image classification, object detection, and segmentation. However, traditional training methods often require meticulous hyperparameter tuning, architectural adjustments, or the introduction of additional data through techniques such as data augmentation to achieve optimal accuracy. This letter introduces an innovative training strategy that leverages layer freezing to enhance the training process while keeping the model's architecture and hyperparameters unchanged. By selectively and progressively freezing certain hidden layers in the CNN, we prevent the model from reaching a saturation point. This approach effectively reduces the backpropagation parameter space, facilitating more focused and efficient learning in the remaining layers.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1401-1405"},"PeriodicalIF":3.2,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ViLNM: Visual-Language Noise Modeling for Text-to-Image Person Retrieval","authors":"Guolin Xu;Yong Feng;Yanying Chen;Guofan Duan;Mingliang Zhou","doi":"10.1109/LSP.2025.3553424","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553424","url":null,"abstract":"Text-to-image person retrieval (TPR) focuses on finding a specific person based on a textual description, and most methods implicitly assume the training image-text pairs are correctly aligned. In practice, image-text pairs are often under-correlated or falsely correlated due to the low quality of the images and annotation errors. Meanwhile, remarkable similarities between different person identities may lead to a mismatch between text and image. To tackle these two issues, we present a Visual-Language Noise Modeling (ViLNM) method that captures robust cross-modal associations even in the presence of noise. Specifically, we design a Noise Token Aware (NTA) module that eliminates the words in the textual description that do not match the image, utilizing the matched words to establish a more reliable association. Besides, to enhance the recognition ability of the model for different person identities, we propose a Joint Inter and Intra-Modal Contrastive Loss (JII) and Local Aggregation (LA) module to increase the feature differences between different person identities. We conduct comprehensive experiments on three public benchmarks, and ViLNM performs best.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1386-1390"},"PeriodicalIF":3.2,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Complexity MRC Detection of IRS-Aided Single User MIMO-OTFS","authors":"Sapta Girish Neelam","doi":"10.1109/LSP.2025.3553428","DOIUrl":"https://doi.org/10.1109/LSP.2025.3553428","url":null,"abstract":"This paper presents a novel detection strategy for Intelligent Reflecting Surface (IRS)-aided single user Multiple-Input Multiple-Output (MIMO) systems utilizing Orthogonal Time Frequency Space (OTFS) modulation, tailored to operate effectively under hardware constraints such as Carrier Frequency Offset (CFO). The proposed method employs Maximum Ratio Combining (MRC) to enhance signal quality by mitigating multipath fading and inter-antenna interference. A notable feature of this detection strategy is its low computational complexity, which makes it highly practical for real-time applications in dynamic wireless environments. The scheme also significantly improves the performance of IRS-aided MIMO-OTFS systems. Simulation results demonstrate the superior capabilities of the proposed approach, as IRS-aided single-user MIMO-OTFS systems with MRC detection consistently outperform traditional detection methods. These findings highlight the transformative potential of integrating IRS and OTFS to advance the next generation of wireless communication systems.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1396-1400"},"PeriodicalIF":3.2,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}