{"title":"IrisFormer: A Dedicated Transformer Framework for Iris Recognition","authors":"Xianyun Sun;Caiyong Wang;Yunlong Wang;Jianze Wei;Zhenan Sun","doi":"10.1109/LSP.2024.3522856","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522856","url":null,"abstract":"While Vision Transformer (ViT)-based methods have significantly improved the performance of various vision tasks in natural scenes, progress in iris recognition remains limited. In addition, the human iris contains unique characters that are distinct from natural scenes. To remedy this, this paper investigates a dedicated Transformer framework, termed IrisFormer, for iris recognition and attempts to improve the accuracy by combining the contextual modeling ability of ViT and iris-specific optimization to learn robust, fine-grained, and discriminative features. Specifically, to achieve rotation invariance in iris recognition, we employ relative position encoding instead of regular absolute position encoding for each iris image token, and a horizontal pixel-shifting strategy is utilized during training for data augmentation. Then, to enhance the model's robustness against local distortions such as occlusions and reflections, we randomly mask some tokens during training to force the model to learn representative identity features from only part of the image. Finally, considering that fine-grained features are more discriminative in iris recognition, we retain the entire token sequence for patch-wise feature matching instead of using the standard single classification token. Experiments on three popular datasets demonstrate that the proposed framework achieves competitive performance under both intra- and inter-dataset testing protocols.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"431-435"},"PeriodicalIF":3.2,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synchronous and Asynchronous HARQ-CC Assisted SCMA Schemes","authors":"Man Wang;Zheng Shi;Yunfei Li;Xianda Wu;Weiqiang Tan","doi":"10.1109/LSP.2024.3523227","DOIUrl":"https://doi.org/10.1109/LSP.2024.3523227","url":null,"abstract":"This letter proposes a novel hybrid automatic repeat request with chase combining assisted sparse code multiple access (HARQ-CC-SCMA) scheme. Depending on whether the same superimposed packet is retransmitted, synchronous and asynchronous modes are considered for retransmissions. Moreover, a factor graph aggregation (FGA) method is used for multi-user detection. Specifically, a large-scale factor graph is constructed by combining all the received superimposed signals and message passing algorithm (MPA) is applied to calculate log-likelihood ratio (LLR). Monte Carlo simulations are preformed to show that FGA surpasses bit-level combining (BLC) and HARQ with incremental redundancy (HARQ-IR) in synchronous mode. Moreover, FGA performs better than BLC at high signal-to-noise ratio (SNR) region in asynchronous mode. However, FGA in asynchronous mode is worse than BLC at low SNR, because significant error propagation is induced by the presence of failed messages after the maximum allowable HARQ rounds.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"506-510"},"PeriodicalIF":3.2,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142937891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Dynamic Distractor-Repressed Correlation Filter for Real-Time UAV Tracking","authors":"Zhi Chen;Lijun Liu;Zhen Yu","doi":"10.1109/LSP.2024.3522850","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522850","url":null,"abstract":"With high-efficiency computing advantages and desirable tracking accuracy, discriminative correlation filters (DCFs) have been widely utilized in UAV tracking, leading to substantial progress. However, in some intricate scenarios (e.g., similar objects or backgrounds, background clutter), DCF-based trackers are prone to generating low-reliability response maps influenced by surrounding response distractors, thereby reducing tracking robustness. Furthermore, the limited computational resources and endurance on UAV platforms drive DCF-based trackers to exhibit real-time and reliable tracking performance. To address the aforementioned issues, a dynamic distractor-repressed correlation filter (DDRCF) is proposed. First, a dynamic distractor-repressed regularization is introduced into the DCF framework. Then, a new objective function is formulated to tune the penalty intensity of the distractor-repressed regularization module. Furthermore, a novel response map variation evaluation mechanism is used to dynamically tune the distractor-repressed regularization coefficient to adapt to omnipresent appearance variations. Considerable and exhaustive experiments on four prevailing UAV benchmarks, i.e., UAV123@10fps, UAVTrack112, DTB70 and UAVDT, validate that the proposed DDRCF tracker is superior to other state-of-the-art trackers. Moreover, the proposed method can achieve a tracking speed of 59 FPS on a CPU, meeting the requirements of real-time aerial tracking.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"616-620"},"PeriodicalIF":3.2,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infrared Small Target Detection via Local-Global Feature Fusion","authors":"Lang Wu;Yong Ma;Fan Fan;Jun Huang","doi":"10.1109/LSP.2024.3523226","DOIUrl":"https://doi.org/10.1109/LSP.2024.3523226","url":null,"abstract":"Due to the high-luminance (HL) background clutter in infrared (IR) images, the existing IR small target detection methods struggle to achieve a good balance between efficiency and performance. Addressing the issue of HL clutter, which is difficult to suppress, leading to a high false alarm rate, this letter proposes an IR small target detection method based on local-global feature fusion (LGFF). We develop a fast and efficient local feature extraction operator and utilize global rarity to characterize the global feature of small targets, effectively suppressing a significant amount of HL clutter. By integrating local and global features, we achieve further enhancement of the targets and robust suppression of the clutter. Experimental results demonstrate that the proposed method outperforms existing methods in terms of target enhancement, clutter removal, and real-time performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"466-470"},"PeriodicalIF":3.2,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142937863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement","authors":"Chengzhong Wang;Jianjun Gu;Dingding Yao;Junfeng Li;Yonghong Yan","doi":"10.1109/LSP.2024.3522852","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522852","url":null,"abstract":"Speech enhancement is designed to enhance the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion models have gained lots of attention in speech enhancement area, achieving competitive results. Current diffusion-based methods blur the distribution of the signal with isotropic Gaussian noise and recover clean speech distribution from the prior. However, these methods often suffer from a substantial computational burden. We argue that the computational inefficiency partially stems from the oversight that speech enhancement is not purely a generative task; it primarily involves noise reduction and completion of missing information, while the clean clues in the original mixture do not need to be regenerated. In this paper, we propose a method that introduces noise with anisotropic guidance during the diffusion process, allowing the neural network to preserve clean clues within noisy recordings. This approach substantially reduces computational complexity while exhibiting robustness against various forms of noise and speech distortion. Experiments demonstrate that the proposed method achieves state-of-the-art results with only approximately 4.5 million parameters, a number significantly lower than that required by other diffusion methods. This effectively narrows the model size disparity between diffusion-based and predictive speech enhancement approaches. Additionally, the proposed method performs well in very noisy scenarios, demonstrating its potential for applications in highly challenging environments.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"426-430"},"PeriodicalIF":3.2,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing No-Reference Audio-Visual Quality Assessment via Joint Cross-Attention Fusion","authors":"Zhaolin Wan;Xiguang Hao;Xiaopeng Fan;Wangmeng Zuo;Debin Zhao","doi":"10.1109/LSP.2024.3522855","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522855","url":null,"abstract":"As the consumption of multimedia content continues to rise, audio and video have become central to everyday entertainment and social interactions. This growing reliance amplifies the demand for effective and objective audio-visual quality assessment (AVQA) to understand the interaction between audio and visual elements, ultimately enhancing user satisfaction. However, existing state-of-the-art AVQA methods often rely on simplistic machine learning models or fully connected networks for audio-visual signal fusion, which limits their ability to exploit the complementary nature of these modalities. In response to this gap, we propose a novel no-reference AVQA method that utilizes joint cross-attention fusion of audio-visual perception. Our approach begins with a dual-stream feature extraction process that simultaneously captures long-range spatiotemporal visual features and audio features. The fusion model then dynamically adjusts the contributions of features from both modalities, effectively integrating them to provide a more comprehensive perception for quality score prediction. Experimental results on the LIVE-SJTU and UnB-AVC datasets demonstrate that our model outperforms state-of-the-art methods, achieving superior performance in audio-visual quality assessment.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"556-560"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outlier Indicator Based Projection Fuzzy K-Means Clustering for Hyperspectral Image","authors":"Xinze Liu;Xiaojun Yang;Jiale Zhang;Jing Wang;Feiping Nie","doi":"10.1109/LSP.2024.3521714","DOIUrl":"https://doi.org/10.1109/LSP.2024.3521714","url":null,"abstract":"The application of hyperspectral image (HSI) clustering has become widely used in the field of remote sensing. Traditional fuzzy K-means clustering methods often struggle with HSI data due to the significant levels of noise, consequently resulting in segmentation inaccuracies. To address this limitation, this letter introduces an innovative outlier indicator-based projection fuzzy K-means clustering (OIPFK) algorithm for clustering of HSI data, enhancing the efficacy and robustness of previous fuzzy K-means methodologies through a two-pronged strategy. Initially, an outlier indicator vector is constructed to identify noise and outliers by computing the distances between each data point in a reduced dimensional space. Subsequently, the OIPFK algorithm incorporates the fuzzy membership relationships between samples and clustering centers within this lower-dimensional framework, along with the integration of the outlier indicator vectors, to significantly mitigates the influence of noise and extraneous features. Moreover, an efficient iterative optimization algorithm is employed to address the optimization challenges inherent to OIPKM. Experimental results from three real-world hyperspectral image datasets demonstrate the effectiveness and superiority of our proposed method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"496-500"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142937895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interbeat Interval Filtering","authors":"İlker Bayram","doi":"10.1109/LSP.2024.3522853","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522853","url":null,"abstract":"Several inhibitory and excitatory factors regulate the beating of the heart. Consequently, the interbeat intervals (IBIs) vary around a mean value. Various statistics have been proposed to capture heart rate variability (HRV) to give a glimpse into this balance. However, these statistics require accurate estimation of IBIs as a first step, which can be challenging especially for signals recorded in ambulatory conditions. We propose a lightweight state-space filter that models the IBIs as samples of an inverse Gaussian distribution with time-varying parameters. We make the filter robust against outliers by adapting the probabilistic data association filter to the setup. We demonstrate that the resulting filter can accurately identify outliers and the parameters of the tracked distribution can be used to compute a specific HRV statistic (standard deviation of normal-to-normal intervals, SDNN) without further analysis.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"481-485"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STSPhys: Enhanced Remote Heart Rate Measurement With Spatial-Temporal SwiftFormer","authors":"Hyunduk Kim;Sang-Heon Lee;Myoung-Kyu Sohn;Jungkwang Kim;Hyeyoung Park","doi":"10.1109/LSP.2024.3522854","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522854","url":null,"abstract":"Estimating heart activities and physiological signals from facial video without any contact, known as remote photoplethysmography and remote heart rate estimation, holds significant potential for numerous applications. In this letter, we present a novel approach for remote heart rate measurement leveraging a Spatial-Temporal SwiftFormer architecture (STSPhys). Our model addresses the limitations of existing methods that rely heavily on 3D CNNs or 3D visual transformers, which often suffer from increased parameters and potential instability during training. By integrating both spatial and temporal information from facial video data, STSPhys achieves robust and accurate heart rate estimation. Additionally, we introduce a hybrid loss function that integrates constraints from both the time and frequency domains, further enhancing the model's accuracy. Experimental results demonstrate that STSPhys significantly outperforms existing state-of-the-art methods on intra-dataset and cross-dataset tests, achieving superior performance with fewer parameters and lower computational complexity.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"521-525"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142976113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Surveillance Video Compression With Background Hyperprior","authors":"Yu Zhao;Song Tang;Mao Ye","doi":"10.1109/LSP.2024.3521663","DOIUrl":"https://doi.org/10.1109/LSP.2024.3521663","url":null,"abstract":"Neural surveillance video compression methods have demonstrated significant improvements over traditional video compression techniques. In current surveillance video compression frameworks, the first frame in a Group of Pictures (GOP) is usually compressed fully as an I frame, and the subsequent P frames are compressed by referencing this I frame at Low Delay P (LDP) encoding mode. However, this compression approach overlooks the utilization of background information, which limits its adaptability to different scenarios. In this paper, we propose a novel Adaptive Surveillance Video Compression framework based on background hyperprior, dubbed as ASVC. This background hyperprior is related with side information to assist in coding both the temporal and spatial domains. Our method mainly consists of two components. First, the background information from a GOP is extracted, modeled as hyperprior and is compressed by exiting methods. Then these hyperprior is used as side information to compress both I frames and P frames. ASVC effectively captures the temporal dependencies in the latent representations of surveillance videos by leveraging background hyperprior for auxiliary video encoding. The experimental results demonstrate that applying ASVC to traditional and learning based methods significantly improves performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"456-460"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}