{"title":"Knowledge Distillation in Fourier Frequency Domain for Dense Prediction","authors":"Min Shi;Chengkun Zheng;Qingming Yi;Jian Weng;Aiwen Luo","doi":"10.1109/LSP.2024.3515795","DOIUrl":"https://doi.org/10.1109/LSP.2024.3515795","url":null,"abstract":"Knowledge distillation has been widely used to enhance student network performance for dense prediction tasks. Most previous knowledge distillation methods focus on valuable regions of the feature map in the spatial domain, ignoring the semantic information in the frequency domain. This work explores effective information representation of feature maps in the frequency domain and proposes a novel distillation method in the Fourier domain. This approach enhances the student's amplitude representation and transmits both original feature knowledge and global pixel relations. Experiments on object detection and semantic segmentation tasks, including both homogeneous distillation and heterogeneous distillation, demonstrate the significant improvement for the student network. For instance, the ResNet50-RepPoints detector and ResNet18-PspNet segmenter achieve 4.2% AP and 5.01% mIoU improvements on COCO2017 and CityScapes datasets, respectively.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"296-300"},"PeriodicalIF":3.2,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noise-Robust Hearing Aid Voice Control","authors":"Iván López-Espejo;Eros Roselló;Amin Edraki;Naomi Harte;Jesper Jensen","doi":"10.1109/LSP.2024.3512377","DOIUrl":"https://doi.org/10.1109/LSP.2024.3512377","url":null,"abstract":"Advancing the design of robust hearing aid (HA) voice control is crucial to increase the HA use rate among hard of hearing people as well as to improve HA users' experience. In this work, we contribute towards this goal by, first, presenting a novel HA speech dataset consisting of noisy own voice captured by 2 behind-the-ear (BTE) and 1 in-ear-canal (IEC) microphones. Second, we provide baseline HA voice control results from the evaluation of light, state-of-the-art keyword spotting models utilizing different combinations of HA microphone signals. Experimental results show the benefits of exploiting bandwidth-limited bone-conducted speech (BCS) from the IEC microphone to achieve noise-robust HA voice control. Furthermore, results also demonstrate that voice control performance can be boosted by assisting BCS by the broader-bandwidth BTE microphone signals. Aiming at setting a baseline upon which the scientific community can continue to progress, the HA noisy speech dataset has been made publicly available.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"241-245"},"PeriodicalIF":3.2,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10783154","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BiASAM: Bidirectional-Attention Guided Segment Anything Model for Very Few-Shot Medical Image Segmentation","authors":"Wei Zhou;Guilin Guan;Wei Cui;Yugen Yi","doi":"10.1109/LSP.2024.3513240","DOIUrl":"https://doi.org/10.1109/LSP.2024.3513240","url":null,"abstract":"The Segment Anything Model (SAM) excels in general segmentation but encounters difficulties in medical imaging due to few-shot learning challenges, particularly with extremely limited annotated data. Existing approaches often suffer from insufficient feature extraction and inadequate loss function balancing, resulting in decreased accuracy and poor generalization. To address these issues, we propose BiASAM, which uniquely incorporates two bidirectional attention mechanisms into SAM for medical image segmentation. Firstly, BiASAM integrates a spatial-frequency attention module to improve feature extraction, enhancing the model's ability to capture both fine and coarse details. Secondly, we employ an attention-based gradient update mechanism that dynamically adjusts loss weights, boosting the model's learning efficiency and adaptability in data-scarce scenarios. Additionally, BiASAM utilizes the point and box fusion prompt to enhance segmentation precision at both global and local levels. Experiments across various medical datasets show BiASAM achieves performance comparable to fully supervised methods with just two labeled samples.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"246-250"},"PeriodicalIF":3.2,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DiffHSR: Unleashing Diffusion Priors in Hyperspectral Image Super-Resolution","authors":"Yizhen Jia;Yumeng Xie;Ping An;Zhen Tian;Xia Hua","doi":"10.1109/LSP.2024.3512371","DOIUrl":"https://doi.org/10.1109/LSP.2024.3512371","url":null,"abstract":"Hyperspectral images provide rich spectral information and have been widely applied in numerous computer vision tasks. However, their low spatial resolution often limits their use in applications such as image segmentation and recognition. In previous works, generating high-resolution hyperspectral (HR-HS) images required the use of low-resolution hyperspectral (LR-HS) images and high-resolution RGB (HR-RGB) images as priors, which increases the cost of data collection and may lead to measurement and calibration errors in practical applications. Although the currently popular CNN-based single hyperspectral image super-resolution (single HS-SR) methods have improved performance, they are not flexible enough to process images with different degradation. From a visual perspective, the generated super-resolution images exhibit a significant smudging effect due to the loss of information. Leveraging multi-modal techniques and generative prior, we propose DiffHSR that marks a significant leap in LR-HS images super-restoration without HR-RGB. Additionally, we have established a connection between hyperspectral images and the RGB image-based generative model tasks using low-cost data and fine-tuning approaches, which creates a novel paradigm. Comprehensive experiments have demonstrated that our proposed method achieves strong visual performance and competitive results in term of quantitative metrics and perceptive quality.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"236-240"},"PeriodicalIF":3.2,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Theoretical Guarantees for Sparse Graph Signal Recovery","authors":"Gal Morgenstern;Tirza Routtenberg","doi":"10.1109/LSP.2024.3514800","DOIUrl":"https://doi.org/10.1109/LSP.2024.3514800","url":null,"abstract":"Sparse graph signals have recently been utilized in graph signal processing (GSP) for tasks such as graph signal reconstruction, blind deconvolution, and sampling. In addition, sparse graph signals can be used to model real-world network applications across various domains, such as social, biological, and power systems. Despite the extensive use of sparse graph signals, limited attention has been paid to the derivation of theoretical guarantees on their recovery. In this paper, we present a novel theoretical analysis of the problem of recovering a node-domain sparse graph signal from the output of a first-order graph filter. The graph filter we study is the Laplacian matrix, and we derive upper and lower bounds on its mutual coherence. Our results establish a connection between the recovery performance and the minimal graph nodal degree. The proposed bounds are evaluated via simulations on the Erdős-Rényi graph.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"266-270"},"PeriodicalIF":3.2,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142875011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Performance Optimization Framework for Reversible Data Hiding Predictor","authors":"Bin Ma;Hongtao Duan;Ruihe Ma;Yongjin Xian;Xiaolong Li","doi":"10.1109/LSP.2024.3512357","DOIUrl":"https://doi.org/10.1109/LSP.2024.3512357","url":null,"abstract":"Existing deep learning-based reversible data hiding (RDH) predictors are affected by the difference of pixel complexity, which leads to the reduction of prediction accuracy. Therefore, this letter proposes an optimization framework tailored for RDH predictors, which integrates the local complexity of pixels into the predictor's regression optimization process. By analyzing the image's texture features, the framework adaptively determines the optimal prediction coefficients, thereby improving prediction accuracy. Notably, this optimization framework is versatile and can be applied to optimize other deep learning-based RDH predictors. Additionally, recognizing the critical role of interpolation strategies in RDH pixel prediction, we introduce a multi-scale fusion-enhanced interpolation network specifically designed for RDH, which integrates features across different scales to provide accurate reference pixels for subsequent predictions. Finally, experimental results demonstrate that the proposed method outperforms several advanced RDH predictors in terms of both prediction accuracy and embedding performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"231-235"},"PeriodicalIF":3.2,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text-Based Audio Retrieval by Learning From Similarities Between Audio Captions","authors":"Huang Xie;Khazar Khorrami;Okko Räsänen;Tuomas Virtanen","doi":"10.1109/LSP.2024.3511414","DOIUrl":"https://doi.org/10.1109/LSP.2024.3511414","url":null,"abstract":"This letter proposes to use similarities of audio captions for estimating audio-caption relevances to be used for training text-based audio retrieval systems. Current audio-caption datasets (e.g., Clotho) contain audio samples paired with annotated captions, but lack relevance information about audio samples and captions beyond the annotated ones. Besides, mainstream approaches (e.g., CLAP) usually treat the annotated pairs as positives and consider all other audio-caption combinations as negatives, assuming a binary relevance between audio samples and captions. To infer the relevance between audio samples and arbitrary captions, we propose a method that computes non-binary audio-caption relevance scores based on the textual similarities of audio captions. We measure textual similarities of audio captions by calculating the cosine similarity of their Sentence-BERT embeddings and then transform these similarities into audio-caption relevance scores using a logistic function, thereby linking audio samples through their annotated captions to all other captions in the dataset. To integrate the computed relevances into training, we employ a listwise ranking objective, where relevance scores are converted into probabilities of ranking audio samples for a given textual query. We show the effectiveness of the proposed method by demonstrating improvements in text-based audio retrieval compared to methods that use binary audio-caption relevances for training.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"221-225"},"PeriodicalIF":3.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MLLM-TA: Leveraging Multimodal Large Language Models for Precise Temporal Video Grounding","authors":"Yi Liu;Haowen Hou;Fei Ma;Shiguang Ni;Fei Richard Yu","doi":"10.1109/LSP.2024.3511426","DOIUrl":"https://doi.org/10.1109/LSP.2024.3511426","url":null,"abstract":"In untrimmed video tasks, identifying temporal boundaries in videos is crucial for temporal video grounding. With the emergence of multimodal large language models (MLLMs), recent studies have focused on endowing these models with the capability of temporal perception in untrimmed videos. To address the challenge, in this paper, we introduce a multimodal large language model named MLLM-TA with precise temporal perception to obtain temporal attention. Unlike the traditional MLLMs, answering temporal questions through one or two words related to temporal information, we leverage the text description proficiency of MLLMs to acquire video temporal attention with description. Specifically, we design a dual temporal-aware generative branches aimed at the visual space of the entire video and the textual space of global descriptions, simultaneously generating mutually supervised consistent temporal attention, thereby enhancing the video temporal perception capabilities of MLLMs. Finally, we evaluate our approach on both video grounding task and highlight detection task on three popular benchmarks, including Charades-STA, ActivityNet Captions and QVHighlights. The extensive results show that our MLLM-TA significantly outperforms previous approaches both on zero-shot and supervised setting, achieving state-of-the-art performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"281-285"},"PeriodicalIF":3.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond Diagonal RIS: Key to Next-Generation Integrated Sensing and Communications?","authors":"Tara Esmaeilbeig;Kumar Vijay Mishra;Mojtaba Soltanalian","doi":"10.1109/LSP.2024.3511395","DOIUrl":"https://doi.org/10.1109/LSP.2024.3511395","url":null,"abstract":"Reconfigurable intelligent surfaces (RIS) offer unprecedented flexibility for smart wireless channels. Recent research shows that RIS platforms enhance signal quality, coverage, and link capacity in integrated sensing and communication (ISAC) systems. This paper explores the use of fully-connected beyond diagonal RIS (BD-RIS) in ISAC. BD-RIS provides additional degrees of freedom by allowing non-zero off-diagonal elements in the scattering matrix, enhancing functionality and performance. We aim to maximize the weighted sum of the signal-to-noise ratio (SNR) at both the radar receiver and communication users using BD-RIS. Numerical results demonstrate the advantages of BD-RIS in ISAC, significantly improving SNR for both radar and communication users.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"216-220"},"PeriodicalIF":3.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10777522","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FAMSeC: A Few-Shot-Sample-Based General AI-Generated Image Detection Method","authors":"Juncong Xu;Yang Yang;Han Fang;Honggu Liu;Weiming Zhang","doi":"10.1109/LSP.2024.3511421","DOIUrl":"https://doi.org/10.1109/LSP.2024.3511421","url":null,"abstract":"The explosive growth of generative AI has saturated the internet with AI-generated images, raising security concerns and increasing the need for reliable detection methods. The primary requirement for such detection is generalizability, typically achieved by training on numerous fake images from various models. However, practical limitations, such as closed-source models and restricted access, often result in limited training samples. Therefore, training a general detector with few-shot samples is essential for modern detection mechanisms. To address this challenge, we propose FAMSeC, a general AI-generated image detection method based on LoRA-based \u0000<bold>F</b>\u0000orgery \u0000<bold>A</b>\u0000wareness \u0000<bold>M</b>\u0000odule and \u0000<bold>Se</b>\u0000mantic feature-guided \u0000<bold>C</b>\u0000ontrastive learning strategy. To effectively learn from limited samples and prevent overfitting, we developed a forgery awareness module (FAM) based on LoRA, maintaining the generalization of pre-trained features. Additionally, to cooperate with FAM, we designed a semantic feature-guided contrastive learning strategy (SeC), making the FAM focus more on the differences between real/fake image than on the features of the samples themselves. Experiments show that FAMSeC outperforms state-of-the-art method, enhancing classification accuracy by 14.55% with just 0.56% of the training samples.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"226-230"},"PeriodicalIF":3.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}