IEEE Signal Processing Letters: Latest Articles

KFA: Keyword Feature Augmentation for Open Set Keyword Spotting
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-22 · DOI: 10.1109/LSP.2024.3484932
Kyungdeuk Ko;Bokyeung Lee;Jonghwan Hong;Hanseok Ko
Abstract: In recent years, with the advancement of deep learning technology and the emergence of smart devices, there has been growing interest in keyword spotting (KWS), which is used to activate AI systems equipped with automatic speech recognition and text-to-speech. However, smart devices with KWS often raise false alarms on unexpected words. To address this issue, existing KWS methods typically train non-target words as an *unknown* class. Despite these efforts, unseen words that were not trained as part of the *unknown* class may still be misclassified as one of the target words. To overcome this limitation, we propose a new method named Keyword Feature Augmentation (KFA) for open-set KWS. KFA performs feature augmentation through adversarial learning to increase the loss, and the augmented features are constrained within a limited space using label smoothing. Unlike other generative-model-based open set recognition (OSR) methods, KFA requires no additional training parameters or repeated operations at inference. As a result, KFA achieves a 0.955 AUROC score and 97.34% target-class accuracy on Google Speech Commands V1, and a 0.959 AUROC score and 98.17% target-class accuracy on Google Speech Commands V2, the highest performance among the OSR methods compared.
Citations: 0
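The abstract describes augmenting keyword features adversarially so that the classification loss increases, then constraining the result. A minimal sketch of the adversarial step, using a numerical gradient on a toy linear classifier (all names, shapes, and the FGSM-style sign step are illustrative assumptions, not the paper's implementation):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ce_loss(weights, feat, target):
    # cross-entropy of a linear classifier applied to one feature vector
    logits = [sum(wi * fi for wi, fi in zip(w, feat)) for w in weights]
    return -math.log(softmax(logits)[target])

def augment_feature(weights, feat, target, eps=0.1, h=1e-5):
    """FGSM-style feature augmentation: step each coordinate in the sign
    of the loss gradient (estimated by finite differences) so the loss grows."""
    grad = []
    for i in range(len(feat)):
        bumped = list(feat)
        bumped[i] += h
        grad.append((ce_loss(weights, bumped, target) - ce_loss(weights, feat, target)) / h)
    return [f + (eps * math.copysign(1.0, g) if g else 0.0)
            for f, g in zip(feat, grad)]
```

Because the toy loss is convex in the feature, an ascent step of this form is guaranteed not to decrease it, mirroring the "increase the loss" objective.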
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-17 · DOI: 10.1109/LSP.2024.3483009
Mehmet Hamza Erol;Arda Senocak;Jiu Feng;Joon Son Chung
Abstract: Transformers have rapidly become the preferred choice for audio classification, surpassing CNN-based methods. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention. Removing this quadratic self-attention cost presents an appealing direction. Recently, state space models (SSMs), such as Mamba, have demonstrated potential in language and vision tasks in this regard. In this study, we explore whether reliance on self-attention is necessary for audio classification. By introducing Audio Mamba (AuM), the first self-attention-free, purely SSM-based model for audio classification, we aim to address this question. We evaluate AuM on various audio datasets (comprising six different benchmarks), where it achieves comparable or better performance than the well-established AST model.
Citations: 0
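The quadratic cost AuM avoids comes from attending every time step to every other; an SSM instead runs a linear-time recurrent scan, and a bidirectional variant runs it in both directions so each position sees past and future context. A scalar toy version of such a scan (a deliberate simplification; Mamba's actual selective scan is input-dependent and multidimensional):

```python
def ssm_scan(x, a=0.9, b=0.5):
    """Linear recurrence h_t = a*h_{t-1} + b*x_t, computed in O(T)."""
    h, out = 0.0, []
    for xt in x:
        h = a * h + b * xt
        out.append(h)
    return out

def bidirectional_ssm(x, a=0.9, b=0.5):
    """Sum of a forward scan and a time-reversed backward scan, so every
    output position carries both past and future context."""
    fwd = ssm_scan(x, a, b)
    bwd = ssm_scan(x[::-1], a, b)[::-1]
    return [f + g for f, g in zip(fwd, bwd)]
```

The two scans together still cost O(T), versus the O(T^2) pairwise interactions of self-attention.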
System-Informed Neural Network for Frequency Detection
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-17 · DOI: 10.1109/LSP.2024.3483036
Sunyoung Ko;Myoungin Shin;Geunhwan Kim;Youngmin Choo
Abstract: We devise a deep-learning-based frequency analysis scheme called the system-informed neural network (SINN) by considering the corresponding linear system model. SINN adopts the adaptive learned iterative soft-shrinkage algorithm as its network architecture and includes the system model in the loss function. It generalizes well with fast processing time and, like a physics-informed neural network, finds a solution that satisfies the system model. To further improve SINN, multiple measurements are exploited by assuming common frequency components across the measurements. SINN is examined on simulated acoustic data, and its performance is compared to the Fourier transform and sparse Bayesian learning (SBL) in terms of detection/false-alarm rate and mean squared error. In in-situ data tests, SINN reveals clear frequency components, as SBL does, by reducing noise effectively. Finally, SINN is applied to noisy passive sonar signals containing 43 frequency components, many of which are recovered.
Citations: 0
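The learned iterative soft-shrinkage architecture underlying SINN unrolls classical ISTA, whose core is a gradient step on the data-fit term followed by soft thresholding. A plain, non-learned ISTA iteration for a tiny dense system (step size and regularization weight here are illustrative, and the learned variant would make them trainable per layer):

```python
import math

def soft_shrink(x, lam):
    """Soft-thresholding: sign(x) * max(|x| - lam, 0); promotes sparsity."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]

def ista_step(A, y, x, step=0.1, lam=0.05):
    """One ISTA iteration for min 0.5*||Ax - y||^2 + lam*||x||_1."""
    # residual r = A x - y
    resid = [sum(aij * xj for aij, xj in zip(row, x)) - yi
             for row, yi in zip(A, y)]
    # gradient of the data-fit term: A^T r
    grad = [sum(A[i][j] * resid[i] for i in range(len(A)))
            for j in range(len(x))]
    return soft_shrink([xj - step * gj for xj, gj in zip(x, grad)], step * lam)
```
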
RFI-Aware and Low-Cost Maximum Likelihood Imaging for High-Sensitivity Radio Telescopes
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-17 · DOI: 10.1109/LSP.2024.3483011
J. Wang;M. N. El Korso;L. Bacharach;P. Larzabal
Abstract: This paper addresses interference mitigation and the reduction of computational cost in radio interferometric imaging. We propose a novel maximum-likelihood methodology based on an antenna sub-array switching technique, which strikes a refined balance between imaging accuracy and computational efficiency. In addition, we improve robustness to radio interference by modeling the additive noise as t-distributed. Simulation results demonstrate the superiority of the t-distributed noise model over the conventional Gaussian noise model in scenarios involving interference. We show that the proposed switching approach yields similar imaging performance with far fewer visibilities than the full array configuration, thus reducing computational complexity.
Citations: 0
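Why a t-distributed noise model is robust: in the standard EM/IRLS view of Student-t regression, each residual receives a weight of (ν+1)/(ν + r²/σ²), so outlier samples (e.g. RFI hits) are down-weighted rather than dominating the fit, as they would under a Gaussian model. A sketch of those weights in their generic scalar form (the letter's actual estimator works on complex visibilities):

```python
def t_weights(residuals, nu=3.0, sigma=1.0):
    """EM/IRLS weights under i.i.d. Student-t noise with nu degrees of
    freedom: large residuals receive small weights."""
    return [(nu + 1.0) / (nu + (r / sigma) ** 2) for r in residuals]
```

As ν grows, the weights flatten toward 1 and the estimator recovers ordinary (Gaussian) least squares.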
Order Estimation of Linear-Phase FIR Filters for DAC Equalization in Multiple Nyquist Bands
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-17 · DOI: 10.1109/LSP.2024.3483008
Deijany Rodriguez Linares;Håkan Johansson;Yinan Wang
Abstract: This letter considers the design and properties of linear-phase finite-length impulse response (FIR) filters for equalizing the frequency responses of digital-to-analog converters (DACs). It derives estimates of the required filter orders, as functions of bandwidth and equalization accuracy, for four DAC pulses used in multiple Nyquist bands. The estimates are derived from a large set of minimax-optimal equalizers using symbolic regression, followed by minimax-optimal curve fitting for further refinement. Design examples demonstrate the accuracy of the proposed estimates. The letter also discusses the suitability of the four types of linear-phase FIR filters for the different equalizer cases, as well as the corresponding properties of the equalized systems.
Citations: 0
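The letter's order estimates are specific to DAC equalization and are given in the paper itself; the classical analogue of such a formula is Kaiser's estimate for a linear-phase lowpass FIR filter, N ≈ (A − 7.95)/(2.285·Δω), with stopband attenuation A in dB and transition width Δω in rad/sample. As a point of reference:

```python
import math

def kaiser_fir_order(atten_db, delta_omega):
    """Kaiser's classical order estimate for a linear-phase lowpass FIR
    filter; delta_omega is the transition width in rad/sample."""
    return math.ceil((atten_db - 7.95) / (2.285 * delta_omega))
```

As in the letter's estimates, the required order grows with the accuracy target (attenuation) and shrinks as the transition band widens.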
Learning Noise Adapters for Incremental Speech Enhancement
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-16 · DOI: 10.1109/LSP.2024.3482171
Ziye Yang;Xiang Song;Jie Chen;Cédric Richard;Israel Cohen
Abstract: Incremental speech enhancement (ISE), which must adapt incrementally to new noise domains, is a critical yet comparatively under-investigated topic. While regularization-based methods have been proposed for the ISE task, they usually suffer from a dilemma in which the gain in one domain directly entails a loss in another. To solve this issue, we propose an effective paradigm, termed Learning Noise Adapters (LNA), which significantly mitigates catastrophic domain forgetting in the ISE task. We employ a frozen pre-trained model and train and retain a domain-specific adapter for each newly encountered domain, capturing the variations in feature distributions within these domains. For the inference stage, we develop an unsupervised, training-free noise selector that identifies the domains of test speech samples. Comprehensive experimental validation substantiates the effectiveness of our approach.
Citations: 0
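LNA pairs a frozen backbone with one small adapter per noise domain plus a training-free selector at inference. One plausible selector of that kind (purely illustrative; the abstract does not specify the paper's selector) picks the adapter whose stored domain feature statistics lie nearest to the incoming utterance's features:

```python
def select_adapter(feature, domain_means):
    """Training-free domain selection: return the name of the noise domain
    whose stored mean feature vector is closest (squared Euclidean
    distance) to the test feature."""
    def sqdist(mean):
        return sum((f - m) ** 2 for f, m in zip(feature, mean))
    return min(domain_means, key=lambda name: sqdist(domain_means[name]))
```

Because each domain keeps its own adapter and the backbone never changes, learning a new domain cannot overwrite what was learned for earlier ones, which is the sense in which forgetting is avoided.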
Maximum Entropy and Quantized Metric Models for Absolute Category Ratings
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-15 · DOI: 10.1109/LSP.2024.3480832
Dietmar Saupe;Krzysztof Rusek;David Hägele;Daniel Weiskopf;Lucjan Janowski
Abstract: The datasets of most image quality assessment studies contain ratings on a five-level categorical scale, from bad (1) to excellent (5). For each stimulus, the counts of ratings from 1 to 5 are summarized and reported as the mean opinion score. In this study, we investigate families of multinomial probability distributions, parameterized by mean and variance, that are used to fit the empirical rating distributions. To this end, we consider quantized metric models based on continuous distributions that model perceived stimulus quality on a latent scale. The probabilities of the rating categories are obtained by quantizing the corresponding random variables at threshold values. Furthermore, we introduce a novel discrete maximum-entropy distribution for a given mean and variance. We compare the performance of these models, and of the state-of-the-art generalized score distribution, on two large datasets, KonIQ-10k and VQEG HDTV. Given an input distribution of ratings, our fitted two-parameter models predict unseen ratings better than the empirical distribution. In contrast to empirical distributions of absolute category ratings and their discrete models, our continuous models provide fine-grained estimates of quantiles of quality of experience, which are relevant to service providers aiming to satisfy a certain fraction of the user population.
Citations: 0
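A quantized metric model in the sense above places a continuous latent quality distribution on the real line and integrates it between threshold values to get the five category probabilities. A sketch with a latent Gaussian and fixed mid-point thresholds (an assumption for illustration; the paper fits mean and variance and considers other latent distributions):

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantized_gaussian_probs(mu, sigma, thresholds=(1.5, 2.5, 3.5, 4.5)):
    """P(category k) = P(t_{k-1} < latent quality <= t_k) for a latent
    N(mu, sigma^2) quantized onto the 1..5 ACR scale."""
    cuts = [-math.inf, *thresholds, math.inf]
    return [normal_cdf((hi - mu) / sigma) - normal_cdf((lo - mu) / sigma)
            for lo, hi in zip(cuts, cuts[1:])]
```

Unlike an empirical histogram, the fitted (mu, sigma) pair also yields continuous quantiles of the latent quality, which is the quantile-of-experience use case the abstract mentions.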
Pose-Promote: Progressive Visual Perception for Activities of Daily Living
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-14 · DOI: 10.1109/LSP.2024.3480046
Qilang Ye;Zitong Yu
Abstract: Poses are effective for interpreting fine-grained human activities, especially amid complex visual information. Unimodal action recognition methods handle daily activities unsatisfactorily because they lack a comprehensive perspective, and existing multimodal methods that combine pose and visual cues still do not fully mine their complementary information. We therefore propose a Pose-promote (Ppromo) framework that uses prior knowledge of pose joints to perceive visual information progressively. We first introduce a temporal promote module that activates each video segment with temporally synchronized joint weights. A spatial promote module then captures the key regions in the visual stream using the learned pose attentions. To further refine the bimodal associations, a global inter-promote module aligns global pose-visual semantics at the feature granularity. Finally, a learnable late-fusion strategy between the visual and pose streams is applied for accurate inference. Ppromo achieves state-of-the-art performance on three publicly available datasets.
Citations: 0
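The temporal promote module, as described, re-weights each video segment by pose-joint weights synchronized in time. A minimal illustration of that idea (the function names and the softmax normalization are assumptions for the sketch, not the paper's exact formulation):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def temporal_promote(segment_feats, joint_scores):
    """Scale each segment's feature vector by a normalized pose weight,
    so segments with confident pose evidence dominate the representation."""
    w = softmax(joint_scores)
    return [[wi * f for f in feat] for wi, feat in zip(w, segment_feats)]
```
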
Learning Multidimensional Spatial Attention for Robust Nighttime Visual Tracking
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-14 · DOI: 10.1109/LSP.2024.3480831
Qi Gao;Mingfeng Yin;Yuanzhi Ni;Yuming Bo;Shaoyi Bei
Abstract: The recent development of advanced trackers using nighttime image enhancement has led to marked advances in visual tracking at night. However, images recovered by current enhancement methods still suffer from weaknesses such as blurred target details and visible noise. To this end, we propose a novel method that learns multidimensional spatial attention for robust nighttime visual tracking, built on a spatial-channel-transformer-based low-light enhancer (SCT) and named MSA-SCT. First, a novel multidimensional spatial attention (MSA) is designed: additional reliable feature responses are generated by aggregating channel and multi-scale spatial information, making the model more adaptable to the illumination conditions and noise levels in different regions of the image. Second, optimized skip connections limit the effects of redundant information and noise, which helps propagate fine detail features in nighttime images from low-level to high-level features and improves the enhancement effect. Finally, the tracker with enhancers is tested on multiple tracking benchmarks, fully demonstrating the effectiveness and superiority of MSA-SCT.
Citations: 0
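The MSA idea of aggregating channel information into a per-position attention map can be illustrated in miniature: pool across channels at each spatial position and squash the result into a (0, 1) weight (the mean-plus-max pooling and sigmoid here are generic attention conventions assumed for the sketch, not the paper's design):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spatial_attention(feat_map):
    """Toy spatial attention over a [channels][positions] feature map:
    aggregate across channels (mean + max) at each position and map to
    (0, 1), so positions with strong responses get larger weights."""
    n_pos = len(feat_map[0])
    attn = []
    for i in range(n_pos):
        col = [ch[i] for ch in feat_map]
        attn.append(sigmoid(sum(col) / len(col) + max(col)))
    return attn

def apply_attention(feat_map, attn):
    """Re-weight every channel by the shared spatial attention map."""
    return [[a * v for a, v in zip(attn, ch)] for ch in feat_map]
```
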
A Recurrent Spatio-Temporal Graph Neural Network Based on Latent Time Graph for Multi-Channel Time Series Forecasting
IF 3.2 · Q2 (Engineering & Technology)
IEEE Signal Processing Letters · Pub Date: 2024-10-14 · DOI: 10.1109/LSP.2024.3479917
Linzhi Li;Xiaofeng Zhou;Guoliang Hu;Shuai Li;Dongni Jia
Abstract: With the advancement of technology, multi-channel time series forecasting has become a focal point of research. In this context, spatio-temporal graph neural networks have attracted significant interest due to their outstanding performance. An established approach integrates graph convolutional networks into recurrent neural networks, but it has difficulty capturing dynamic spatial correlations and discerning the correlations among multi-channel time series signals. Another major problem is that the discrete time intervals of recurrent neural networks limit the accuracy of spatio-temporal prediction. To address these challenges, we propose a continuous spatio-temporal framework, termed Recurrent Spatio-Temporal Graph Neural Network based on a Latent Time Graph (RST-LTG). RST-LTG combines adaptive graph convolution networks with a time embedding generator to construct a latent time graph, which subtly captures evolving spatial characteristics by aggregating spatial information across multiple time steps. Additionally, to improve the accuracy of continuous-time modeling, we introduce a gate-enhanced neural ordinary differential equation that effectively integrates information across multiple scales. Empirical results on four publicly available datasets show that RST-LTG outperforms 19 competing methods in accuracy.
Citations: 0
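A gate-enhanced neural ODE, in the spirit described, evolves the hidden state continuously while a gate modulates the size of each update. A toy forward-Euler integrator of dh/dt = gate(h) * f(h), where both the gate and the dynamics are placeholder callables rather than the paper's learned networks:

```python
def gated_ode_euler(h, f, gate, dt=0.05, steps=20):
    """Forward-Euler integration of dh/dt = gate(h) * f(h): the scalar
    gate scales how strongly the dynamics f update the state h."""
    for _ in range(steps):
        g = gate(h)
        dh = f(h)
        h = [hi + dt * g * di for hi, di in zip(h, dh)]
    return h
```

With continuous dynamics of this kind, the state is defined between observation times, which is what lifts the discrete-time-interval limitation the abstract points to in plain recurrent networks.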