IEEE Journal of Selected Topics in Signal Processing: Latest Articles

DARIO: Differentiable Vision Transformer Pruning With Low-Cost Proxies
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-11-18, DOI: 10.1109/JSTSP.2024.3501685
Haozhe Sun; Alexandre Heuillet; Felix Mohr; Hedi Tabia
Abstract: Transformer models have gained popularity for their exceptional performance. However, these models still face the challenge of high inference latency. To improve the computational efficiency of such models, we propose a novel differentiable pruning method called DARIO (DifferentiAble vision transformer pRunIng with low-cost prOxies). Our approach involves optimizing a set of gating parameters using differentiable, data-agnostic, scale-invariant, and low-cost performance proxies. DARIO is a data-agnostic pruning method; it does not need any classification heads during pruning. We evaluated DARIO on two popular state-of-the-art pre-trained ViT models, including both large (MAE-ViT) and small (MobileViT) sizes. Extensive experiments conducted across 40 diverse datasets demonstrated the effectiveness and efficiency of our DARIO method. DARIO not only significantly improves inference efficiency on modern hardware but also excels in preserving accuracy. Notably, DARIO has even achieved an increase in accuracy on MobileViT, despite only fine-tuning the last block and the classification head.
Vol. 18, no. 6, pp. 997-1009
Citations: 0
Improved Alias-and-Separate Speech Coding Framework With Minimal Algorithmic Delay
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-11-18, DOI: 10.1109/JSTSP.2024.3501681
Eunkyun Lee; Seungkwon Beack; Jong Won Shin
Abstract: The Alias-and-Separate (AaS) speech coding framework has shown the possibility to encode wideband (WB) speech with a narrowband (NB) speech codec and reconstruct it using speech separation. WB speech is first decimated, incurring aliasing, and then coded, transmitted, and decoded with an NB codec. The decoded signal is then separated into a lower band and a spectrally-flipped high band using a speech separation module, which are expanded, lowpass/highpass filtered, and added together to reconstruct the WB speech. The original AaS system, however, has algorithmic delay originating from the overlap-add operation for consecutive segments. This algorithmic delay can be reduced by omitting the overlap-add procedure, but the quality of the reconstructed speech is also degraded due to artifacts on the segment boundaries. In this work, we propose an improved AaS framework with minimum algorithmic delay. The decoded signal is first expanded by inserting zeros in-between samples before being processed by the source separation module. As the expanded signal can be viewed as a summation of the frequency-shifted versions of the original signal, the decoded-and-expanded signal is then separated into the frequency-shifted signals, which are multiplied by complex exponentials and summed up to reconstruct the original signal. With a carefully designed transposed convolution operation in the separation module, the proposed system requires minimal algorithmic delay while preventing discontinuity at the segment boundaries. Additionally, we propose to employ a generative vocoder to further improve the perceived quality and a modified multi-resolution short-time Fourier transform (MR-STFT) loss. Experimental results on the WB speech coding with an NB codec demonstrated that the proposed system outperformed the original AaS system and the existing WB speech codec in the subjective listening test. We have also shown that the proposed method can be applied when the decimation factor is not 2 in the experiment on the fullband speech coding with a WB codec.
Vol. 18, no. 8, pp. 1414-1426
Citations: 0
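The abstract of the Alias-and-Separate entry above states that a zero-inserted (expanded) signal can be viewed as a sum of frequency-shifted versions of the original signal. The following is a minimal numerical sketch of that identity only, assuming an ideal chain with no codec and no separation network; the function names `decimate_then_expand` and `sum_of_shifted_copies` are hypothetical and do not come from the paper.

```python
import numpy as np

def decimate_then_expand(s, factor):
    """Keep every `factor`-th sample of s and put zeros everywhere else.

    This is decimation followed by zero insertion, with no coding in between
    (an idealization; the paper's system has a codec at this point).
    """
    y = np.zeros_like(s)
    y[::factor] = s[::factor]
    return y

def sum_of_shifted_copies(s, factor):
    """Average of `factor` frequency-shifted copies of s.

    Copy k is s[m] * exp(j*2*pi*k*m/factor), i.e. s shifted in frequency by
    k/factor cycles per sample. The exponentials sum to `factor` when m is a
    multiple of `factor` and to zero otherwise.
    """
    m = np.arange(len(s))
    shifted = [s * np.exp(2j * np.pi * k * m / factor) for k in range(factor)]
    return np.sum(shifted, axis=0) / factor

rng = np.random.default_rng(0)
s = rng.standard_normal(32)  # stand-in for a real-valued speech segment

for factor in (2, 3):  # the abstract notes the factor need not be 2
    y = decimate_then_expand(s, factor)
    z = sum_of_shifted_copies(s, factor)
    assert np.allclose(z.imag, 0.0)  # imaginary parts cancel
    assert np.allclose(z.real, y)    # both constructions agree
```

The check passes for factors 2 and 3 on a random real signal. It illustrates only why the expanded signal contains the shifted copies that a separation module could then pull apart; it says nothing about how the paper's network performs that separation.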
Perceptual Neural Audio Coding With Modified Discrete Cosine Transform
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-11-06, DOI: 10.1109/JSTSP.2024.3491576
Hyungseob Lim; Jihyun Lee; Byeong Hyeon Kim; Inseon Jang; Hong-Goo Kang
Abstract: Despite efforts to leverage the modeling power of deep neural networks (DNNs) in audio coding, effectively deploying them in real-world applications is still problematic due to their high computational cost and the restricted range of target signals or achievable bit-rates. In this paper, we propose an alternative approach for integrating DNNs into a perceptual audio coder that allows for the optimization of the whole system in a data-driven, end-to-end manner. The key idea of the proposed method is to make DNNs control the quantization noise in the classic transform coding framework, specifically based on the modified discrete cosine transform (MDCT). The proposal includes a new DNN-based mechanism for adaptively adjusting the quantization step sizes of frequency bands targeting an arbitrary bit-rate, eventually acting as a data-driven differentiable psychoacoustic model. The side information regarding the adaptive quantization is also encoded and decoded by DNNs via learned representation. During training, the perceptual distortion is evaluated by a perceptual quality estimation model trained on actual human ratings so that the proposed audio codec can effectively allocate bits considering their effect on the perceptual quality. Through comparisons with legacy audio codecs (MP3 and AAC) and a neural audio codec (EnCodec), we show that our method can achieve further coding gains over the legacy codecs with a substantially lower computational load on the decoder compared to other neural audio codecs.
Vol. 18, no. 8, pp. 1490-1505
Citations: 0
IEEE Signal Processing Society Information
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-11-05, DOI: 10.1109/JSTSP.2024.3459324
Vol. 18, no. 4, p. C2
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10744618
Citations: 0
IEEE Signal Processing Society Information
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-11-05, DOI: 10.1109/JSTSP.2024.3459322
Vol. 18, no. 4, p. C3
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10744789
Citations: 0
Introduction to the Special Issue Near-Field Signal Processing: Algorithms, Implementations and Applications
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-11-05, DOI: 10.1109/JSTSP.2024.3465108
Ahmet M. Elbir; Kumar Vijay Mishra; Özlem Tuğfe Demir; Emil Björnson; Angel Lozano
Vol. 18, no. 4, pp. 541-545
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10744777
Citations: 0
Multipath Component Power Delay Profile Based Ranging
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-11-04, DOI: 10.1109/JSTSP.2024.3491580
Fangqing Xiao; Zilu Zhao; Dirk T. M. Slock
Abstract: Precision ranging technology has become indispensable for ensuring efficient, reliable, and low-latency fifth-generation (5G) networks. In this paper, we propose a novel ranging method, namely multipath component (MPC) power delay profile (PDP) based ranging. Whereas the Received Signal Strength (RSS) only summarizes the PDP into a single characteristic, we aim to furthermore exploit the range-dependent curvature of the PDP envelope over its delay spread. However, the multipath propagation only allows sampling the PDP envelope at the path delays and suffers from (slow) fading. Hence our approach involves constructing a statistical fading model of the PDP and establishing a relationship between the distribution parameters and the propagation distance. To theoretically validate the feasibility of our proposed method, we adopt the widely accepted Nakagami-m fading model, which renders traditional estimation methods difficult to apply. Therefore we introduce the Expectation Maximization (EM)-Revisited Vector Approximate Message Passing (ReVAMP) algorithm. This algorithm is specifically designed to handle difficulties in parameter estimation for Gaussian linear models (GLMs) with hidden random variables and intractable posterior distributions. Extensive numerical simulations have been conducted, which exhibit preliminary evidence of the effectiveness of our MPCPDP-based ranging method compared to the received signal strength (RSS)-based method. Moreover, the versatility of the EM-ReVAMP algorithm allows for its extension to other statistical fading models beyond the Nakagami-m model with minor modifications, which opens the door to potential improvements based on more accurate statistical fading models. Nevertheless, the applicability of our MPCPDP-based ranging method should be validated in real-world scenarios in future studies.
Vol. 18, no. 5, pp. 950-963
Citations: 0
Neural Speech Coding for Real-Time Communications Using Constant Bitrate Scalar Quantization
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-11-04, DOI: 10.1109/JSTSP.2024.3491575
Andreas Brendel; Nicola Pia; Kishan Gupta; Lyonel Behringer; Guillaume Fuchs; Markus Multrus
Abstract: Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder is learned. This allows for efficient transmission of the input audio signal. The learned discrete representation of neural codecs is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ) and a lot of effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose and analyze simple alternatives to VQ, which are based on projected Scalar Quantization (SQ). These quantization techniques do not need any additional losses, scheduling parameters or codebook storage, thereby simplifying the training of neural audio codecs. For real-time speech communication applications, these neural codecs are required to operate at low complexity, low latency and at low bitrates. We address those challenges by proposing a new causal network architecture that is based on SQ and a Short-Time Fourier Transform (STFT) representation. The proposed method performs particularly well in the very low complexity and low bitrate regime.
Vol. 18, no. 8, pp. 1462-1476
Citations: 0
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-10-30, DOI: 10.1109/JSTSP.2024.3488557
Xue Jiang; Xiulian Peng; Yuan Zhang; Yan Lu
Abstract: Current large speech language models are mainly based on semantic tokens from discretization of self-supervised learned representations and acoustic tokens from a neural codec, following a semantic-modeling and acoustic-synthesis paradigm. However, semantic tokens discard paralinguistic attributes of speakers that are important for natural spoken communication, while prompt-based acoustic synthesis from semantic tokens has limits in recovering paralinguistic details and suffers from robustness issues, especially when there are domain gaps between the prompt and the target. This paper unifies two types of tokens and proposes the UniCodec, a universal speech token learning that encapsulates all semantics of speech, including linguistic and paralinguistic information, into a compact and semantically-disentangled unified token. Such a unified token can not only benefit speech language models in understanding with paralinguistic hints but also help speech generation with high-quality output. A low-bitrate neural codec is leveraged to learn such disentangled discrete representations at global and local scales, with knowledge distilled from self-supervised learned features. Extensive evaluations on multilingual datasets demonstrate its effectiveness in generating natural, expressive and long-term consistent output quality with paralinguistic attributes well preserved in several speech processing tasks.
Vol. 18, no. 8, pp. 1477-1489
Citations: 0
A Unified Activity Detection Framework for Massive Access: Beyond the Block-Fading Paradigm
IF 8.7, Zone 1, Engineering and Technology
IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2024-10-24, DOI: 10.1109/JSTSP.2024.3486200
Jianan Bai; Erik G. Larsson
Abstract: The wireless channel changes continuously with time and frequency and the block-fading assumption, which is popular in many theoretical analyses, never holds true in practical scenarios. This discrepancy is critical for user activity detection in grant-free random access, where joint processing across multiple coherence blocks is undesirable, especially when the environment becomes more dynamic. In this paper, we develop a framework for low-dimensional approximation of the channel to capture its variations over time and frequency, and use this framework to implement robust activity detection algorithms. Furthermore, we investigate how to efficiently estimate the principal subspace that defines the low-dimensional approximation. We also examine pilot hopping as a way of exploiting time and frequency diversity in scenarios with limited channel coherence, and extend our algorithms to this case. Through numerical examples, we demonstrate a substantial performance improvement achieved by our proposed framework.
Vol. 18, no. 7, pp. 1366-1380
Citations: 0