Xue Wang;Tian Zhou;Jianqing Zhu;Jialin Liu;Kun Yuan;Tao Yao;Wotao Yin;Rong Jin;HanQin Cai
{"title":"S$^{3}$Attention: Improving Long Sequence Attention With Smoothed Skeleton Sketching","authors":"Xue Wang;Tian Zhou;Jianqing Zhu;Jialin Liu;Kun Yuan;Tao Yao;Wotao Yin;Rong Jin;HanQin Cai","doi":"10.1109/JSTSP.2024.3446173","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3446173","url":null,"abstract":"Attention-based models have achieved many remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes vanilla Attention-based models hard to apply to long-sequence tasks. Various improved Attention structures have been proposed to reduce the computation cost by inducing low rankness and approximating the whole sequence by sub-sequences. The most challenging part of those approaches is maintaining the proper balance between information preservation and computation reduction: the longer the sub-sequences used, the better information is preserved, but at the price of introducing more noise and computational cost. In this paper, we propose a smoothed skeleton sketching based Attention structure, coined S$^{3}$Attention, which significantly improves upon the previous attempts to negotiate this trade-off. S$^{3}$Attention has two mechanisms to effectively minimize the impact of noise while keeping the complexity linear in the sequence length: a smoothing block to mix information over long sequences and a matrix sketching method that simultaneously selects columns and rows from the input matrix. We verify the effectiveness of S$^{3}$Attention both theoretically and empirically. 
Extensive studies over Long Range Arena (LRA) datasets and six time-series forecasting datasets show that S$^{3}$Attention significantly outperforms both vanilla Attention and other state-of-the-art variants of Attention structures.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 6","pages":"985-996"},"PeriodicalIF":8.7,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shekhar Kumar Yadav;S. R. M. Prasanna;Nithin V. George
{"title":"Near-Field Source Localization and Beamforming in the Spherical Sector Harmonics Domain","authors":"Shekhar Kumar Yadav;S. R. M. Prasanna;Nithin V. George","doi":"10.1109/JSTSP.2024.3442469","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3442469","url":null,"abstract":"Three-dimensional arrays can localize sources anywhere in the spatial domain without any ambiguity. Among these arrays, the spherical microphone array (SMA) has gained widespread usage in acoustic source localization and beamforming. However, SMAs are bulky, making them undesirable in applications with space and power constraints. To deal with this issue, arrays with microphones placed only in a sector of a sphere have been developed along with various techniques for localizing far-field sources in the spherical sector harmonics (S<sup>2</sup>H) domain. This work addresses near-field acoustic localization and beamforming using a spherical sector microphone array. We first introduce a representation of spherical waves from a near-field point source in the S<sup>2</sup>H domain using the orthonormal S<sup>2</sup>H basis functions. Then, using the representation, we develop an array model for when a spherical sector array is placed in a wavefield created by multiple near-field sources in the S<sup>2</sup>H domain. We highlight the advantages of the developed array model over the baseline near-field spatial domain array model. Using the developed array model, two algorithms are proposed for the joint estimation of the range, elevation and azimuth locations of near-field sources, namely NF-S<sup>2</sup>H-MUSIC and NF-S<sup>2</sup>H-MVDR. Further, a near-field beamforming algorithm capable of radial and angular filtering in the S<sup>2</sup>H domain is also presented. Finally, we present the Cramer-Rao Bound (CRB) for range, elevation and azimuth estimation in the S<sup>2</sup>H domain for near-field sources. 
The performances of the proposed algorithms are assessed using extensive near-field localization and beamforming simulations and an experiment.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 4","pages":"546-560"},"PeriodicalIF":8.7,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lorenzo Cazzella;Marouan Mizmizi;Dario Tagliaferri;Damiano Badini;Matteo Matteucci;Umberto Spagnolini
{"title":"Deep Learning-Based Target-to-User Association in Integrated Sensing and Communication Systems","authors":"Lorenzo Cazzella;Marouan Mizmizi;Dario Tagliaferri;Damiano Badini;Matteo Matteucci;Umberto Spagnolini","doi":"10.1109/JSTSP.2024.3438128","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3438128","url":null,"abstract":"In Integrated Sensing and Communication (ISAC) systems, matching the radar targets with communication user equipments (UEs) is useful for several communication tasks, such as proactive handover and beam prediction. In this paper, we consider a radar-assisted communication system where a base station (BS) is equipped with a multiple-input-multiple-output (MIMO) radar that has a double aim: i) associate vehicular radar targets to vehicular equipments (VEs) in the communication beamspace and ii) predict the beamforming vector for each VE from radar data. The proposed target-to-user (T2U) association consists of two stages. First, vehicular radar targets are detected from range-angle images, and, for each, a beamforming vector is estimated. Then, the inferred per-target beamforming vectors are matched with the ones utilized at the BS for communication to perform T2U association. Joint multi-target detection and beam inference is obtained by modifying the you only look once (YOLO) model, which is trained over simulated range-angle radar images. Simulation results over different urban vehicular mobility scenarios show that the proposed T2U method provides a probability of correct association that increases with the size of the BS antenna array, highlighting the respective increase of the separability of the VEs in the beamspace. 
Moreover, we show that the modified YOLO architecture can effectively perform both beam prediction and radar target detection, with similar performance in mean average precision on the latter over different antenna array sizes.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 5","pages":"886-900"},"PeriodicalIF":8.7,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Offloading in Semantic-Aware Cloud-Edge-End Collaborative Networks","authors":"Zelin Ji;Zhijin Qin","doi":"10.1109/JSTSP.2024.3433387","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3433387","url":null,"abstract":"The trend of massive connectivity pushes forward the explosive growth of end devices. The emergence of various applications has prompted a demand for pervasive connectivity and more efficient computing paradigms. On the other hand, the limited computational capacity of end devices restricts the implementation of intelligent applications and becomes a bottleneck for multiple access in supporting massive connectivity. Mobile cloud computing (MCC) and mobile edge computing (MEC) techniques enable end devices to offload local computation-intensive tasks to servers over the network. In this paper, we consider cloud-edge-end collaborative networks to utilize distributed computing resources. Furthermore, we apply task-oriented semantic communications to tackle the fast-varying channel between the end devices and MEC servers and reduce the communication cost. To minimize long-term energy consumption subject to queue stability and computational delay constraints, a Lyapunov-guided deep reinforcement learning hybrid (DRLH) framework is proposed to solve the mixed integer non-linear programming (MINLP) problem. The long-term energy consumption minimization problem is transformed into a deterministic problem in each time frame. The DRLH framework integrates a model-free deep reinforcement learning algorithm with a model-based mathematical optimization algorithm to mitigate computational complexity and leverage the scenario information, thereby improving the convergence performance. 
Numerical results demonstrate that the proposed DRLH framework achieves near-optimal performance on energy consumption while stabilizing all queues.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 7","pages":"1235-1248"},"PeriodicalIF":8.7,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unstructured Pruning and Low Rank Factorisation of Self-Supervised Pre-Trained Speech Models","authors":"Haoyu Wang;Wei-Qiang Zhang","doi":"10.1109/JSTSP.2024.3433616","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3433616","url":null,"abstract":"Self-supervised pre-trained speech models require significant memory and computational resources, limiting their applicability to many speech tasks. Unstructured pruning is a compression method that can achieve minimal performance degradation, while the resulting sparse matrix mandates special hardware or computational operators for acceleration. In this study, we propose a novel approach that leverages the potential low-rank structures of the unstructured sparse matrices by applying truncated singular value decomposition (SVD), thus converting them into parameter-efficient dense models. Moreover, we introduce nuclear norm regularisation to ensure lower rank and a learnable singular value selection strategy to determine the approximate truncation rate for each matrix. Experiments on multiple speech tasks demonstrate that the proposed method can convert an unstructured sparse model into a light-weight and hardware-friendly dense model with comparable or superior performance.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 6","pages":"1046-1058"},"PeriodicalIF":8.7,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kevin Riou;Kaiwen Dong;Kevin Subrin;Patrick Le Callet
{"title":"Reinforcement Learning Based Tactile Sensing for Active Point Cloud Acquisition, Recognition and Localization","authors":"Kevin Riou;Kaiwen Dong;Kevin Subrin;Patrick Le Callet","doi":"10.1109/JSTSP.2024.3431203","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3431203","url":null,"abstract":"Traditional passive point cloud acquisition systems, such as lidars or stereo cameras, can be impractical in real-life and industrial use cases. Firstly, some extreme environments may preclude the use of these sensors. Secondly, they capture information from the entire scene instead of focusing on areas relevant to the end task, such as object recognition and localization. In contrast, we propose to train a Reinforcement Learning (RL) agent with dual objectives: i) control a robot equipped with a tactile (or laser) sensor to iteratively collect a few relevant points from the scene, and ii) recognize and localize objects from the sparse point cloud which has been collected. The iterative point sampling strategy, referred to as an active sampling strategy, is jointly trained with the classifier and the pose estimator to ensure efficient exploration that focuses on areas relevant to the recognition task. To achieve these two objectives, we introduce three RL reward terms: classification, exploration, and pose estimation rewards. These rewards serve the purpose of offering guidance and supervision in their respective domain, allowing us to delve into their individual impacts and contributions. We compare the proposed framework to both active sampling strategies and passive hard-coded sampling strategies coupled with state-of-the-art point cloud classifiers. 
Furthermore, we evaluate our framework in realistic scenarios, considering realistic and similar objects, as well as accounting for uncertainty in the object's position in the workspace.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"299-311"},"PeriodicalIF":8.7,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongwei Hou;Xuan He;Tianhao Fang;Xinping Yi;Wenjin Wang;Shi Jin
{"title":"Beam-Delay Domain Channel Estimation for mmWave XL-MIMO Systems","authors":"Hongwei Hou;Xuan He;Tianhao Fang;Xinping Yi;Wenjin Wang;Shi Jin","doi":"10.1109/JSTSP.2024.3431919","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3431919","url":null,"abstract":"This paper investigates the uplink channel estimation of the millimeter-wave (mmWave) extremely large-scale multiple-input-multiple-output (XL-MIMO) communication system in the beam-delay domain, taking into account the near-field and beam-squint effects due to the transmission bandwidth and array aperture growth. Specifically, we model spatial-frequency domain channels in the beam-delay domain to explore inter-antenna and inter-subcarrier correlations. Within this model, the frequency-dependent hybrid-field beam domain steering vectors are introduced to describe the near-field and beam-squint effects. The independent and non-identically distributed Bernoulli-Gaussian models with unknown prior hyperparameters are employed to capture the sparsity in the beam-delay domain, posing a challenge for channel estimation. Under the constrained Bethe free energy minimization framework, we design different structures and constraints on trial beliefs to develop hybrid message passing (HMP) algorithms, thus achieving efficient joint estimation of beam-delay domain channel and prior hyperparameters. To further improve the model accuracy, the multidimensional grid point perturbation (MDGPP)-based representation is presented, which assigns individual perturbation parameters to each multidimensional discrete grid. By treating the MDGPP parameters as unknown hyperparameters, we propose the two-stage HMP algorithm for MDGPP-based channel estimation, where the output of the initial stage is pruned for the refinement stage to reduce the computational complexity. 
Numerical simulations demonstrate the significant superiority of the proposed algorithm over benchmarks with both near-field and beam-squint effects.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 4","pages":"646-661"},"PeriodicalIF":8.7,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuxi Chen;Tianlong Chen;Yu Cheng;Weizhu Chen;Ahmed Hassan Awadallah;Zhangyang Wang
{"title":"One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization","authors":"Xuxi Chen;Tianlong Chen;Yu Cheng;Weizhu Chen;Ahmed Hassan Awadallah;Zhangyang Wang","doi":"10.1109/JSTSP.2024.3431927","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3431927","url":null,"abstract":"Fine-tuning gigantic pre-trained models is becoming a canonical paradigm in natural language processing. Unfortunately, as the pre-trained models grow larger, even the conventional fine-tuning becomes prohibitively resource-consuming. That motivates the recent surge of parameter-efficient fine-tuning methods by selectively updating a small portion of model parameters. Existing methods either customize add-on modules (e.g., adapter, prompter), or refer to weight parameter decomposition which relies on strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc pre-defined structure parameters (e.g., layerwise sparsities, or the intrinsic rank). Extending the latter line of work, this paper proposes a new weight structured decomposition scheme for parameter-efficient fine-tuning, that is designed to be (i) flexible, covering a much broader matrix family, with sparse or low-rank matrices as special cases; (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. This new scheme, dubbed AutoSparse, meets the two goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all those matrices are automatically allocated (without adopting any heuristic or ad-hoc tuning), through one holistic budget-constrained optimization. It can be solved by the projected gradient descent method that can be painlessly plugged in normal fine-tuning. 
Extensive experiments and in-depth studies on diverse architectures and tasks, including BERT, RoBERTa, and BART, consistently show that AutoSparse surpasses state-of-the-art methods in parameter efficiency. For instance, AutoSparse with BERT can operate at only 0.5% trainable parameters, while hitting an accuracy of 83.2% on MNLI-mismatched.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 6","pages":"1059-1069"},"PeriodicalIF":8.7,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DDL: Empowering Delivery Drones With Large-Scale Urban Sensing Capability","authors":"Xuecheng Chen;Haoyang Wang;Yuhan Cheng;Haohao Fu;Yuxuan Liu;Fan Dang;Yunhao Liu;Jinqiang Cui;Xinlei Chen","doi":"10.1109/JSTSP.2024.3427371","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3427371","url":null,"abstract":"Delivery drones provide a promising sensing platform for smart cities thanks to their city-wide infrastructure and large-scale deployment. However, due to limited battery lifetime and available resources, it is challenging to schedule delivery drones to achieve both high sensing and delivery performance, which is a highly complicated optimization problem with several coupled decision variables. In this paper, we first propose a delivery drone-based sensing system and formulate a mixed-integer non-linear programming (MINLP) problem that jointly optimizes the sensing utility and delivery time, considering practical factors including energy capacity and available delivery drones. Then we provide an efficient solution that integrates the strengths of deep reinforcement learning (DRL) and heuristics, which decouples the highly complicated optimization search process and replaces the heavy computation with a rapid approximation. Evaluation results compared with the state-of-the-art baselines show that DDL improves the scheduling quality by at least 46% on average. 
More importantly, our proposed method could effectively improve the computational efficiency, which is up to 98 times higher than the best baseline.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"502-515"},"PeriodicalIF":8.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Parameter-Efficient Deep Learning Systems via Reversible Jump Simulated Annealing","authors":"Peter Marsh;Ercan Engin Kuruoglu","doi":"10.1109/JSTSP.2024.3428355","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3428355","url":null,"abstract":"We use simulated annealing, a non-convex optimization method, enriched with reversible jumps to enable model selection for deep learning models in a model-size-aware context. By using simulated annealing enriched with reversible jumps, we obtain robust stochastic learning of the hidden posterior distribution of the structure, simultaneously constructing a more focused and certain estimate of the structure, all while making use of all the data. Because the approach is based on Markov-chain learning methods, we construct our priors to favor smaller and simpler architectures, allowing us to converge on the set of globally optimal models that are additionally parameter-efficient, seeking low parameter count deep models that retain good predictive accuracy. 
We demonstrate the capability on standard image recognition with CIFAR-10, as well as performing model selection on time-series tasks, realizing networks with competitive performance as compared to competing non-convex optimization methods such as genetic algorithms, random search, and Gaussian process based Bayesian optimization, while being less than half the size.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 6","pages":"1010-1023"},"PeriodicalIF":8.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}