{"title":"Optimized Parameter-Efficient Deep Learning Systems via Reversible Jump Simulated Annealing","authors":"Peter Marsh;Ercan Engin Kuruoglu","doi":"10.1109/JSTSP.2024.3428355","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3428355","url":null,"abstract":"We utilize the non-convex optimization method simulated annealing enriched with reversible jumps to enable a model selection capacity for deep learning models in a model size aware context. By using simulated annealing enriched with reversible jumps, we can yield a robust stochastic learning of the hidden posterior distribution of the structure, simultaneously constructing a more focused and certain estimate of the structure, all while making use of all the data. Being based upon Markov-chain learning methods, we constructed our priors to favor smaller and simpler architectures, allowing us to converge on the set of globally optimal models that are additionally parameter-efficient, seeking low parameter count deep models that retain good predictive accuracy. 
We demonstrate the capability on standard image recognition with CIFAR-10, as well as performing model selection on time-series tasks, realizing networks with competitive performance as compared to competing non-convex optimization methods such as genetic algorithms, random search, and Gaussian process based Bayesian optimization, while being less than half the size.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 6","pages":"1010-1023"},"PeriodicalIF":8.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Stage Audio-Visual Speech Separation Method Without Visual Signals for Testing and Tuples Loss With Dynamic Margin","authors":"Yinggang Liu;Yuanjie Deng;Ying Wei","doi":"10.1109/JSTSP.2024.3427424","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3427424","url":null,"abstract":"Speech separation, a fundamental task in signal processing, can be used in many types of intelligent robots, and audio-visual (AV) speech separation has been proven to be superior to audio-only speech separation. In current AV speech separation methods, visual information plays a pivotal role not only during network training but also during testing. However, due to various factors in real environments, sensors cannot always obtain high-quality visual signals. In this paper, we propose an effective two-stage AV speech separation model that introduces a new approach to visual feature embedding, where visual information is used to optimize the separation network during training, but no visual input is required during testing. Unlike current methods, which fuse visual and audio features together as the input of the separation network, this model embeds visual features into an AV matching block to calculate the cross-modal consistency loss, which is used as part of the loss function for network optimization. A novel tuples loss function with a learnable dynamic margin is proposed for better AV matching, and two margin change strategies are given. The proposed two-stage AV speech separation method is evaluated on the widely used GRID and VoxCeleb2 datasets. 
Experimental results show that the proposed method outperforms current AV speech separation methods.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"459-472"},"PeriodicalIF":8.7,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Image Thresholding Algorithm Using Fuzzy Logic for Autonomous Underwater Vehicle Navigation","authors":"I-Chen Sang;William R. Norris","doi":"10.1109/JSTSP.2024.3426484","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3426484","url":null,"abstract":"Breakthroughs in autonomous vehicle technology have ignited diverse topics within engineering research. Among these, the focus on conducting inspections through autonomous underwater vehicles (AUVs) stands out as particularly influential, owing to the substantial investments directed towards offshore infrastructures. Leveraging the capabilities of onboard sensors, AUVs hold the potential to adeptly trace and examine pipelines with high levels of accuracy. However, the complicated and varying underwater environment presents a formidable challenge to ensuring the robustness of the localization and navigation framework. In response to these challenges, this study introduces a novel GPS-denied, adaptive, vision-based navigation framework tailored specifically for AUV inspection tasks. Different from conventional approaches involving manual parameter tuning, this framework dynamically adjusts contrast enhancement and edge detection functions based on incoming frame data. Fuzzy inference systems (FIS) have been harnessed within both image processing and the navigation algorithm, strengthening the overall robustness of the system. The verification of the proposed framework took place within a simulation environment. Through the implemented algorithm, the AUV adeptly identified, approached, and traversed the pipeline. 
Additionally, the framework distinctly showcased its capacity to dynamically adjust parameters, reduce processing time, and uphold consistency amid diverse illuminations and levels of noise.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"358-367"},"PeriodicalIF":8.7,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10596073","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Frequency Spherical Near-Field Antenna Measurements Using Compressive Sensing","authors":"Marc Andrew Valdez;Jacob D. Rezac;Michael B. Wakin;Joshua A. Gordon","doi":"10.1109/JSTSP.2024.3424310","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3424310","url":null,"abstract":"We propose compressive sensing approaches for broadband spherical near-field measurements that reduce measurement demands beyond what is achievable using conventional single-frequency compressive sensing. Our approaches use two different compressive signal models—sparsity-based and low-rank-based—whose viability we establish using a simulated standard gain horn antenna. Under mild assumptions on the device being tested, we prove that sparsity-based broadband compressive sensing provides significant measurement number reductions over single-frequency compressive sensing. We find that our proposed low-rank model also provides an effective means of achieving broadband compressive sensing, using numerical experiments, with performance on par with the best broadband sparsity-based method. Exemplifying these best-case results, even in the presence of measurement noise, the methods we propose can achieve relative errors of −40 dB using about 1/4 of the measurements required for conventional sampling. 
This is equivalent to about 1/2 sample per unknown, whereas traditional spherical near-field measurements require a minimum of roughly 2 measurements per unknown.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 4","pages":"572-586"},"PeriodicalIF":8.7,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Standoff Target Tracking for Networked UAVs With Specified Performance via Deep Reinforcement Learning","authors":"Yi Xia;Jun Du;Zekai Zhang;Ziyuan Wang;Jingzehua Xu;Weishi Mi","doi":"10.1109/JSTSP.2024.3425052","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3425052","url":null,"abstract":"Maintaining rapid and prolonged standoff target tracking for networked unmanned aerial vehicles (UAVs) is challenging, as existing methods fail to improve tracking performance while simultaneously reducing energy consumption. This paper proposes a deep reinforcement learning (DRL)-based tracking scheme for UAVs to approximate an escape target, effectively addressing time constraints and guaranteeing low energy expenditure. In the first phase, a coordinated target tracking protocol and a target position estimator are developed using only bearing measurements, which enable the deployment of UAVs along a standoff circle centered at the target with an expected angular spacing. Additionally, an unknown system dynamics estimator (USDE) is devised based on concise filtering operations to mitigate adverse disturbances. In the second phase, multi-agent deep deterministic policy gradient (MADDPG) is employed to strike an optimal balance between tracking accuracy and energy consumption by encoding time limitations as skilled barrier functions. 
Simulation results demonstrate that the proposed method outperforms benchmarks in terms of tracking accuracy and control cost.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"516-528"},"PeriodicalIF":8.7,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Optimization With Formal Safety Guarantees via Online Conformal Prediction","authors":"Yunchuan Zhang;Sangwoo Park;Osvaldo Simeone","doi":"10.1109/JSTSP.2024.3422825","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3422825","url":null,"abstract":"Black-box zeroth-order optimization is a central primitive for applications in fields as diverse as finance, physics, and engineering. In a common formulation of this problem, a designer sequentially attempts candidate solutions, receiving noisy feedback on the value of each attempt from the system. In this paper, we study scenarios in which feedback is also provided on the safety of the attempted solution, and the optimizer is constrained to limit the number of unsafe solutions that are tried throughout the optimization process. Focusing on methods based on Bayesian optimization (BO), prior art has introduced an optimization scheme – referred to as SafeOpt – that is guaranteed not to select any unsafe solution with a controllable probability over feedback noise as long as strict assumptions on the safety constraint function are met. In this paper, a novel BO-based approach is introduced that satisfies safety requirements irrespective of properties of the constraint function. This strong theoretical guarantee is obtained at the cost of allowing for an arbitrary, controllable but non-zero, rate of violation of the safety constraint. The proposed method, referred to as Safe-Bocp, builds on online conformal prediction (CP) and is specialized to the cases in which feedback on the safety constraint is either noiseless or noisy. 
Experimental results on synthetic and real-world data validate the advantages and flexibility of the proposed Safe-Bocp.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 1","pages":"45-59"},"PeriodicalIF":8.7,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incongruity-Aware Cross-Modal Attention for Audio-Visual Fusion in Dimensional Emotion Recognition","authors":"R. Gnana Praveen;Jahangir Alam","doi":"10.1109/JSTSP.2024.3422823","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3422823","url":null,"abstract":"Multimodal emotion recognition has immense potential for the comprehensive assessment of human emotions, utilizing multiple modalities that often exhibit complementary relationships. In video-based emotion recognition, audio and visual modalities have emerged as prominent contact-free channels, widely explored in existing literature. Current approaches typically employ cross-modal attention mechanisms between audio and visual modalities, assuming a constant state of complementarity. However, this assumption may not always hold true, as non-complementary relationships can also manifest, undermining the efficacy of cross-modal feature integration and thereby diminishing the quality of audio-visual feature representations. To tackle this problem, we introduce a novel Incongruity-Aware Cross-Attention (IACA) model, capable of harnessing the benefits of robust complementary relationships while efficiently managing non-complementary scenarios. Specifically, our approach incorporates a two-stage gating mechanism designed to adaptively select semantic features, thereby effectively capturing the inter-modal associations. Additionally, the proposed model demonstrates an ability to mitigate the adverse effects of severely corrupted or missing modalities. We rigorously evaluate the performance of the proposed model through extensive experiments conducted on the challenging RECOLA and Aff-Wild2 datasets. The results underscore the efficacy of our approach, as it outperforms state-of-the-art methods by adeptly capturing inter-modal relationships and minimizing the influence of missing or heavily corrupted modalities. 
Furthermore, we show that the proposed model is compatible with various cross-modal attention variants, consistently improving performance on both datasets.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"444-458"},"PeriodicalIF":8.7,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topology-Preserving Motion Coordination for Multi-Robot Systems in Adversarial Environments","authors":"Zitong Wang;Yushan Li;Xiaoming Duan;Jianping He","doi":"10.1109/JSTSP.2024.3421898","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3421898","url":null,"abstract":"The interaction topology plays a significant role in the distributed motion coordination of multi-robot systems (MRSs) for its noticeable impact on the information flow between robots. However, recent research has revealed that in adversarial environments, the topology can be inferred by external adversaries equipped with advanced sensors, posing severe security risks to MRSs. Therefore, it is of utmost importance to preserve the interaction topology from inference attacks while ensuring the coordination performance. To this end, we propose a topology-preserving motion coordination (TPMC) algorithm that strategically introduces perturbation signals during the coordination process with a compensation design. The major novelty is threefold: i) We focus on the second-order motion coordination model and tackle the coupling issue of the perturbation signals with the unstable state updating process; ii) We develop a general framework for distributed compensation of perturbation signals, strategically addressing the challenge of perturbation accumulation while ensuring precise motion coordination; iii) We derive the convergence conditions and rate characterization to achieve the motion coordination under the TPMC algorithm. 
Extensive simulations and real-world experiments are conducted to verify the performance of the proposed method.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"473-486"},"PeriodicalIF":8.7,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Robot Perception Framework for Complex Environments Using Multiple mmWave Radars","authors":"Hongyu Chen;Yimin Liu;Yuwei Cheng","doi":"10.1109/JSTSP.2024.3420234","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3420234","url":null,"abstract":"The robust perception of environments is crucial for mobile robots to operate autonomously in complex environments. Over the years, mobile robots have mainly relied on optical sensors for perception, which degrade severely in adverse weather conditions. Recently, single-chip millimeter-wave (mmWave) radars have been widely used for mobile perception, owing to their robustness to all-weather conditions, lightweight design, and low cost. However, existing research based on mmWave radars primarily focuses on a single radar and a single task. Due to the limited field of view and sparse observation, perception based on a single radar may not ensure the required robustness in complex environments. To address this challenge, we propose a novel robust perception framework for robots in complex environments based on multiple mmWave radars, named MMR-PFR. The framework integrates three critical tasks for robots, including ego-motion estimation, multi-radar fusion mapping, and dynamic target state estimation. Multiple tasks collaborate and facilitate each other to improve overall performance. In the framework, we propose a new multi-radar point cloud fusion method to generate a more accurate environmental map. In addition, we propose a new online calibration algorithm for multiple radars to ensure the long-term reliability of the system. To evaluate MMR-PFR, we build a prototype and carry out experiments in real-world scenarios. 
The evaluation results show the effectiveness and superiority of the proposed framework.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"380-395"},"PeriodicalIF":8.7,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Holographic Imaging With XL-MIMO and RIS: Illumination and Reflection Design","authors":"G. Torcolacci;A. Guerra;H. Zhang;F. Guidi;Q. Yang;Y. C. Eldar;D. Dardari","doi":"10.1109/JSTSP.2024.3417356","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3417356","url":null,"abstract":"This paper addresses a near-field imaging problem utilizing extremely large-scale multiple-input multiple-output (XL-MIMO) antennas and reconfigurable intelligent surfaces (RISs) already in place for wireless communications. To this end, we consider a system with a fixed transmitting antenna array illuminating a region of interest (ROI) and a fixed receiving antenna array inferring the ROI's scattering coefficients. Leveraging XL-MIMO and high frequencies, the ROI is situated in the radiative near-field region of both antenna arrays, thus enhancing the degrees of freedom (DoF) (i.e., the channel matrix rank) of the illuminating and sensing channels available for imaging, here referred to as holographic imaging. To further boost the imaging performance, we optimize the illuminating waveform by solving a min-max optimization problem having the upper bound of the mean squared error (MSE) of the image estimate as the objective function. Additionally, we address the challenge of non-line-of-sight (NLOS) scenarios by considering the presence of a RIS and deriving its optimal reflection coefficients. 
Numerical results investigate the interplay between illumination optimization, geometric configuration (monostatic and bistatic), the DoF of the illuminating and sensing channels, image estimation accuracy, and image complexity.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 4","pages":"587-602"},"PeriodicalIF":8.7,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}