{"title":"Robust FIR Filters for Wireless Low-Frequency Sound Zones","authors":"Mo Zhou, M. Møller, Christian Pedersen, Jan Østergaard","doi":"10.1109/ICASSP49357.2023.10095256","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095256","url":null,"abstract":"Low-frequency personal sound zones can be created by controlling the sound pressure in separate spatially confined regions. The performance of a sound zone system using wireless communication may be degraded due to potential packet losses. In this paper, we propose robust FIR filters for low-frequency sound zone systems by incorporating information about the expected packet losses into the design. A simulation study with eight loudspeakers surrounding two control regions shows that the proposed filters can improve the contrast and the sound quality when packet losses occur, with only a slight degradation in performance when there is no packet loss. With the proposed filters, it is possible to gain 2 dB higher contrast on average across the frequency range 20-200 Hz relative to the original filters when the packet loss rate is 5%.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129755745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CSM In Motion Vector Steganalysis: The Effect of Coders on Motion Vectors in H.264 Video Encoding","authors":"Verena Lachner, K. Schaar, Ralf Zimmermann","doi":"10.1109/ICASSP49357.2023.10096323","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096323","url":null,"abstract":"Cover-Source Mismatch (CSM) has long been recognized as a major challenge in machine-learning based image steganalysis. However, little is known about the impact of CSM on motion vector (MV) based video steganalysis. As CSM stemming from factors at the end of the processing pipeline is particularly noticeable, we conduct a study on the impact of H.264 coders on motion vectors. This paper compares the outputs of compression operations of a range of coders according to specific MV statistics. We identify discrepancies between the coders in all cases and point out why this is an indicator for CSM. This result paves the way for more extensive research on CSM in MV steganalysis.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129811197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outside Knowledge Visual Question Answering Version 2.0","authors":"Benjamin Z. Reichman, Anirudh S. Sundar, Christopher Richardson, Tamara Zubatiy, Prithwijit Chowdhury, Aaryan Shah, Jack Truxal, Micah Grimes, Dristi Shah, Woo Ju Chee, Saif Punjwani, Atishay Jain, L. Heck","doi":"10.1109/ICASSP49357.2023.10096074","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096074","url":null,"abstract":"Visual question answering (VQA) lies at the intersection of language and vision research. It functions as a building block for multimodal conversational AI and serves as a testbed for assessing a model’s capability for open-domain scene understanding. While progress in this area was initially accelerated with the 2015 release of the popular and large dataset \"VQA\", new datasets are required to continue this research momentum. For example, the 2019 Outside Knowledge VQA dataset \"OKVQA\" extends VQA by adding more challenging questions that require complex, factual, and commonsense knowledge. However, in our analysis, we found that 41.4% of the dataset needed to be corrected and 10.6% needed to be removed. This paper describes the analysis, corrections, and removals completed and presents a new dataset: OK-VQA Version 2.0. To gain insights into the impact of the changes on OK-VQA research, the paper presents results on state-of-the-art models retrained with this new dataset. The side-by-side comparisons show that one method in particular, Knowledge Augmented Transformer for Vision-and-Language, extends its relative lead over competing methods. The dataset is available online.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128506844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NSV-TTS: Non-Speech Vocalization Modeling And Transfer In Emotional Text-To-Speech","authors":"Haitong Zhang, Xinyuan Yu, Yue Lin","doi":"10.1109/ICASSP49357.2023.10096033","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096033","url":null,"abstract":"This paper addresses the problem of non-speech vocalization (NSV) modeling and transfer in emotional TTS. We propose an emotion TTS system (NSV-TTS) to model NSV and emotional speech. The model utilizes self-supervised learning to extract unsupervised linguistic units (ULUs) for NSV labeling and zero-shot NSV transfer. Furthermore, we propose token mixing and random masking to boost the performance. We evaluate the proposed method on various NSV types and emotion classes. The experimental results reveal that the proposed method performs well in the zero-shot NSV transfer task. Lastly, we conduct ablation studies to investigate the proposed method further.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128385571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Probabilistic Distance Metric with Application in Gaussian Mixture Reduction","authors":"A. Sajedi, Y. Lawryshyn, K. Plataniotis","doi":"10.1109/ICASSP49357.2023.10096094","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096094","url":null,"abstract":"This paper presents a new distance metric to compare two continuous probability density functions. The main advantage of this metric is that, unlike other statistical measurements, it can provide an analytic, closed-form expression for a mixture of Gaussian distributions while satisfying all metric properties. These characteristics enable fast, stable, and efficient calculations, which are highly desirable in real-world signal processing applications. The application in mind is Gaussian Mixture Reduction (GMR), which is widely used in density estimation, recursive tracking, and belief propagation. To address this problem, we developed a novel algorithm dubbed the Optimization-based Greedy GMR (OGGMR), which employs our metric as a criterion to approximate a high-order Gaussian mixture with a lower order. Experimental results show that the OGGMR algorithm is significantly faster and more efficient than state-of-the-art GMR algorithms while retaining the geometric shape of the original mixture.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"50 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129040473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing Information Updating with Edge Computing: A Distributed and Learning Approach","authors":"Junyi He, Di Zhang, Shumeng Liu, Yuezhi Zhou, Yaoxue Zhang","doi":"10.1109/ICASSP49357.2023.10095129","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095129","url":null,"abstract":"The rapid proliferation of some real-time applications (e.g., video surveillance) has driven enormous interest in maximizing information freshness, quantified by the age of information (AoI). For some computation-intensive updates such as images or videos, the real-time update processing requires intensive resources, which edge servers can provide in mobile edge computing (MEC). In this paper, we investigate information updating scheduling with multiple users in MEC. Due to the centralized algorithms’ limitations in distributed systems where users are self-interested, we investigate an efficient distributed scheduling algorithm. We model the information updating scheduling as an uncooperative game and propose a distributed algorithm to compute the unique Nash equilibrium. Considering the unavailability of some global network information, we propose a learning algorithm where each user learns how to make decisions based on observable information in a distributed manner. Extensive evaluation results show the efficiency of the proposed algorithms.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"253 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129046311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convergence of Stochastic PDMM","authors":"Sebastian O. Jordan, Thomas W. Sherson, R. Heusdens","doi":"10.1109/ICASSP49357.2023.10095808","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095808","url":null,"abstract":"In recent years, the large increase in connected devices and the data that are collected by these devices have caused a heightened interest in distributed processing. Many practical distributed networks are of heterogeneous nature, because different devices in the network can have different specifications. Because of this, it is highly desirable that algorithms operating within these networks can operate asynchronously, since in that case there is no need for clock synchronisation between the nodes, and the algorithm is not slowed down by the slowest device in the network. In this paper, we focus on the primal-dual method of multipliers (PDMM), which is a promising distributed optimisation algorithm that is suitable for distributed optimisation in heterogeneous networks. Most theoretical work that can be found in existing literature focuses on synchronous versions of PDMM. In this work, we prove the convergence of stochastic PDMM, which is a general framework that can model variations such as asynchronous PDMM and PDMM with transmission losses.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129263743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Normalized Graph Laplacians in Financial Markets","authors":"José V. de M. Cardoso, Jiaxi Ying, Sandeep Kumar, D. Palomar","doi":"10.1109/ICASSP49357.2023.10095870","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095870","url":null,"abstract":"Gaussian Markov random fields, a class of graphical models, play an increasingly important role in real-world problems, where they are often applied to uncover conditional correlations between pairs of entities in a network. Motivated by recent applications of graphs in financial markets, we investigate the problem of learning undirected, weighted, normalized, graphical models. More precisely, we design an optimization algorithm to learn precision matrices that are modeled as normalized graph Laplacians. The proposed algorithm takes advantage of frameworks such as the alternating direction method of multipliers and projected gradient descent, which allows us to decompose the original problem into subproblems that can be solved efficiently. We demonstrate the empirical performance of the proposed algorithm, in comparison to state-of-the-art benchmark models, in a number of datasets involving financial time-series.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124589808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group-Wise Co-Salient Object Detection with Siamese Transformers Via Brownian Distance Covariance Matching","authors":"Yangjun Wu, H. Zhang, Lingyan Liang, Yaqian Zhao, Kaihua Zhang","doi":"10.1109/ICASSP49357.2023.10096177","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096177","url":null,"abstract":"Co-salient object detection (CoSOD) aims to discover and segment foreground targets in a group of images with the same semantic category. Existing mainstream approaches often employ convolutional neural networks (CNNs) to learn the semantic-invariant features from a group of images. Despite demonstrated success, there exist two limitations: 1) CNNs introduce an inductive bias of locality that makes it difficult to model long-range dependencies, limiting their feature representation capability. 2) Their models lack discriminability to differentiate semantic differences between different groups since only one group of images with the same semantic category has been taken into account for model training. To address these issues, this paper presents a Siamese Transformer architecture for CoSOD that can fully mine the group-wise semantic contrast information for more discriminative feature learning. Specifically, the designed Siamese Transformer takes two groups of images as input for feature contrastive learning. Each group is processed by a Transformer branch with shared weights to capture the long-range interaction information. Besides, to model the complex non-linear interactions between these two branches, we further design a Brownian distance covariance (BDC) module that uses joint distribution to measure the inter- and intra-group semantic similarity. The BDC can be efficiently calculated in closed form and can fully characterize independence for effective feature contrastive learning. Extensive evaluations on the three largest and most challenging benchmark datasets (CoSal2015, CoCA, and CoSOD3k) demonstrate the superiority of our method over a variety of state-of-the-art methods.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124627077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Channel Estimation in Massive MIMO with Heavy-Tailed Noise: Gaussian-Mixture Versus Cauchy Models","authors":"Ziya Gülgün, E. Larsson","doi":"10.1109/ICASSP49357.2023.10095925","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095925","url":null,"abstract":"Impulsive noise can appear in communication links. In the literature, it was demonstrated that when the noise is impulsive, standard Gaussian receivers perform poorly because of the outliers in the noise. Therefore, appropriate receivers must be used when the noise is impulsive. In this paper, we compare two types of massive multiple-input multiple-output (MIMO) receivers, namely those based on a Gaussian-mixture assumption and those based on a Cauchy assumption, in terms of channel estimation quality, when the noise is impulsive. Symmetric α-stable (SαS) noises are used to model impulsive noises in the paper. In the numerical results, the Gaussian-mixture receiver outperforms the Cauchy-based receiver.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125109624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}