{"title":"Wishart Localization Prior On Spatial Covariance Matrix In Ambisonic Source Separation Using Non-Negative Tensor Factorization","authors":"Mateusz Guzik, K. Kowalczyk","doi":"10.1109/ICASSP43922.2022.9746222","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9746222","url":null,"abstract":"This paper presents an extension of the existing Non-negative Tensor Factorization (NTF) based method for sound source separation under reverberant conditions, formulated for Ambisonic microphone mixture signals. In particular, we address the problem of optimally exploiting prior knowledge of the source localization through the formulation of a suitable Maximum a Posteriori (MAP) framework. In the presented approach, the magnitude spectrograms are modelled by the NTF, and the individual source Spatial Covariance Matrices (SCM) are approximated as a sum of anechoic Spherical Harmonic (SH) components weighted with a so-called spatial selector. We constrain the SCM using the Wishart distribution, which leads to a new posterior probability and in turn to the derivation of extended update rules. The proposed solution avoids the issues encountered in the original method, related to its empirical binary initialization strategy for the spatial selector weights, which, due to the multiplicative update rules, may result in sound from certain directions not being taken into account. The proposed method is evaluated against the original algorithm and another recently proposed Expectation Maximization (EM) algorithm that also incorporates a spatial localization prior, showing improved separation performance in experiments with first-order Ambisonic recordings of musical instruments and speech utterances.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"250 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133902049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Patch Steganalysis: A Sampling Based Defense Against Adversarial Steganography","authors":"Chuan Qin, Na Zhao, Weiming Zhang, Nenghai Yu","doi":"10.1109/icassp43922.2022.9747638","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747638","url":null,"abstract":"In recent years, the classification accuracy of CNN (convolutional neural network) steganalyzers has improved rapidly. However, just as general CNN classifiers misclassify adversarial samples, CNN steganalyzers can hardly detect adversarial steganography, which combines adversarial samples with steganography. Adversarial training and preprocessing are two effective defenses against adversarial samples. However, the literature shows that adversarial training is ineffective against adversarial steganography, and preprocessing, which aims to remove adversarial perturbations, also destroys the steganographic modifications. In this paper, we propose a novel sampling based defense method for steganalysis. Specifically, by sampling image patches, CNN steganalyzers can bypass the sparse adversarial perturbations and extract effective features. Additionally, by calculating statistical vectors and regrouping deep features, the impact on the classification accuracy of common samples is effectively reduced. The experiments show that the proposed method can significantly improve robustness against adversarial steganography without adversarial training.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131834370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing a QAM Signal Detector for Massive Mimo Systems via PS-ADMM Approach","authors":"Quan Zhang, Xuyang Zhao, Jiangtao Wang, Yongchao Wang","doi":"10.1109/icassp43922.2022.9747281","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747281","url":null,"abstract":"This paper presents an efficient quadrature amplitude modulation (QAM) signal detector for massive multiple-input multiple-output (MIMO) communication systems via the penalty-sharing alternating direction method of multipliers (PS-ADMM). The content of the paper is summarized as follows: first, we formulate QAM-MIMO detection as a maximum-likelihood optimization problem with bound relaxation constraints. Decomposing the QAM signals into a sum of multiple binary variables and incorporating the introduced binary variables into penalty functions, we transform the detection optimization model into a non-convex sharing problem; second, a customized ADMM algorithm is presented to solve the formulated non-convex optimization problem, in which all variables can be solved analytically and in parallel; third, it is proved that the proposed PS-ADMM algorithm converges under mild conditions. Simulation results demonstrate the effectiveness of the proposed approach.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131836901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Point-Mass Filter with Decomposition of Transient Density","authors":"P. Tichavský, O. Straka, J. Duník","doi":"10.1109/icassp43922.2022.9747607","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747607","url":null,"abstract":"The paper deals with the state estimation of nonlinear stochastic dynamic systems with special attention on a grid-based numerical solution to the Bayesian recursive relations, the point-mass filter (PMF). In the paper, a novel functional decomposition of the transient density describing the system dynamics is proposed. The decomposition is based on a non-negative matrix factorization and separates the density into functions of the future and current states. Such decomposition facilitates a thrifty calculation of the convolution, which is a bottleneck of the PMF performance. The PMF estimate quality and computational costs can be efficiently controlled by choosing an appropriate rank of the decomposition. The performance of the PMF with the transient density decomposition is illustrated in a terrain-aided navigation scenario.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129391911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition","authors":"Hang Su, Danyang Zhao, Long Dang, Minglei Li, Xixin Wu, Xunying Liu, Helen M. Meng","doi":"10.1109/icassp43922.2022.9746116","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746116","url":null,"abstract":"Speaker Change Detection (SCD) is the task of determining the time boundaries between speech segments of different speakers. An SCD system can be applied to many tasks, such as speaker diarization, speaker tracking, and transcribing audio with multiple speakers. Recent advances in deep learning have led to neural network approaches that directly detect speaker change points from audio data at the frame level. These approaches may be further improved by utilizing speaker information in the training data and content information extracted in an unsupervised manner. This work proposes a novel framework for the SCD task, which utilizes a multitask learning architecture to leverage speaker information during the training stage and adds content information extracted from an unsupervised speech decomposition model to help detect speaker change points. Experimental results show that the multitask learning architecture with speaker information improves SCD performance, and that adding content information extracted from the unsupervised speech decomposition model improves it further. To the best of our knowledge, this work outperforms the state-of-the-art SCD results [1] on the AMI dataset.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130727519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Stage Contrastive Learning Framework For Imbalanced Aerial Scene Recognition","authors":"Lexing Huang, Senlin Cai, Yihong Zhuang, Changxing Jing, Yue Huang, Xiaotong Tu, Xinghao Ding","doi":"10.1109/icassp43922.2022.9746248","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746248","url":null,"abstract":"In real-world scenarios, aerial image datasets are generally class imbalanced: the majority classes have rich samples, while the minority classes have only a few. Such class-imbalanced datasets pose great challenges to aerial scene recognition. In this paper, we explore a novel two-stage contrastive learning framework that addresses representation learning and classifier learning in turn, thereby boosting aerial scene recognition. Specifically, in the representation learning stage, we design a data augmentation policy that improves the potential of contrastive learning according to the characteristics of aerial images, and we employ supervised contrastive learning to learn the association between aerial images of the same scene. In the classifier learning stage, we fix the encoder to maintain good representations and use a re-balancing strategy to train a less biased classifier. Experimental results on imbalanced aerial image datasets demonstrate the advantages of the proposed two-stage contrastive learning framework for imbalanced aerial scene recognition.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"87 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130922722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convex Clustering for Autocorrelated Time Series","authors":"Max Revay, V. Solo","doi":"10.1109/icassp43922.2022.9747143","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747143","url":null,"abstract":"While clustering in general is a well-studied area, clustering of auto-correlated time series (CATS) has received relatively little attention. Here, we develop a convex clustering algorithm suited to auto-correlated time series and compare it with a state-of-the-art method. We find that the proposed algorithm identifies the true clusters more accurately.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131192679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Adaptation for Speaker Recognition in Singing and Spoken Voice","authors":"Anurag Chowdhury, Austin Cozzo, A. Ross","doi":"10.1109/icassp43922.2022.9746111","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746111","url":null,"abstract":"In this work, we study the effect of speaking style and audio condition variability between the spoken and singing voice on speaker recognition performance. Furthermore, we also explore the utility of domain adaptation for bridging the gap between multiple speaking styles (singing versus spoken) and improving overall speaker recognition performance. In that regard, we first extend a publicly available singing voice dataset, JukeBox, with corresponding spoken voice data and refer to it as JukeBox-V2. Next, we use domain adaptation for developing a speaker recognition method robust to varying speaking styles and audio conditions. Finally, we analyze the speech embeddings of domain-adapted models to explain their generalizability across varying speaking styles and audio conditions.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132850386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Non-Convex Proximal Approach for Centroid-Based Classification","authors":"Mewe-Hezoudah Kahanam, L. Brusquet, Ségolène Martin, J. Pesquet","doi":"10.1109/ICASSP43922.2022.9747071","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747071","url":null,"abstract":"In this paper, we propose a novel variational approach for supervised classification based on transform learning. Our approach consists of formulating an optimization problem on both the transform matrix and the centroids of the classes in a low-dimensional transformed space. The loss function is based on the distance to the centroids, which can be chosen in a flexible manner. To avoid trivial solutions or highly correlated clusters, our model incorporates a penalty term on the centroids, which encourages them to be separated. The resulting non-convex and non-smooth minimization problem is then solved by a primal-dual alternating minimization strategy. We assess the performance of our method on a range of supervised classification problems and compare it to state-of-the-art methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133156341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TCRNet: Make Transformer, CNN and RNN Complement Each Other","authors":"Xinxin Shan, Tai Ma, Anqi Gu, Haibin Cai, Ying Wen","doi":"10.1109/icassp43922.2022.9747716","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747716","url":null,"abstract":"Recently, several Transformer-based methods have been presented to improve image segmentation. However, since Transformer requires regular square images and has difficulty capturing local feature information, image segmentation performance is seriously affected. In this paper, we propose a novel encoder-decoder network named TCRNet, which makes the Transformer, convolutional neural network (CNN) and recurrent neural network (RNN) complement each other. In the encoder, we extract and concatenate the feature maps from the Transformer and CNN to effectively capture global and local feature information of images. Then, in the decoder, we utilize a convolutional RNN in the proposed recurrent decoding unit to refine the feature maps for finer prediction. Experimental results on three medical datasets demonstrate that TCRNet effectively improves segmentation precision.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133174166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}