{"title":"SS-ADMM: Stationary and Sparse Granger Causal Discovery for Cortico-Muscular Coupling","authors":"Farwa Abbas, V. McClelland, Z. Cvetkovic, Wei Dai","doi":"10.1109/ICASSP49357.2023.10095111","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095111","url":null,"abstract":"Cortico-muscular communication patterns reveal important information about motor control. However, inferring significant causal relationships between motor cortex electroencephalogram (EEG) and surface electromyogram (sEMG) of concurrently active muscles is challenging since relevant processes involved in muscle control are relatively weak compared to additive noise and background activities. In this paper, a framework for identification of cortico-muscular linear time invariant communication is proposed that simultaneously estimates model order and its parameters by enforcing sparsity and stationarity conditions in a convex optimization program. The experimental results demonstrate that our proposed algorithm outperforms existing techniques for autoregressive model estimation, in terms of computational speed and model identification for causality estimation.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114361563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LeanSpeech: The Microsoft Lightweight Speech Synthesis System for Limmits Challenge 2023","authors":"Chen Zhang, Shubham Bansal, Aakash Lakhera, Jinzhu Li, G. Wang, Sandeepkumar Satpal, Sheng Zhao, Lei He","doi":"10.1109/ICASSP49357.2023.10096039","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096039","url":null,"abstract":"This paper describes the Microsoft Text-to-Speech (TTS) system: LeanSpeech for LIMMITS (Lightweight, Multi-speaker, Multi-lingual Indic TTS) Challenge 2023, which is part of ICASSP 2023 to encourage the advance of TTS in Indian Languages. We propose a lightweight encoder-decoder acoustic model composed of 1-D convolution and LSTM blocks, which is trained with knowledge distillation from a multi-speaker multi-lingual teacher model, DelightfulTTS [1]. The speech corpus is reprocessed and used in both AM training and vocoder fine-tuning. In Track-2 of the challenge, our system achieves MOS 4.56 and SMOS 3.98, which indicates the efficiency of the proposed model and training strategy.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114521800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defending Against Universal Patch Attacks by Restricting Token Attention in Vision Transformers","authors":"Hongwei Yu, Jiansheng Chen, Huimin Ma, Cheng Yu, Xinlong Ding","doi":"10.1109/ICASSP49357.2023.10096862","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096862","url":null,"abstract":"Previous works reveal that, similar to CNNs, vision transformers (ViT) are also vulnerable to universal adversarial patch attacks. In this paper, we empirically reveal and mathematically explain that the shallow tokens in the transformer and the attention of the network can largely influence the classification result. Adversarial patches usually produce a large feature norm for the corresponding shallow token vectors, which can attract the attention anomalously. Inspired by this, we propose a restriction operation on the attention matrix, which effectively reduces the influence of the patch region. Experiments on ImageNet validate that our proposal can effectively improve ViT’s robustness towards white-box universal patch attacks while maintaining satisfactory classification accuracy for clean samples.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122042013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEPDIFF: Speech Separation Based on Denoising Diffusion Model","authors":"Bo-Cheng Chen, Chao Wu, Wenbin Zhao","doi":"10.1109/ICASSP49357.2023.10095979","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095979","url":null,"abstract":"Speech separation aims to extract multiple speech sources from mixed signals. In this paper, we propose SepDiff - a monaural speech separation method based on the denoising diffusion model (diffusion model). By modifying the diffusion and reverse process, we show that the diffusion model achieves an impressive performance on speech separation. To generate speech sources, we use the mel spectrogram of the mixture as a condition in the training procedure and insert it in every step of the sampling procedure. We propose a novel DNN structure to leverage local and global speech information through successive feature channel attention and dilated 2-D convolution blocks on multi-resolution time-frequency features. We use a neural vocoder to obtain the waveform from the generated mel spectrogram. We evaluate SepDiff on LibriMix datasets. Compared to the SepFormer approach, SepDiff yields a mean opinion score (MOS) that is higher by 0.11.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122147910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Graph Neural Network Multi-Task Learning-Based Approach for Detection and Localization of Cyberattacks in Smart Grids","authors":"Abdulrahman Takiddin, R. Atat, Muhammad Ismail, K. Davis, E. Serpedin","doi":"10.1109/ICASSP49357.2023.10096822","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096822","url":null,"abstract":"False data injection attacks (FDIAs) on smart power grids’ measurement data present a threat to system stability. When malicious entities launch cyberattacks to manipulate the measurement data, different grid components will be affected, which leads to failures. For effective attack mitigation, two tasks are required: determining the status of the system (normal operation/under attack) and localizing the attacked bus/power substation. Existing mitigation techniques carry out these tasks separately and offer limited detection performance. In this paper, we propose a multi-task learning-based approach that performs both tasks simultaneously using a graph neural network (GNN) with stacked convolutional Chebyshev graph layers. Our results show that the proposed model presents superior system status identification and attack localization abilities with detection rates of 98.5–100% and 99–100%, respectively, presenting improvements of 5–30% compared to benchmarks.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116745481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hankel Structured Low Rank and Sparse Representation Via L0-Norm Optimization for Compressed Ultrasound Plane Wave Signal Reconstruction","authors":"Miaomiao Zhang, Ji Chen, Xiaoyan Fu, Ge Xin, Jingzhi Zhang, Na Jiang, J. D’hooge","doi":"10.1109/ICASSP49357.2023.10095071","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095071","url":null,"abstract":"Ultrasound plane wave imaging is widely used in many applications thanks to its capability in reaching high frame rates. However, the amount of data acquisition and storage in a period of time can become a bottleneck in ultrasound system design for thousands of frames per second. In our previous study, we proposed a low-rank and joint-sparse model to reduce the amount of sampled channel data of focused beam imaging by considering all the received data as a 2D matrix. However, for a single plane wave transmission, the number of channels is limited and the low-rank property of the received data matrix is no longer achieved. In this study, an L0-norm based Hankel structured low-rank and sparse model is proposed to reduce the channel data. An optimization algorithm, based on the alternating direction method of multipliers (ADMM), is proposed to efficiently solve the resulting optimization problem. The performance of the proposed approach was evaluated using the data published in Plane Wave Imaging Challenge in Medical Ultrasound (PICMUS) in 2016. Results on channel and plane wave data show that the proposed method is better adapted to the ultrasound channel signal and can recover the image with fewer samples than the conventional CS method.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"11 18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116824222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data","authors":"Takashi Fukuda, Samuel Thomas","doi":"10.1109/ICASSP49357.2023.10095218","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095218","url":null,"abstract":"This paper proposes a novel modeling framework for effective training of end-to-end automatic speech recognition (ASR) models on various sources of data from diverse domains: speech paired with clean ground truth transcripts, speech with noisy pseudo transcripts from semi-supervised decodes and unpaired text-only data. In our proposed approach, we build a recurrent neural network transducer (RNN-T) model with a shared multimodal encoder, multi-branch prediction networks and a shared common joint network. To train on unpaired text-only data sets along with transcribed speech data, the shared encoder is trained to process both speech and text modalities. Differences in data from multiple domains are effectively handled by training a multi-branch prediction network on various different data sets before an interpolation step combines the multi-branch prediction networks back into a computationally-efficient single branch. We show the benefit of our proposed technique on several ASR test sets by comparing our models to those trained by simple data mixing. The technique provides a significant relative improvement of up to 6% over baseline systems operating at a similar decoding cost.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117077621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional LS-GAN Based Skylight Polarization Image Restoration and Application in Meridian Localization","authors":"T. Yang, Hongbo Bo, Xinyu Yang, Jun Gao, Zijian Shi","doi":"10.1109/ICASSP49357.2023.10096855","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096855","url":null,"abstract":"Skylight polarization images (SPIs) contain crucial spatial information that can be used for navigation purposes. Under most circumstances, the quality of the images becomes a major concern, especially when there is blocking between the perception equipment and the sky. This paper introduces a deep learning-based methodology for restoring SPIs and utilizing the restored images for navigation. First, an adversarial training paradigm is adopted to restore the SPIs collected in a severe blocking environment. We utilize a self-developed simulation method that uses solar altitude and azimuth angle to generate ground truth, and no prior knowledge of the masking information for noise is used. Second, we show how to locate the meridian based on the restored SPIs using a residual neural network. In experiments, we demonstrate the superiority of the proposed model in restoring SPIs, and the enhanced meridian localization precision by using the restored SPIs.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117312975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implicitly Rotation Equivariant Neural Networks","authors":"Naman Khetan, Tushar Arora, S. U. Rehman, Deepak K. Gupta","doi":"10.1109/ICASSP49357.2023.10095020","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095020","url":null,"abstract":"Convolutional Neural Networks (CNN) are inherently equivariant under translations; however, they do not have an equivalent embedded mechanism to handle other transformations such as rotations. The existing solutions require redesigning standard networks with filters mapped from combinations of predefined basis involving complex analytical functions. Such formulations are hard to implement, and the imposed restrictions in the choice of basis can lead to model weights that are sub-optimal for the primary deep learning task (e.g. classification). We propose Implicitly Equivariant Network (IEN) which induces approximate equivariance in the different layers of a standard CNN by optimizing a multi-objective loss function. We show for ResNet models on Rot-MNIST and Rot-TinyImageNet that even with its simple formulation, IEN performs on par with or even better than steerable networks. Also, IEN facilitates construction of heterogeneous filter groups allowing reduction in the number of channels in CNNs by over 30%. Further, we demonstrate that for the hard problem of visual object tracking, IEN outperforms the state-of-the-art rotation equivariant tracking method while providing faster inference speed.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129518166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Surrogate Based Post-HOC Calibration for Distributional Shift","authors":"Jun Zhang","doi":"10.1109/icassp49357.2023.10096090","DOIUrl":"https://doi.org/10.1109/icassp49357.2023.10096090","url":null,"abstract":"This paper focuses on improving the calibration performance of the post-hoc approach under distributional shift. Taking the popular temperature scaling (TS) as a case in point, the key task is finding a matched temperature for the shifted test set. To address this issue, we show through a small experiment that the temperature is strongly correlated with the shifting intensity. Based on this finding, we propose a simple yet effective approach named Surrogate Based Temperature Scaling (SBTS), where the surrogate model is trained to map the relationship between temperature and the shifting intensity. Experimental results for various shift types on CIFAR-10 and CIFAR-100 demonstrate that SBTS can significantly improve the calibration performance under distributional shift.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129550785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}