"Audio Peak Reduction Using a Synced Allpass Filter"
Sebastian J. Schlecht, Leonardo Fierro, V. Välimäki, J. Backman
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 23 May 2022. DOI: 10.1109/icassp43922.2022.9747877
Abstract: Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while maintaining the total energy of the signal. In this paper, a new technique for linear peak amplitude reduction is proposed based on a Schroeder allpass filter, whose delay-line and gain parameters are synced to match peaks of the signal's auto-correlation function. The proposed method is compared with a previous search method and is shown to be superior in most cases. An evaluation conducted over a variety of test signals indicates that the achieved peak reduction spans from 0 to 5 dB depending on the input waveform. The proposed method is widely applicable to real-time sound reproduction with a minimal computational processing budget.
{"title":"Cross-Target Stance Detection Via Refined Meta-Learning","authors":"Huishan Ji, Zheng Lin, Peng Fu, Weiping Wang","doi":"10.1109/icassp43922.2022.9746302","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746302","url":null,"abstract":"Cross-target stance detection (CTSD) aims to identify the stance of the text towards a target, where stance annotations are available for (though related but) different targets. Recently, models based on external semantic and emotion knowledge have been proposed for CTSD, achieving promising performance. However, such solutions rely on much external resources and harness only one source target, which is a waste of other available targets. To address the problem above, we propose a many-to-one CTSD model based on meta-learning. To make the most of meta-learning, we further refine it with a balanced and easy-to-hard learning pattern. Specifically, for multiple-target training, we feed the model according to the similarity among targets, and utilize two kinds of re-balanced strategies to deal with the imbalance in data. We conduct experiments on SemEval 2016 task 6, and results demonstrate that our method is effective and establishes a new state-of-the-art macro-f1 score for CTSD.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123442291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"A Novel Unsupervised Autoencoder-Based HFOs Detector in Intracranial EEG Signals"
Weilai Li, Lanfeng Zhong, Weixi Xiang, Tongzhou Kang, Dakun Lai
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 23 May 2022. DOI: 10.1109/icassp43922.2022.9746014
Abstract: High-frequency oscillations (HFOs) have proven to be an effective biomarker in epilepsy. However, most existing HFO detectors are based on manual feature extraction and supervised learning, which entail laborious feature selection and a time-consuming labeling process. To tackle these issues, we propose an automatic unsupervised HFO detector based on a convolutional variational autoencoder (CVAE). First, each HFO candidate selected by an initial detection method is converted into a 2-D time-frequency map (TFM) using the continuous wavelet transform (CWT). Then, the CVAE is trained on the red channel of the TFM (R-TFM) dataset to perform dimensionality reduction and reconstruction of the input features. The reconstructed R-TFM dataset is then clustered by the K-means algorithm. Experimental results show that the proposed method outperforms four existing detectors, achieving 92.85% accuracy, 93.91% sensitivity, and 92.14% specificity.
{"title":"Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech","authors":"Varun Krishna, Sriram Ganapathy","doi":"10.1109/icassp43922.2022.9747259","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747259","url":null,"abstract":"The automatic discovery of acoustic sub-word units from raw speech, without any text or labels, is a growing field of research. The key challenge is to derive representations of speech that can be categorized into a small number of phoneme-like units which are speaker invariant and can broadly capture the content variability of speech. In this work, we propose a novel neural network paradigm that uses the deep clustering loss along with the autoregressive contrastive predictive coding (CPC) loss. Both the loss functions, the CPC and the clustering loss, are self-supervised. The clustering cost involves the loss function using the phoneme-like labels generated with an iterative k-means algorithm. The inclusion of this loss ensures that the model representations can be categorized into a small number of automatic speech units. We experiment with several sub-tasks described as part of the Zerospeech 2021 challenge to illustrate the effectiveness of the framework. In these experiments, we show that proposed representation learning approach improves significantly over the previous self-supervision based models as well as the wav2vec family of models on a range of word-level similarity tasks and language modeling tasks.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123768822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Text Adaptive Detection for Customizable Keyword Spotting"
Yu Xi, Tian Tan, Wangyou Zhang, Baochen Yang, Kai Yu
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 23 May 2022. DOI: 10.1109/icassp43922.2022.9746647
Abstract: Always-on keyword spotting (KWS), i.e., wake word detection, has been widely used in many voice assistant applications running on smart devices. Although fixed wake-word detection trained on specifically collected data has reached high performance, it remains challenging to build an arbitrarily customizable detection system on general found data. A deep learning classifier similar to the one used in speech recognition can be applied, but its detection performance is usually significantly degraded. In this work, we propose a novel text adaptive detection framework that directly formulates KWS as a detection rather than a classification problem. Here, the text prompt is used as an input to promote biased classification, and a series of frame- and sequence-level detection criteria are employed to replace the cross-entropy criterion and directly optimize detection performance. Experiments on a keyword spotting version of the Wall Street Journal (WSJ) dataset show that the text adaptive detection framework achieves an average relative improvement of 16.88% in the detection metric F1-score over the baseline model.
"Pyramid Fusion Attention Network For Single Image Super-Resolution"
Hao He, Zongcai Du, Wenfeng Li, Jie Tang, Gangshan Wu
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 23 May 2022. DOI: 10.1109/icassp43922.2022.9747609
Abstract: Recently, convolutional neural networks (CNNs) have made great advances in image super-resolution (SR). Most recent models exploit an attention mechanism (AM) to focus on high-frequency information. However, these methods consider interdependencies exclusively among channels or among spatial positions, leading to an equal treatment of channel-wise or spatial-wise features and thus limiting the power of the AM. In this paper, we propose a pyramid fusion attention network (PFAN) to tackle this problem. Specifically, a novel pyramid fusion attention (PFA) is developed in which stacked residual blocks model the relationships between pixels across all channels, and a pyramid fusion structure is adopted to expand the receptive field. In addition, a progressive backward fusion strategy is introduced to make full use of hierarchical features, which helps obtain more contextual representations. Comprehensive experiments demonstrate the superiority of the proposed PFAN over state-of-the-art methods.
"Multi-Scale Refinement Network Based Acoustic Echo Cancellation"
Fan Cui, Liyong Guo, Wenfeng Li, Peng Gao, Yujun Wang
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 23 May 2022. DOI: 10.1109/ICASSP43922.2022.9747891
Abstract: Recently, deep encoder-decoder networks have shown outstanding performance in acoustic echo cancellation (AEC). However, subsampling operations such as strided convolution in the encoder layers significantly decrease the feature resolution, leading to a loss of fine-grained information. This paper proposes an encoder-decoder network for acoustic echo cancellation with multi-scale refinement paths that exploit information at different feature scales. In the encoder stage, high-level features are obtained to produce a coarse result. Then, the decoder layers with multiple refinement paths directly refine this result with fine-grained features. The refinement paths at different feature scales are combined by learnable weights. The experimental results show that the proposed multi-scale refinement structure significantly improves the objective criteria. In the ICASSP 2022 Acoustic Echo Cancellation Challenge, our submitted system achieves an overall MOS score of 4.439 with 4.37 million parameters at a system latency of 40 ms.
{"title":"Variable Span Trade-Off Filter for Sound Zone Control with Kernel Interpolation Weighting","authors":"Jesper Brunnström, Shoichi Koyama, Marc Moonen","doi":"10.1109/ICASSP43922.2022.9746550","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9746550","url":null,"abstract":"A sound zone control method is proposed, based on the frequency domain variable span trade-off filter (VAST). Existing VAST methods optimizes the sound field at a set of discrete points, while the proposed method uses kernel interpolation to instead optimize the sound field over a continuous region. When the loudspeaker positions are known, the performance can be improved further by applying a directional weighting to the interpolation procedure. The proposed method is evaluated by simulating broadband sound in a reverberant environment, focusing on the case when microphone placement is restricted. The proposed method with directional weighting outperforms the pointwise VAST over the full bandwidth of the signal, and the proposed method without directional weighting outperforms the pointwise VAST at low frequencies.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126640540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Learned Video Super-Resolution with Augmented Spatial and Temporal Context","authors":"Zejia Fan, Jiaying Liu, Wenhan Yang, Wei Xiang, Zongming Guo","doi":"10.1109/ICASSP43922.2022.9746371","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9746371","url":null,"abstract":"Video super-resolution methods typically rely on paired training data, in which the low-resolution frames are usually synthetically generated under predetermined degradation conditions (e.g., Bicubic downsampling). However, in real applications, it is labor-consuming and expensive to obtain this kind of training data, which limits the practical performance of these methods. To address the issue and get rid of the synthetic paired data, in this paper, we make exploration in utilizing the internal self-similarity redundancy within the video to build a Self-Learned Video Super-Resolution (SLVSR) method, which only needs to be trained on the input testing video itself. We employ a series of data augmentation strategies to make full use of the spatial and temporal context of the target video clips. The idea is applied to two branches of mainstream SR methods: frame fusion and frame recurrence methods. Since the former takes advantage of the short-term temporal consistency and the latter of the long-term one, our method can satisfy different practical situations. The experimental results show the superiority of our proposed method, especially in addressing the video super-resolution problems in real applications.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115516220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-Shot Balanced Detector for Geospatial Object Detection","authors":"Yanfeng Liu, Qiang Li, Yuan Yuan, Qi Wang","doi":"10.1109/icassp43922.2022.9746853","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746853","url":null,"abstract":"Geospatial object detection is an essential task in remote sensing community. One-stage methods based on deep learning have faster running speed but cannot reach higher detection accuracy than two-stage methods. In this paper, to achieve excellent speed/accuracy trade-off for geospatial object detection, a single-shot balanced detector is presented. First, a balanced feature pyramid network (BFPN) is designed, which can balance semantic information and spatial information between high-level and shallow-level features adaptively. Second, we propose a task-interactive head (TIH). It can reduce the task misalignment between classification and regression. Extensive experiments show that the improved detector obtains significant detection accuracy with considerable speed on two benchmark datasets.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122293822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}