ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Audio Peak Reduction Using a Synced Allpass Filter
Sebastian J. Schlecht, Leonardo Fierro, V. Välimäki, J. Backman
{"title":"Audio Peak Reduction Using a Synced allpass Filter","authors":"Sebastian J. Schlecht, Leonardo Fierro, V. Välimäki, J. Backman","doi":"10.1109/icassp43922.2022.9747877","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747877","url":null,"abstract":"Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while maintaining the total energy of the signal. In this paper, a new technique for linear peak amplitude reduction is proposed based on a Schroeder allpass filter, whose delay line and gain parameters are synced to match peaks of the signal’s auto-correlation function. The proposed method is compared with a previous search method and is shown to be often superior. An evaluation conducted over a variety of test signals indicates that the achieved peak reduction spans from 0 to 5 dB depending on the input waveform. The proposed method is widely applicable to real-time sound reproduction with a minimal computational processing budget.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126113385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
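The core mechanism is concrete enough to illustrate. Below is a minimal Python sketch (not the authors' implementation) of the synced-allpass idea: pick the Schroeder allpass delay from a peak of the input's autocorrelation, then filter. The fixed gain and the simple peak-picking rule are placeholder assumptions; the paper's exact syncing rule differs.

```python
import numpy as np
from scipy.signal import lfilter

def synced_allpass_peak_reduction(x, max_lag=2000, g=0.5):
    """Apply a Schroeder allpass whose delay matches an autocorrelation peak."""
    # Autocorrelation for positive lags only.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Strongest peak beyond lag 0 sets the allpass delay M.
    M = int(np.argmax(ac[1:max_lag]) + 1)
    # Schroeder allpass: H(z) = (-g + z^-M) / (1 - g z^-M); unit magnitude,
    # so total signal energy is preserved while the phase smears the peak.
    b = np.zeros(M + 1); b[0] = -g; b[M] = 1.0
    a = np.zeros(M + 1); a[0] = 1.0; a[M] = -g
    return lfilter(b, a, x), M

# Toy example: two summed sinusoids. Whether the peak actually drops
# depends on the waveform, as the abstract's 0-5 dB range suggests.
fs = 48000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
y, M = synced_allpass_peak_reduction(x)
print(f"delay M={M}, peak before {np.max(np.abs(x)):.3f}, after {np.max(np.abs(y)):.3f}")
```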
Cross-Target Stance Detection Via Refined Meta-Learning
Huishan Ji, Zheng Lin, Peng Fu, Weiping Wang
{"title":"Cross-Target Stance Detection Via Refined Meta-Learning","authors":"Huishan Ji, Zheng Lin, Peng Fu, Weiping Wang","doi":"10.1109/icassp43922.2022.9746302","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746302","url":null,"abstract":"Cross-target stance detection (CTSD) aims to identify the stance of the text towards a target, where stance annotations are available for (though related but) different targets. Recently, models based on external semantic and emotion knowledge have been proposed for CTSD, achieving promising performance. However, such solutions rely on much external resources and harness only one source target, which is a waste of other available targets. To address the problem above, we propose a many-to-one CTSD model based on meta-learning. To make the most of meta-learning, we further refine it with a balanced and easy-to-hard learning pattern. Specifically, for multiple-target training, we feed the model according to the similarity among targets, and utilize two kinds of re-balanced strategies to deal with the imbalance in data. We conduct experiments on SemEval 2016 task 6, and results demonstrate that our method is effective and establishes a new state-of-the-art macro-f1 score for CTSD.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123442291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
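A hypothetical sketch of the two refinements named in the abstract: ordering source targets from similar (easy) to dissimilar (hard), and re-balancing imbalanced stance classes. The cosine-similarity measure, the target embeddings, and inverse-frequency weighting are illustrative assumptions, not the paper's actual choices.

```python
import numpy as np

def similarity_order(target_vecs, dest_vec):
    """Order source targets from most to least similar to the destination target."""
    sims = target_vecs @ dest_vec / (
        np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(dest_vec) + 1e-8)
    return np.argsort(-sims)  # easy (similar) meta-training tasks first

def class_balance_weights(labels, n_classes=3):
    """Inverse-frequency class weights, one simple re-balancing strategy."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    w = counts.sum() / np.maximum(counts, 1.0)
    return w / w.sum()

# Toy usage: 4 source targets, one destination-target embedding.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 16))
dst = rng.normal(size=16)
print("meta-training order:", similarity_order(src, dst))
print("class weights:", class_balance_weights(np.array([0, 0, 0, 1, 2, 2])))
```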
A Novel Unsupervised Autoencoder-Based HFOs Detector in Intracranial EEG Signals
Weilai Li, Lanfeng Zhong, Weixi Xiang, Tongzhou Kang, Dakun Lai
{"title":"A Novel Unsupervised Autoencoder-Based HFOs Detector in Intracranial EEG Signals","authors":"Weilai Li, Lanfeng Zhong, Weixi Xiang, Tongzhou Kang, Dakun Lai","doi":"10.1109/icassp43922.2022.9746014","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746014","url":null,"abstract":"High frequency oscillations (HFOs) have demonstrated their potency acting as an effective biomarker in epilepsy. However, most of the existing HFOs detectors are based on manual feature extraction and supervised learning, which incur laborious feature selection and time-consuming labeling process. In order to tackle these issues, we propose an automatic unsupervised HFOs detector based on convolutional variational autoencoder (CVAE). First, each selected HFO candidate (via an initial detection method) is converted into a 2-D time-frequency map (TFM) using continuous wavelet transform (CWT). Then, CVAE is trained on the red channel of the TFM (R-TFM) dataset so as to achieve the goal of dimensionality reduction and reconstruction of input feature. The reconstructed R-TFM dataset is later classified by K-means algorithm. Experimental results show that the proposed method outperforms four existing detectors, and achieve 92.85% in accuracy, 93.91% in sensitivity, and 92.14% in specificity.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123573126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
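A minimal sketch of the pipeline stages the abstract describes (CWT map, convolutional VAE, K-means on the reconstructions). The 64x64 map size, Morlet wavelet, layer widths, and two-cluster setup are assumptions; ELBO training of the VAE is omitted for brevity.

```python
import numpy as np
import pywt
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def to_tfm(segment, fs=2000, n_scales=64, n_time=64):
    """Continuous wavelet transform magnitude, resized to a square map."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(segment, scales, "morl", sampling_period=1 / fs)
    tfm = np.abs(coeffs)
    idx = np.linspace(0, tfm.shape[1] - 1, n_time).astype(int)
    return tfm[:, idx] / (tfm.max() + 1e-8)

class CVAE(nn.Module):
    def __init__(self, latent=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 8, 4, 2, 1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(8, 16, 4, 2, 1), nn.ReLU(),  # 32 -> 16
            nn.Flatten())
        self.mu = nn.Linear(16 * 16 * 16, latent)
        self.logvar = nn.Linear(16 * 16 * 16, latent)
        self.dec = nn.Sequential(
            nn.Linear(latent, 16 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, 8, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

def cluster_candidates(model, tfms):
    """K-means on reconstructed maps, per the abstract (2 clusters assumed)."""
    with torch.no_grad():
        recon, _, _ = model(torch.tensor(tfms, dtype=torch.float32).unsqueeze(1))
    flat = recon.squeeze(1).flatten(1).numpy()
    return KMeans(n_clusters=2, n_init=10).fit_predict(flat)
```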
Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech
Varun Krishna, Sriram Ganapathy
{"title":"Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech","authors":"Varun Krishna, Sriram Ganapathy","doi":"10.1109/icassp43922.2022.9747259","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747259","url":null,"abstract":"The automatic discovery of acoustic sub-word units from raw speech, without any text or labels, is a growing field of research. The key challenge is to derive representations of speech that can be categorized into a small number of phoneme-like units which are speaker invariant and can broadly capture the content variability of speech. In this work, we propose a novel neural network paradigm that uses the deep clustering loss along with the autoregressive contrastive predictive coding (CPC) loss. Both the loss functions, the CPC and the clustering loss, are self-supervised. The clustering cost involves the loss function using the phoneme-like labels generated with an iterative k-means algorithm. The inclusion of this loss ensures that the model representations can be categorized into a small number of automatic speech units. We experiment with several sub-tasks described as part of the Zerospeech 2021 challenge to illustrate the effectiveness of the framework. In these experiments, we show that proposed representation learning approach improves significantly over the previous self-supervision based models as well as the wav2vec family of models on a range of word-level similarity tasks and language modeling tasks.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123768822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
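A sketch of how the two self-supervised losses can be combined. The InfoNCE variant below uses other time steps of the same utterance as negatives, and the weighting lambda, prediction horizon, and linear predictor/classifier are placeholder assumptions; the pseudo-labels would be refreshed by iterative k-means as the abstract states.

```python
import torch
import torch.nn.functional as F

def cpc_loss(z, c, predictor, k=1):
    """InfoNCE: predict z_{t+k} from context c_t against in-sequence negatives."""
    pred = predictor(c[:, :-k])            # (B, T-k, D) predicted futures
    target = z[:, k:]                      # (B, T-k, D) true futures
    B, T, _ = pred.shape
    logits = torch.einsum("btd,bsd->bts", pred, target)  # scores vs. all steps
    labels = torch.arange(T, device=z.device).expand(B, T)
    return F.cross_entropy(logits.reshape(B * T, T), labels.reshape(-1))

def clustering_loss(z, classifier, pseudo_labels):
    """Cross-entropy against k-means pseudo-labels (recomputed iteratively)."""
    logits = classifier(z)                 # (B, T, n_units)
    return F.cross_entropy(logits.flatten(0, 1), pseudo_labels.flatten())

def total_loss(z, c, predictor, classifier, pseudo_labels, lam=1.0):
    return cpc_loss(z, c, predictor) + lam * clustering_loss(z, classifier, pseudo_labels)

# Toy usage with 50 phoneme-like units.
B, T, D = 2, 50, 64
z, c = torch.randn(B, T, D), torch.randn(B, T, D)
predictor, classifier = torch.nn.Linear(D, D), torch.nn.Linear(D, 50)
pseudo = torch.randint(0, 50, (B, T))
print(total_loss(z, c, predictor, classifier, pseudo).item())
```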
Text Adaptive Detection for Customizable Keyword Spotting
Yu Xi, Tian Tan, Wangyou Zhang, Baochen Yang, Kai Yu
{"title":"Text Adaptive Detection for Customizable Keyword Spotting","authors":"Yu Xi, Tian Tan, Wangyou Zhang, Baochen Yang, Kai Yu","doi":"10.1109/icassp43922.2022.9746647","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746647","url":null,"abstract":"Always-on keyword spotting (KWS), i.e., wake word detection, has been widely used in many voice assistant applications running on smart devices. Although fixed wakeup word detection trained on specifically collected data has reached high performance, it is still challenging to build an arbitrarily customizable detection system on general found data. A deep learning classifier, similar to the one in speech recognition, can be used, but the detection performance is usually significantly degraded. In this work, we propose a novel text adaptive detection framework to directly formulate KWS as a detection rather than a classification problem. Here, the text prompt is used as input to promote biased classification, and a series of frame and sequence level detection criteria are employed to replace the cross-entropy criterion and directly optimize detection performance. Experiments on a keyword spotting version of Wall Street Journal (WSJ) dataset show that the text adaptive detection framework can achieve an average relative improvement of 16.88% in the detection metric F1-score compared to the baseline model.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125331879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
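A hypothetical sketch of the "text prompt as input" formulation: a keyword-token embedding conditions the acoustic encoder, a per-frame sigmoid gives frame-level scores, and max pooling gives a sequence-level score. The GRU encoder, embedding pooling, and pooling-based sequence criterion are illustrative assumptions, not the paper's architecture or detection criteria.

```python
import torch
import torch.nn as nn

class TextAdaptiveKWS(nn.Module):
    def __init__(self, n_mels=80, d=128, vocab=64):
        super().__init__()
        self.text_emb = nn.EmbeddingBag(vocab, d)          # pools keyword tokens
        self.audio = nn.GRU(n_mels, d, batch_first=True)
        self.frame_head = nn.Linear(2 * d, 1)

    def forward(self, mels, keyword_tokens):
        h, _ = self.audio(mels)                            # (B, T, d)
        t = self.text_emb(keyword_tokens)                  # (B, d) prompt embedding
        t = t.unsqueeze(1).expand(-1, h.size(1), -1)       # broadcast over frames
        frame_logits = self.frame_head(torch.cat([h, t], -1)).squeeze(-1)
        seq_logit = frame_logits.max(dim=1).values         # sequence-level score
        return frame_logits, seq_logit

model = TextAdaptiveKWS()
mels = torch.randn(2, 100, 80)                             # 100 frames of log-mels
kw = torch.randint(0, 64, (2, 5))                          # 5 keyword tokens
frame_logits, seq_logit = model(mels, kw)
print(frame_logits.shape, seq_logit.shape)                 # (2, 100) and (2,)
```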
Pyramid Fusion Attention Network For Single Image Super-Resolution
Hao He, Zongcai Du, Wenfeng Li, Jie Tang, Gangshan Wu
{"title":"Pyramid Fusion Attention Network For Single Image Super-Resolution","authors":"Hao He, Zongcai Du, Wenfeng Li, Jie Tang, Gangshan Wu","doi":"10.1109/icassp43922.2022.9747609","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747609","url":null,"abstract":"Recently, convolutional neural network (CNN) has made a mighty advance in image super-resolution (SR). Most recent models exploit attention mechanism (AM) to focus on high-frequency information. However, these methods exclusively consider interdependencies among channels or spatials, leading to equal treatment of channel-wise or spatial-wise features thus hindering the power of AM. In this paper, we propose a pyramid fusion attention network (PFAN) to tackle this problem. Specifically, a novel pyramid fusion attention (PFA) is developed where stacked residual blocks are employed to model the relationship between pixels among all channels, and pyramid fusion structure is adopted to expand receptive field. Besides, a progressive backward fusion strat-egy is introduced to make full use of hierarchical features, which are beneficial to obtaining more contextual representations. Comprehensive experiments demonstrate the superiority of our proposed PFAN against state-of-the-art methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125398352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
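A rough sketch of a pyramid-style fusion attention block: attention is computed at several downsampled scales (expanding the receptive field), upsampled back, fused into one attention map, and applied to the input. The branch internals, scale set, and fusion convolution are assumptions, not the paper's exact PFA design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusionAttention(nn.Module):
    def __init__(self, ch, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(ch, ch, 3, padding=1))
            for _ in scales])
        self.fuse = nn.Conv2d(ch * len(scales), ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = []
        for s, branch in zip(self.scales, self.branches):
            y = F.avg_pool2d(x, s) if s > 1 else x       # move to a coarser scale
            y = branch(y)
            if s > 1:                                    # back to full resolution
                y = F.interpolate(y, size=(h, w), mode="bilinear",
                                  align_corners=False)
            outs.append(y)
        attn = torch.sigmoid(self.fuse(torch.cat(outs, dim=1)))
        return x * attn + x                              # attended residual output

x = torch.randn(1, 32, 48, 48)
print(PyramidFusionAttention(32)(x).shape)               # torch.Size([1, 32, 48, 48])
```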
Multi-Scale Refinement Network Based Acoustic Echo Cancellation
Fan Cui, Liyong Guo, Wenfeng Li, Peng Gao, Yujun Wang
{"title":"Multi-Scale Refinement Network Based Acoustic Echo Cancellation","authors":"Fan Cui, Liyong Guo, Wenfeng Li, Peng Gao, Yujun Wang","doi":"10.1109/ICASSP43922.2022.9747891","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747891","url":null,"abstract":"Recently, deep encoder-decoder networks have shown outstanding performance in acoustic echo cancellation (AEC). However, the subsampling operations like convolution striding in the encoder layers significantly decrease the feature resolution lead to fine-grained information loss. This paper proposes an encoder-decoder network for acoustic echo cancellation with mutli-scale refinement paths to exploit the information at different feature scales. In the encoder stage, high-level features are obtained to get a coarse result. Then, the decoder layers with multiple refinement paths can directly refine the result with fine-grained features. Refinement paths with different feature scales are combined by learnable weights. The experimental results show that using the proposed multi-scale refinement structure can significantly improve the objective criteria. In the ICASSP 2022 Acoustic echo cancellation Challenge, our submitted system achieves an overall MOS score of 4.439 with 4.37 million parameters at a system latency of 40ms.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125502959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
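A sketch of the fusion rule only: a coarse estimate is refined by paths fed with features at different scales, combined through a softmax over learned scalar weights. The per-path convolutions, softmax normalization, and nearest-neighbor resizing are illustrative assumptions, not the AEC model itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedRefinement(nn.Module):
    def __init__(self, ch, n_paths=3):
        super().__init__()
        self.paths = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=1) for _ in range(n_paths)])
        self.w = nn.Parameter(torch.zeros(n_paths))  # learnable path weights

    def forward(self, coarse, multi_scale_feats):
        weights = torch.softmax(self.w, dim=0)
        out = coarse
        for wi, path, feat in zip(weights, self.paths, multi_scale_feats):
            # Bring each scale's fine-grained features to the output size.
            feat = F.interpolate(feat, size=coarse.shape[-2:], mode="nearest")
            out = out + wi * path(feat)
        return out

coarse = torch.randn(1, 16, 64, 64)
feats = [torch.randn(1, 16, 64 // 2 ** i, 64 // 2 ** i) for i in range(3)]
print(WeightedRefinement(16)(coarse, feats).shape)  # torch.Size([1, 16, 64, 64])
```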
Variable Span Trade-Off Filter for Sound Zone Control with Kernel Interpolation Weighting
Jesper Brunnström, Shoichi Koyama, Marc Moonen
{"title":"Variable Span Trade-Off Filter for Sound Zone Control with Kernel Interpolation Weighting","authors":"Jesper Brunnström, Shoichi Koyama, Marc Moonen","doi":"10.1109/ICASSP43922.2022.9746550","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9746550","url":null,"abstract":"A sound zone control method is proposed, based on the frequency domain variable span trade-off filter (VAST). Existing VAST methods optimizes the sound field at a set of discrete points, while the proposed method uses kernel interpolation to instead optimize the sound field over a continuous region. When the loudspeaker positions are known, the performance can be improved further by applying a directional weighting to the interpolation procedure. The proposed method is evaluated by simulating broadband sound in a reverberant environment, focusing on the case when microphone placement is restricted. The proposed method with directional weighting outperforms the pointwise VAST over the full bandwidth of the signal, and the proposed method without directional weighting outperforms the pointwise VAST at low frequencies.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126640540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
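A compact sketch of the VAST filter family the paper builds on: jointly diagonalize the bright- and dark-zone correlation matrices, then form the filter from the top-V generalized eigenvectors with trade-off parameter mu. The kernel-interpolation weighting of the zone statistics (the paper's contribution) is omitted, and the target vector here is a generic placeholder.

```python
import numpy as np
from scipy.linalg import eigh

def vast_filter(R_b, R_d, r_target, V, mu):
    """R_b/R_d: bright/dark-zone correlation matrices; r_target ~ G_b^H d_b."""
    # Generalized eigenproblem R_b u = lam * R_d u, normalized so u^H R_d u = 1.
    lam, U = eigh(R_b, R_d)
    order = np.argsort(lam)[::-1]          # strongest bright/dark ratio first
    lam, U = lam[order], U[:, order]
    q = np.zeros(R_b.shape[0], dtype=complex)
    for i in range(V):                     # span V trades accuracy vs. contrast
        u = U[:, i]
        q += (np.conj(u) @ r_target) / (lam[i] + mu) * u
    return q

# Toy example with random positive-definite zone statistics.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8)); R_b = A @ A.T + 8 * np.eye(8)
B = rng.normal(size=(8, 8)); R_d = B @ B.T + 8 * np.eye(8)
q = vast_filter(R_b, R_d, rng.normal(size=8), V=3, mu=1.0)
print(q.shape)
```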
Self-Learned Video Super-Resolution with Augmented Spatial and Temporal Context
Zejia Fan, Jiaying Liu, Wenhan Yang, Wei Xiang, Zongming Guo
{"title":"Self-Learned Video Super-Resolution with Augmented Spatial and Temporal Context","authors":"Zejia Fan, Jiaying Liu, Wenhan Yang, Wei Xiang, Zongming Guo","doi":"10.1109/ICASSP43922.2022.9746371","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9746371","url":null,"abstract":"Video super-resolution methods typically rely on paired training data, in which the low-resolution frames are usually synthetically generated under predetermined degradation conditions (e.g., Bicubic downsampling). However, in real applications, it is labor-consuming and expensive to obtain this kind of training data, which limits the practical performance of these methods. To address the issue and get rid of the synthetic paired data, in this paper, we make exploration in utilizing the internal self-similarity redundancy within the video to build a Self-Learned Video Super-Resolution (SLVSR) method, which only needs to be trained on the input testing video itself. We employ a series of data augmentation strategies to make full use of the spatial and temporal context of the target video clips. The idea is applied to two branches of mainstream SR methods: frame fusion and frame recurrence methods. Since the former takes advantage of the short-term temporal consistency and the latter of the long-term one, our method can satisfy different practical situations. The experimental results show the superiority of our proposed method, especially in addressing the video super-resolution problems in real applications.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115516220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
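A sketch of the self-learned setup: build LR/HR training pairs from the test video itself by further downscaling its frames, with simple spatial and temporal augmentations. The bicubic downscale and the flip/reverse augmentations are illustrative assumptions; the paper's augmentation list and SR networks are not reproduced.

```python
import torch
import torch.nn.functional as F

def make_internal_pairs(frames, scale=2):
    """frames: (T, C, H, W) test video. Returns (LR, HR) pairs for training."""
    hr = frames                                   # treat test frames as HR
    lr = F.interpolate(hr, scale_factor=1 / scale, mode="bicubic",
                       align_corners=False)
    return lr, hr

def augment(lr, hr):
    """Simple spatial and temporal context augmentation."""
    if torch.rand(()) < 0.5:                      # horizontal flip
        lr, hr = lr.flip(-1), hr.flip(-1)
    if torch.rand(()) < 0.5:                      # reverse temporal order
        lr, hr = lr.flip(0), hr.flip(0)
    return lr, hr

video = torch.rand(8, 3, 64, 64)                  # toy "test video"
lr, hr = augment(*make_internal_pairs(video))
print(lr.shape, hr.shape)  # (8, 3, 32, 32) and (8, 3, 64, 64)
```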
Single-Shot Balanced Detector for Geospatial Object Detection
Yanfeng Liu, Qiang Li, Yuan Yuan, Qi Wang
{"title":"Single-Shot Balanced Detector for Geospatial Object Detection","authors":"Yanfeng Liu, Qiang Li, Yuan Yuan, Qi Wang","doi":"10.1109/icassp43922.2022.9746853","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746853","url":null,"abstract":"Geospatial object detection is an essential task in remote sensing community. One-stage methods based on deep learning have faster running speed but cannot reach higher detection accuracy than two-stage methods. In this paper, to achieve excellent speed/accuracy trade-off for geospatial object detection, a single-shot balanced detector is presented. First, a balanced feature pyramid network (BFPN) is designed, which can balance semantic information and spatial information between high-level and shallow-level features adaptively. Second, we propose a task-interactive head (TIH). It can reduce the task misalignment between classification and regression. Extensive experiments show that the improved detector obtains significant detection accuracy with considerable speed on two benchmark datasets.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122293822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
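A sketch of one established way to balance semantic and spatial information across pyramid levels: resize all levels to a middle resolution, average them, and redistribute the integrated map back to each level residually. This mirrors published "balanced feature pyramid" designs; the paper's BFPN (and its task-interactive head) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def balanced_pyramid(feats, mid=1):
    """feats: list of (B, C, Hi, Wi) maps ordered fine -> coarse."""
    size = feats[mid].shape[-2:]
    pooled = [F.interpolate(f, size=size, mode="nearest") for f in feats]
    balanced = torch.stack(pooled).mean(dim=0)     # integrate all levels
    out = []
    for f in feats:                                # redistribute residually
        b = F.interpolate(balanced, size=f.shape[-2:], mode="nearest")
        out.append(f + b)
    return out

feats = [torch.randn(1, 16, 64 // 2 ** i, 64 // 2 ** i) for i in range(3)]
print([f.shape[-1] for f in balanced_pyramid(feats)])  # [64, 32, 16]
```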