ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Audio Peak Reduction Using a Synced Allpass Filter
Sebastian J. Schlecht, Leonardo Fierro, V. Välimäki, J. Backman
{"title":"Audio Peak Reduction Using a Synced allpass Filter","authors":"Sebastian J. Schlecht, Leonardo Fierro, V. Välimäki, J. Backman","doi":"10.1109/icassp43922.2022.9747877","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747877","url":null,"abstract":"Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while maintaining the total energy of the signal. In this paper, a new technique for linear peak amplitude reduction is proposed based on a Schroeder allpass filter, whose delay line and gain parameters are synced to match peaks of the signal’s auto-correlation function. The proposed method is compared with a previous search method and is shown to be often superior. An evaluation conducted over a variety of test signals indicates that the achieved peak reduction spans from 0 to 5 dB depending on the input waveform. The proposed method is widely applicable to real-time sound reproduction with a minimal computational processing budget.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126113385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
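The core mechanism is concrete enough to illustrate. Below is a minimal Python sketch (not the authors' implementation) of the synced-allpass idea: pick the Schroeder allpass delay from a peak of the input's autocorrelation, then filter. The fixed gain and the simple peak-picking rule are placeholder assumptions; the paper's exact syncing rule differs.

```python
import numpy as np
from scipy.signal import lfilter

def synced_allpass_peak_reduction(x, max_lag=2000, g=0.5):
    """Apply a Schroeder allpass whose delay matches an autocorrelation peak."""
    # Autocorrelation for positive lags only.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Strongest peak beyond lag 0 sets the allpass delay M.
    M = int(np.argmax(ac[1:max_lag]) + 1)
    # Schroeder allpass: H(z) = (-g + z^-M) / (1 - g z^-M); unit magnitude,
    # so total signal energy is preserved while the phase smears the peak.
    b = np.zeros(M + 1); b[0] = -g; b[M] = 1.0
    a = np.zeros(M + 1); a[0] = 1.0; a[M] = -g
    return lfilter(b, a, x), M

# Toy example: two summed sinusoids. Whether the peak actually drops
# depends on the waveform, as the abstract's 0-5 dB range suggests.
fs = 48000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
y, M = synced_allpass_peak_reduction(x)
print(f"delay M={M}, peak before {np.max(np.abs(x)):.3f}, after {np.max(np.abs(y)):.3f}")
```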
Cross-Target Stance Detection Via Refined Meta-Learning
Huishan Ji, Zheng Lin, Peng Fu, Weiping Wang
{"title":"Cross-Target Stance Detection Via Refined Meta-Learning","authors":"Huishan Ji, Zheng Lin, Peng Fu, Weiping Wang","doi":"10.1109/icassp43922.2022.9746302","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746302","url":null,"abstract":"Cross-target stance detection (CTSD) aims to identify the stance of the text towards a target, where stance annotations are available for (though related but) different targets. Recently, models based on external semantic and emotion knowledge have been proposed for CTSD, achieving promising performance. However, such solutions rely on much external resources and harness only one source target, which is a waste of other available targets. To address the problem above, we propose a many-to-one CTSD model based on meta-learning. To make the most of meta-learning, we further refine it with a balanced and easy-to-hard learning pattern. Specifically, for multiple-target training, we feed the model according to the similarity among targets, and utilize two kinds of re-balanced strategies to deal with the imbalance in data. We conduct experiments on SemEval 2016 task 6, and results demonstrate that our method is effective and establishes a new state-of-the-art macro-f1 score for CTSD.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123442291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
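A hypothetical sketch of the two refinements named in the abstract: ordering source targets from similar (easy) to dissimilar (hard), and re-balancing imbalanced stance classes. The cosine-similarity measure, the target embeddings, and inverse-frequency weighting are illustrative assumptions, not the paper's actual choices.

```python
import numpy as np

def similarity_order(target_vecs, dest_vec):
    """Order source targets from most to least similar to the destination target."""
    sims = target_vecs @ dest_vec / (
        np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(dest_vec) + 1e-8)
    return np.argsort(-sims)  # easy (similar) meta-training tasks first

def class_balance_weights(labels, n_classes=3):
    """Inverse-frequency class weights, one simple re-balancing strategy."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    w = counts.sum() / np.maximum(counts, 1.0)
    return w / w.sum()

# Toy usage: 4 source targets, one destination-target embedding.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 16))
dst = rng.normal(size=16)
print("meta-training order:", similarity_order(src, dst))
print("class weights:", class_balance_weights(np.array([0, 0, 0, 1, 2, 2])))
```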
A Novel Unsupervised Autoencoder-Based HFOs Detector in Intracranial EEG Signals
Weilai Li, Lanfeng Zhong, Weixi Xiang, Tongzhou Kang, Dakun Lai
{"title":"A Novel Unsupervised Autoencoder-Based HFOs Detector in Intracranial EEG Signals","authors":"Weilai Li, Lanfeng Zhong, Weixi Xiang, Tongzhou Kang, Dakun Lai","doi":"10.1109/icassp43922.2022.9746014","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746014","url":null,"abstract":"High frequency oscillations (HFOs) have demonstrated their potency acting as an effective biomarker in epilepsy. However, most of the existing HFOs detectors are based on manual feature extraction and supervised learning, which incur laborious feature selection and time-consuming labeling process. In order to tackle these issues, we propose an automatic unsupervised HFOs detector based on convolutional variational autoencoder (CVAE). First, each selected HFO candidate (via an initial detection method) is converted into a 2-D time-frequency map (TFM) using continuous wavelet transform (CWT). Then, CVAE is trained on the red channel of the TFM (R-TFM) dataset so as to achieve the goal of dimensionality reduction and reconstruction of input feature. The reconstructed R-TFM dataset is later classified by K-means algorithm. Experimental results show that the proposed method outperforms four existing detectors, and achieve 92.85% in accuracy, 93.91% in sensitivity, and 92.14% in specificity.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123573126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
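A minimal sketch of the pipeline stages the abstract describes (CWT map, convolutional VAE, K-means on the reconstructions). The 64x64 map size, Morlet wavelet, layer widths, and two-cluster setup are assumptions; ELBO training of the VAE is omitted for brevity.

```python
import numpy as np
import pywt
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def to_tfm(segment, fs=2000, n_scales=64, n_time=64):
    """Continuous wavelet transform magnitude, resized to a square map."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(segment, scales, "morl", sampling_period=1 / fs)
    tfm = np.abs(coeffs)
    idx = np.linspace(0, tfm.shape[1] - 1, n_time).astype(int)
    return tfm[:, idx] / (tfm.max() + 1e-8)

class CVAE(nn.Module):
    def __init__(self, latent=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 8, 4, 2, 1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(8, 16, 4, 2, 1), nn.ReLU(),  # 32 -> 16
            nn.Flatten())
        self.mu = nn.Linear(16 * 16 * 16, latent)
        self.logvar = nn.Linear(16 * 16 * 16, latent)
        self.dec = nn.Sequential(
            nn.Linear(latent, 16 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, 8, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

def cluster_candidates(model, tfms):
    """K-means on reconstructed maps, per the abstract (2 clusters assumed)."""
    with torch.no_grad():
        recon, _, _ = model(torch.tensor(tfms, dtype=torch.float32).unsqueeze(1))
    flat = recon.squeeze(1).flatten(1).numpy()
    return KMeans(n_clusters=2, n_init=10).fit_predict(flat)
```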
Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech
Varun Krishna, Sriram Ganapathy
{"title":"Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech","authors":"Varun Krishna, Sriram Ganapathy","doi":"10.1109/icassp43922.2022.9747259","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747259","url":null,"abstract":"The automatic discovery of acoustic sub-word units from raw speech, without any text or labels, is a growing field of research. The key challenge is to derive representations of speech that can be categorized into a small number of phoneme-like units which are speaker invariant and can broadly capture the content variability of speech. In this work, we propose a novel neural network paradigm that uses the deep clustering loss along with the autoregressive contrastive predictive coding (CPC) loss. Both the loss functions, the CPC and the clustering loss, are self-supervised. The clustering cost involves the loss function using the phoneme-like labels generated with an iterative k-means algorithm. The inclusion of this loss ensures that the model representations can be categorized into a small number of automatic speech units. We experiment with several sub-tasks described as part of the Zerospeech 2021 challenge to illustrate the effectiveness of the framework. In these experiments, we show that proposed representation learning approach improves significantly over the previous self-supervision based models as well as the wav2vec family of models on a range of word-level similarity tasks and language modeling tasks.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123768822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
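A sketch of how the two self-supervised losses can be combined. The InfoNCE variant below uses other time steps of the same utterance as negatives, and the weighting lambda, prediction horizon, and linear predictor/classifier are placeholder assumptions; the pseudo-labels would be refreshed by iterative k-means as the abstract states.

```python
import torch
import torch.nn.functional as F

def cpc_loss(z, c, predictor, k=1):
    """InfoNCE: predict z_{t+k} from context c_t against in-sequence negatives."""
    pred = predictor(c[:, :-k])            # (B, T-k, D) predicted futures
    target = z[:, k:]                      # (B, T-k, D) true futures
    B, T, _ = pred.shape
    logits = torch.einsum("btd,bsd->bts", pred, target)  # scores vs. all steps
    labels = torch.arange(T, device=z.device).expand(B, T)
    return F.cross_entropy(logits.reshape(B * T, T), labels.reshape(-1))

def clustering_loss(z, classifier, pseudo_labels):
    """Cross-entropy against k-means pseudo-labels (recomputed iteratively)."""
    logits = classifier(z)                 # (B, T, n_units)
    return F.cross_entropy(logits.flatten(0, 1), pseudo_labels.flatten())

def total_loss(z, c, predictor, classifier, pseudo_labels, lam=1.0):
    return cpc_loss(z, c, predictor) + lam * clustering_loss(z, classifier, pseudo_labels)

# Toy usage with 50 phoneme-like units.
B, T, D = 2, 50, 64
z, c = torch.randn(B, T, D), torch.randn(B, T, D)
predictor, classifier = torch.nn.Linear(D, D), torch.nn.Linear(D, 50)
pseudo = torch.randint(0, 50, (B, T))
print(total_loss(z, c, predictor, classifier, pseudo).item())
```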
Text Adaptive Detection for Customizable Keyword Spotting
Yu Xi, Tian Tan, Wangyou Zhang, Baochen Yang, Kai Yu
{"title":"Text Adaptive Detection for Customizable Keyword Spotting","authors":"Yu Xi, Tian Tan, Wangyou Zhang, Baochen Yang, Kai Yu","doi":"10.1109/icassp43922.2022.9746647","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746647","url":null,"abstract":"Always-on keyword spotting (KWS), i.e., wake word detection, has been widely used in many voice assistant applications running on smart devices. Although fixed wakeup word detection trained on specifically collected data has reached high performance, it is still challenging to build an arbitrarily customizable detection system on general found data. A deep learning classifier, similar to the one in speech recognition, can be used, but the detection performance is usually significantly degraded. In this work, we propose a novel text adaptive detection framework to directly formulate KWS as a detection rather than a classification problem. Here, the text prompt is used as input to promote biased classification, and a series of frame and sequence level detection criteria are employed to replace the cross-entropy criterion and directly optimize detection performance. Experiments on a keyword spotting version of Wall Street Journal (WSJ) dataset show that the text adaptive detection framework can achieve an average relative improvement of 16.88% in the detection metric F1-score compared to the baseline model.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125331879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
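A hypothetical sketch of the "text prompt as input" formulation: a keyword-token embedding conditions the acoustic encoder, a per-frame sigmoid gives frame-level scores, and max pooling gives a sequence-level score. The GRU encoder, embedding pooling, and pooling-based sequence criterion are illustrative assumptions, not the paper's architecture or detection criteria.

```python
import torch
import torch.nn as nn

class TextAdaptiveKWS(nn.Module):
    def __init__(self, n_mels=80, d=128, vocab=64):
        super().__init__()
        self.text_emb = nn.EmbeddingBag(vocab, d)          # pools keyword tokens
        self.audio = nn.GRU(n_mels, d, batch_first=True)
        self.frame_head = nn.Linear(2 * d, 1)

    def forward(self, mels, keyword_tokens):
        h, _ = self.audio(mels)                            # (B, T, d)
        t = self.text_emb(keyword_tokens)                  # (B, d) prompt embedding
        t = t.unsqueeze(1).expand(-1, h.size(1), -1)       # broadcast over frames
        frame_logits = self.frame_head(torch.cat([h, t], -1)).squeeze(-1)
        seq_logit = frame_logits.max(dim=1).values         # sequence-level score
        return frame_logits, seq_logit

model = TextAdaptiveKWS()
mels = torch.randn(2, 100, 80)                             # 100 frames of log-mels
kw = torch.randint(0, 64, (2, 5))                          # 5 keyword tokens
frame_logits, seq_logit = model(mels, kw)
print(frame_logits.shape, seq_logit.shape)                 # (2, 100) and (2,)
```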
Pyramid Fusion Attention Network For Single Image Super-Resolution
Hao He, Zongcai Du, Wenfeng Li, Jie Tang, Gangshan Wu
{"title":"Pyramid Fusion Attention Network For Single Image Super-Resolution","authors":"Hao He, Zongcai Du, Wenfeng Li, Jie Tang, Gangshan Wu","doi":"10.1109/icassp43922.2022.9747609","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747609","url":null,"abstract":"Recently, convolutional neural network (CNN) has made a mighty advance in image super-resolution (SR). Most recent models exploit attention mechanism (AM) to focus on high-frequency information. However, these methods exclusively consider interdependencies among channels or spatials, leading to equal treatment of channel-wise or spatial-wise features thus hindering the power of AM. In this paper, we propose a pyramid fusion attention network (PFAN) to tackle this problem. Specifically, a novel pyramid fusion attention (PFA) is developed where stacked residual blocks are employed to model the relationship between pixels among all channels, and pyramid fusion structure is adopted to expand receptive field. Besides, a progressive backward fusion strat-egy is introduced to make full use of hierarchical features, which are beneficial to obtaining more contextual representations. Comprehensive experiments demonstrate the superiority of our proposed PFAN against state-of-the-art methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125398352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
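A rough sketch of a pyramid-style fusion attention block: attention is computed at several downsampled scales (expanding the receptive field), upsampled back, fused into one attention map, and applied to the input. The branch internals, scale set, and fusion convolution are assumptions, not the paper's exact PFA design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusionAttention(nn.Module):
    def __init__(self, ch, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(ch, ch, 3, padding=1))
            for _ in scales])
        self.fuse = nn.Conv2d(ch * len(scales), ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = []
        for s, branch in zip(self.scales, self.branches):
            y = F.avg_pool2d(x, s) if s > 1 else x       # move to a coarser scale
            y = branch(y)
            if s > 1:                                    # back to full resolution
                y = F.interpolate(y, size=(h, w), mode="bilinear",
                                  align_corners=False)
            outs.append(y)
        attn = torch.sigmoid(self.fuse(torch.cat(outs, dim=1)))
        return x * attn + x                              # attended residual output

x = torch.randn(1, 32, 48, 48)
print(PyramidFusionAttention(32)(x).shape)               # torch.Size([1, 32, 48, 48])
```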
Multi-Scale Refinement Network Based Acoustic Echo Cancellation
Fan Cui, Liyong Guo, Wenfeng Li, Peng Gao, Yujun Wang
{"title":"Multi-Scale Refinement Network Based Acoustic Echo Cancellation","authors":"Fan Cui, Liyong Guo, Wenfeng Li, Peng Gao, Yujun Wang","doi":"10.1109/ICASSP43922.2022.9747891","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747891","url":null,"abstract":"Recently, deep encoder-decoder networks have shown outstanding performance in acoustic echo cancellation (AEC). However, the subsampling operations like convolution striding in the encoder layers significantly decrease the feature resolution lead to fine-grained information loss. This paper proposes an encoder-decoder network for acoustic echo cancellation with mutli-scale refinement paths to exploit the information at different feature scales. In the encoder stage, high-level features are obtained to get a coarse result. Then, the decoder layers with multiple refinement paths can directly refine the result with fine-grained features. Refinement paths with different feature scales are combined by learnable weights. The experimental results show that using the proposed multi-scale refinement structure can significantly improve the objective criteria. In the ICASSP 2022 Acoustic echo cancellation Challenge, our submitted system achieves an overall MOS score of 4.439 with 4.37 million parameters at a system latency of 40ms.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125502959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
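A sketch of the fusion rule only: a coarse estimate is refined by paths fed with features at different scales, combined through a softmax over learned scalar weights. The per-path convolutions, softmax normalization, and nearest-neighbor resizing are illustrative assumptions, not the AEC model itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedRefinement(nn.Module):
    def __init__(self, ch, n_paths=3):
        super().__init__()
        self.paths = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=1) for _ in range(n_paths)])
        self.w = nn.Parameter(torch.zeros(n_paths))  # learnable path weights

    def forward(self, coarse, multi_scale_feats):
        weights = torch.softmax(self.w, dim=0)
        out = coarse
        for wi, path, feat in zip(weights, self.paths, multi_scale_feats):
            # Bring each scale's fine-grained features to the output size.
            feat = F.interpolate(feat, size=coarse.shape[-2:], mode="nearest")
            out = out + wi * path(feat)
        return out

coarse = torch.randn(1, 16, 64, 64)
feats = [torch.randn(1, 16, 64 // 2 ** i, 64 // 2 ** i) for i in range(3)]
print(WeightedRefinement(16)(coarse, feats).shape)  # torch.Size([1, 16, 64, 64])
```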
Variable Span Trade-Off Filter for Sound Zone Control with Kernel Interpolation Weighting
Jesper Brunnström, Shoichi Koyama, Marc Moonen
{"title":"Variable Span Trade-Off Filter for Sound Zone Control with Kernel Interpolation Weighting","authors":"Jesper Brunnström, Shoichi Koyama, Marc Moonen","doi":"10.1109/ICASSP43922.2022.9746550","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9746550","url":null,"abstract":"A sound zone control method is proposed, based on the frequency domain variable span trade-off filter (VAST). Existing VAST methods optimizes the sound field at a set of discrete points, while the proposed method uses kernel interpolation to instead optimize the sound field over a continuous region. When the loudspeaker positions are known, the performance can be improved further by applying a directional weighting to the interpolation procedure. The proposed method is evaluated by simulating broadband sound in a reverberant environment, focusing on the case when microphone placement is restricted. The proposed method with directional weighting outperforms the pointwise VAST over the full bandwidth of the signal, and the proposed method without directional weighting outperforms the pointwise VAST at low frequencies.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126640540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
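A compact sketch of the VAST filter family the paper builds on: jointly diagonalize the bright- and dark-zone correlation matrices, then form the filter from the top-V generalized eigenvectors with trade-off parameter mu. The kernel-interpolation weighting of the zone statistics (the paper's contribution) is omitted, and the target vector here is a generic placeholder.

```python
import numpy as np
from scipy.linalg import eigh

def vast_filter(R_b, R_d, r_target, V, mu):
    """R_b/R_d: bright/dark-zone correlation matrices; r_target ~ G_b^H d_b."""
    # Generalized eigenproblem R_b u = lam * R_d u, normalized so u^H R_d u = 1.
    lam, U = eigh(R_b, R_d)
    order = np.argsort(lam)[::-1]          # strongest bright/dark ratio first
    lam, U = lam[order], U[:, order]
    q = np.zeros(R_b.shape[0], dtype=complex)
    for i in range(V):                     # span V trades accuracy vs. contrast
        u = U[:, i]
        q += (np.conj(u) @ r_target) / (lam[i] + mu) * u
    return q

# Toy example with random positive-definite zone statistics.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8)); R_b = A @ A.T + 8 * np.eye(8)
B = rng.normal(size=(8, 8)); R_d = B @ B.T + 8 * np.eye(8)
q = vast_filter(R_b, R_d, rng.normal(size=8), V=3, mu=1.0)
print(q.shape)
```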
Self-Learned Video Super-Resolution with Augmented Spatial and Temporal Context
Zejia Fan, Jiaying Liu, Wenhan Yang, Wei Xiang, Zongming Guo
{"title":"Self-Learned Video Super-Resolution with Augmented Spatial and Temporal Context","authors":"Zejia Fan, Jiaying Liu, Wenhan Yang, Wei Xiang, Zongming Guo","doi":"10.1109/ICASSP43922.2022.9746371","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9746371","url":null,"abstract":"Video super-resolution methods typically rely on paired training data, in which the low-resolution frames are usually synthetically generated under predetermined degradation conditions (e.g., Bicubic downsampling). However, in real applications, it is labor-consuming and expensive to obtain this kind of training data, which limits the practical performance of these methods. To address the issue and get rid of the synthetic paired data, in this paper, we make exploration in utilizing the internal self-similarity redundancy within the video to build a Self-Learned Video Super-Resolution (SLVSR) method, which only needs to be trained on the input testing video itself. We employ a series of data augmentation strategies to make full use of the spatial and temporal context of the target video clips. The idea is applied to two branches of mainstream SR methods: frame fusion and frame recurrence methods. Since the former takes advantage of the short-term temporal consistency and the latter of the long-term one, our method can satisfy different practical situations. The experimental results show the superiority of our proposed method, especially in addressing the video super-resolution problems in real applications.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115516220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
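A sketch of the self-learned setup: build LR/HR training pairs from the test video itself by further downscaling its frames, with simple spatial and temporal augmentations. The bicubic downscale and the flip/reverse augmentations are illustrative assumptions; the paper's augmentation list and SR networks are not reproduced.

```python
import torch
import torch.nn.functional as F

def make_internal_pairs(frames, scale=2):
    """frames: (T, C, H, W) test video. Returns (LR, HR) pairs for training."""
    hr = frames                                   # treat test frames as HR
    lr = F.interpolate(hr, scale_factor=1 / scale, mode="bicubic",
                       align_corners=False)
    return lr, hr

def augment(lr, hr):
    """Simple spatial and temporal context augmentation."""
    if torch.rand(()) < 0.5:                      # horizontal flip
        lr, hr = lr.flip(-1), hr.flip(-1)
    if torch.rand(()) < 0.5:                      # reverse temporal order
        lr, hr = lr.flip(0), hr.flip(0)
    return lr, hr

video = torch.rand(8, 3, 64, 64)                  # toy "test video"
lr, hr = augment(*make_internal_pairs(video))
print(lr.shape, hr.shape)  # (8, 3, 32, 32) and (8, 3, 64, 64)
```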
Single-Shot Balanced Detector for Geospatial Object Detection
Yanfeng Liu, Qiang Li, Yuan Yuan, Qi Wang
{"title":"Single-Shot Balanced Detector for Geospatial Object Detection","authors":"Yanfeng Liu, Qiang Li, Yuan Yuan, Qi Wang","doi":"10.1109/icassp43922.2022.9746853","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746853","url":null,"abstract":"Geospatial object detection is an essential task in remote sensing community. One-stage methods based on deep learning have faster running speed but cannot reach higher detection accuracy than two-stage methods. In this paper, to achieve excellent speed/accuracy trade-off for geospatial object detection, a single-shot balanced detector is presented. First, a balanced feature pyramid network (BFPN) is designed, which can balance semantic information and spatial information between high-level and shallow-level features adaptively. Second, we propose a task-interactive head (TIH). It can reduce the task misalignment between classification and regression. Extensive experiments show that the improved detector obtains significant detection accuracy with considerable speed on two benchmark datasets.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122293822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
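A sketch of one established way to balance semantic and spatial information across pyramid levels: resize all levels to a middle resolution, average them, and redistribute the integrated map back to each level residually. This mirrors published "balanced feature pyramid" designs; the paper's BFPN (and its task-interactive head) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def balanced_pyramid(feats, mid=1):
    """feats: list of (B, C, Hi, Wi) maps ordered fine -> coarse."""
    size = feats[mid].shape[-2:]
    pooled = [F.interpolate(f, size=size, mode="nearest") for f in feats]
    balanced = torch.stack(pooled).mean(dim=0)     # integrate all levels
    out = []
    for f in feats:                                # redistribute residually
        b = F.interpolate(balanced, size=f.shape[-2:], mode="nearest")
        out.append(f + b)
    return out

feats = [torch.randn(1, 16, 64 // 2 ** i, 64 // 2 ** i) for i in range(3)]
print([f.shape[-1] for f in balanced_pyramid(feats)])  # [64, 32, 16]
```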