Latest publications from ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Joint Separation and Dereverberation of Reverberant Mixtures with Multichannel Variational Autoencoder
S. Inoue, H. Kameoka, Li Li, Shogo Seki, S. Makino
DOI: 10.1109/ICASSP.2019.8683497 · pp. 96-100 · 2019-05-12
Abstract: In this paper, we deal with a multichannel source separation problem under a highly reverberant condition. The multichannel variational autoencoder (MVAE) is a recently proposed source separation method that employs the decoder distribution of a conditional VAE (CVAE) as the generative model for the complex spectrograms of the underlying source signals. Although MVAE is notable in that it can significantly improve the source separation performance compared with conventional methods, its capability to separate highly reverberant mixtures is still limited since MVAE uses an instantaneous mixture model. To overcome this limitation, in this paper we propose extending MVAE to simultaneously solve source separation and dereverberation problems by formulating the separation system as a frequency-domain convolutive mixture model. A convergence-guaranteed algorithm based on the coordinate descent method is derived for the optimization. Experimental results revealed that the proposed method outperformed the conventional methods in terms of all the source separation criteria in highly reverberant environments.
Citations: 15
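To illustrate the modelling difference the abstract describes, here is a minimal NumPy sketch contrasting an instantaneous mixture (one complex gain per frequency bin) with a frequency-domain convolutive mixture (a short filter per bin, which can represent reverberation longer than one STFT frame). All array sizes and variable names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, L = 4, 100, 3  # frequency bins, time frames, filter taps (illustrative sizes)

# Complex STFT of a single source signal
s = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

# Instantaneous mixture model (as in the original MVAE):
# each frequency bin is scaled by a single complex gain.
a = rng.standard_normal(F) + 1j * rng.standard_normal(F)
x_inst = a[:, None] * s

# Frequency-domain convolutive mixture model (the proposed extension):
# each bin is filtered across frames, so late reverberation that spills
# beyond one STFT frame can be represented.
h = rng.standard_normal((F, L)) + 1j * rng.standard_normal((F, L))
x_conv = np.zeros((F, N), dtype=complex)
for tap in range(L):
    x_conv[:, tap:] += h[:, tap][:, None] * s[:, : N - tap]
```

At frame 0 only the first filter tap contributes, so the convolutive model reduces to the instantaneous one there; the extra taps only matter once past frames carry energy.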
Single-channel Speech Extraction Using Speaker Inventory and Attention Network
Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Hakan Erdogan, Changliang Liu, D. Dimitriadis, J. Droppo, Y. Gong
DOI: 10.1109/ICASSP.2019.8682245 · pp. 86-90 · 2019-05-12
Abstract: Neural network-based speech separation has received a surge of interest in recent years. Previously proposed methods are either speaker independent or extract a target speaker's voice using his or her voice snippet. In applications such as home devices or office meeting transcription, a list of possible speakers is available and can be leveraged for speech separation. This paper proposes a novel speech extraction method that utilizes an inventory of voice snippets of possible interfering speakers, or speaker enrollment data, in addition to that of the target speaker. Furthermore, an attention-based network architecture is proposed to form time-varying masks for both the target and other speakers during the separation process. This architecture does not reduce the enrollment audio of each speaker to a single vector, thereby allowing each short time frame of the input mixture signal to be aligned and accurately compared with the enrollment signals. We evaluate the proposed system on a speaker extraction task derived from the Libri corpus and show the effectiveness of the method.
Citations: 59
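The key idea above, attending over enrollment frames rather than collapsing them into one speaker vector, can be sketched with plain scaled dot-product attention. This is a generic stand-in, not the paper's architecture; dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
D, T_mix, T_enr = 16, 20, 30  # feature dim, mixture frames, enrollment frames (illustrative)

mix = rng.standard_normal((T_mix, D))  # per-frame embeddings of the mixture
enr = rng.standard_normal((T_enr, D))  # per-frame embeddings of one enrolled speaker

# Every mixture frame attends over all enrollment frames, so short segments
# of the mixture can be aligned with matching enrollment segments instead of
# being compared against a single averaged speaker embedding.
scores = mix @ enr.T / np.sqrt(D)                      # (T_mix, T_enr)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)          # row-wise softmax
context = weights @ enr                                # (T_mix, D) speaker context per frame
```

`context` would then feed whatever mask-estimation network sits downstream; the point is only that the attention weights are time-varying per mixture frame.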
Non-local Self-attention Structure for Function Approximation in Deep Reinforcement Learning
Z. Wang, Xi Xiao, Guangwu Hu, Yao Yao, Dianyan Zhang, Zhendong Peng, Qing Li, Shutao Xia
DOI: 10.1109/ICASSP.2019.8682832 · pp. 3042-3046 · 2019-05-12
Abstract: Reinforcement learning is a framework for making sequential decisions. Combining it with deep neural networks further improves the ability of this framework. Convolutional neural networks make it possible to make sequential decisions directly from raw pixel information and have enabled reinforcement learning to achieve satisfying performance in a series of tasks. However, convolutional neural networks still have limitations in representing the geometric patterns and long-term dependencies that occur consistently in state inputs. To tackle this limitation, we propose a self-attention architecture to augment the original network. It provides a better balance between the ability to model long-range dependencies and computational efficiency. Experiments on Atari games illustrate that the self-attention structure is significantly effective for function approximation in deep reinforcement learning.
Citations: 0
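A non-local self-attention block of the kind the abstract refers to lets every spatial position of a feature map aggregate information from all other positions, something a local convolution cannot do in one layer. A minimal NumPy sketch, with all sizes and projection matrices purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
H, W, C = 6, 6, 8                    # feature-map height, width, channels (illustrative)
x = rng.standard_normal((H * W, C))  # feature map flattened over spatial positions

# Query/key/value projections (randomly initialized here for illustration)
Wq = rng.standard_normal((C, C))
Wk = rng.standard_normal((C, C))
Wv = rng.standard_normal((C, C))
q, k, v = x @ Wq, x @ Wk, x @ Wv

# Each position attends over all H*W positions: long-range dependencies
# are captured in a single layer, at O((H*W)^2) cost.
att = q @ k.T / np.sqrt(C)
att = np.exp(att - att.max(axis=1, keepdims=True))
att /= att.sum(axis=1, keepdims=True)

y = x + att @ v  # residual connection, as is typical for non-local blocks
```

The quadratic cost in the number of positions is the trade-off the abstract alludes to when it mentions balancing long-range modelling against computational efficiency.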
A Spiking Neural Network Approach to Auditory Source Lateralisation
R. Luke, D. McAlpine
DOI: 10.1109/ICASSP.2019.8683767 · pp. 1488-1492 · 2019-05-12
Abstract: A novel approach to multi-microphone acoustic source localisation based on spiking neural networks is presented. We demonstrate that a two-microphone system connected to a spiking neural network can localise acoustic sources based purely on inter-microphone timing differences, with no need for manually configured delay lines. A two-sensor example is provided which includes 1) a front end that converts the acoustic signal to a series of spikes, 2) a hidden layer of spiking neurons, and 3) an output layer of spiking neurons that represents the location of the acoustic source. We present details on training the network and evaluating its performance in quiet and noisy conditions. The system is trained on two locations, and we show that the lateralisation accuracy is 100% when presented with previously unseen data in quiet conditions. We also demonstrate that the network generalises to modulation rates and background noise on which it was not trained.
Citations: 3
Neural Variational Identification and Filtering for Stochastic Non-linear Dynamical Systems with Application to Non-intrusive Load Monitoring
Henning Lange, M. Berges, J. Z. Kolter
DOI: 10.1109/ICASSP.2019.8683552 · pp. 8340-8344 · 2019-05-12
Abstract: In this paper, an algorithm for performing system identification and inference of the filtering recursion for stochastic non-linear dynamical systems is introduced. Additionally, the algorithm allows for enforcing domain constraints on the state variable. The algorithm makes use of an approximate inference technique called variational inference in conjunction with deep neural networks as the optimization engine. Although general in nature, the algorithm is evaluated in the context of non-intrusive load monitoring: the problem of inferring the operational state of individual electrical appliances given aggregate measurements of electrical power collected in a home.
Citations: 6
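The filtering recursion at the heart of this abstract can be illustrated, in a much simpler exact form than the paper's deep variational version, with a two-state (off/on) appliance observed through noisy aggregate power. Everything below (transition matrix, power levels, noise level) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
trans = np.array([[0.95, 0.05],
                  [0.10, 0.90]])   # off/on transition probabilities (hypothetical)
power = np.array([0.0, 100.0])     # appliance draw per state, watts (hypothetical)
sigma = 10.0                       # observation noise standard deviation

# Simulate a state trajectory and the noisy aggregate observations
states = np.zeros(T, dtype=int)
for n in range(1, T):
    states[n] = rng.choice(2, p=trans[states[n - 1]])
obs = power[states] + sigma * rng.standard_normal(T)

# Exact forward filtering: belief_n = p(state_n | obs_1..n)
belief = np.array([0.5, 0.5])
filtered = np.zeros((T, 2))
for n in range(T):
    lik = np.exp(-0.5 * ((obs[n] - power) / sigma) ** 2)  # Gaussian likelihood
    belief = lik * (belief @ trans)                        # predict, then update
    belief /= belief.sum()
    filtered[n] = belief

accuracy = (filtered.argmax(axis=1) == states).mean()
```

The paper replaces this exact recursion with a neural variational approximation so that it scales to non-linear dynamics and many appliances; the sketch only shows what quantity is being approximated.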
The Geometry of Equality-constrained Global Consensus Problems
Qiuwei Li, Zhihui Zhu, Gongguo Tang, M. Wakin
DOI: 10.1109/ICASSP.2019.8682568 · pp. 7928-7932 · 2019-05-12
Abstract: A variety of unconstrained nonconvex optimization problems have been shown to have benign geometric landscapes that satisfy the strict saddle property and have no spurious local minima. We present a general result relating the geometry of an unconstrained centralized problem to that of its equality-constrained distributed extension. It follows that many global consensus problems inherit the benign geometry of their original centralized counterparts. Taking advantage of this fact, we demonstrate the favorable performance of the gradient ADMM algorithm on a distributed low-rank matrix approximation problem.
Citations: 6
Information Theoretic Lower Bound of Restricted Isometry Property Constant
Gen Li, Jingkai Yan, Yuantao Gu
DOI: 10.1109/ICASSP.2019.8683742 · pp. 5297-5301 · 2019-05-12
Abstract: Compressed sensing seeks to recover an unknown sparse vector from undersampled measurements. Since its introduction, an enormous body of work on compressed sensing has developed efficient algorithms for sparse signal recovery. The restricted isometry property (RIP) has become the dominant tool used in the analysis of exact reconstruction from seemingly undersampled measurements. Although the upper bound of the RIP constant has been studied extensively, to our knowledge a corresponding result for the lower bound has been missing. In this work, we present a tight lower bound for the RIP constant, filling this gap. The lower bound is of the same order as the upper bound, and numerical simulations show that the two are close. Our bound on the RIP constant provides, for the first time, an information-theoretic lower bound on the sampling rate, an essential question for practitioners.
Citations: 1
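For context on what the RIP constant measures: the order-k constant delta_k is the smallest delta such that every k-column submatrix of the sensing matrix has Gram-matrix eigenvalues in [1 - delta, 1 + delta]. For tiny sizes it can be computed by brute force; the dimensions below are illustrative only (the exhaustive search is combinatorial and infeasible at realistic scale, which is why bounds like the paper's matter):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
m, n, k = 100, 12, 2  # measurements, ambient dimension, sparsity (illustrative)

# Gaussian sensing matrix with columns of roughly unit norm
A = rng.standard_normal((m, n)) / np.sqrt(m)

# delta_k = max over all k-column submatrices of the deviation of the
# Gram matrix's extreme eigenvalues from 1
delta = 0.0
for S in combinations(range(n), k):
    cols = A[:, list(S)]
    eig = np.linalg.eigvalsh(cols.T @ cols)
    delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
```

With `m` comfortably larger than `k * log(n/k)`, the computed `delta` stays well below 1, which is the regime in which RIP-based recovery guarantees apply.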
Context Modelling Using Hierarchical Attention Networks for Sentiment and Self-assessed Emotion Detection in Spoken Narratives
Lukas Stappen, N. Cummins, Eva-Maria Messner, H. Baumeister, J. Dineley, Björn Schuller
DOI: 10.1109/ICASSP.2019.8683801 · pp. 6680-6684 · 2019-05-12
Abstract: Automatic detection of sentiment and affect in personal narratives through word usage has the potential to assist in the automated detection of change in psychotherapy. Such a tool could, for instance, provide an efficient, objective measure of the time a person has been in a positive or negative state of mind. Towards this goal, we propose and develop a hierarchical attention model for the tasks of sentiment (positive and negative) and self-assessed affect detection in transcripts of personal narratives. We also perform a qualitative analysis of the word attentions learnt by our sentiment analysis model. In a key result, our attention model achieved an unweighted average recall (UAR) of 91.0% in a binary sentiment detection task on the test partition of the Ulm State-of-Mind in Speech (USoMS) corpus. We also achieved UARs of 73.7% and 68.6% in the 3-class tasks of arousal and valence detection, respectively. Finally, our qualitative analysis associates colloquial reinforcements with positive sentiments, and uncertain phrasing with negative sentiments.
Citations: 12
Speech Augmentation Using Wavenet in Speech Recognition
Jisung Wang, Sangki Kim, Yeha Lee
DOI: 10.1109/ICASSP.2019.8683388 · pp. 6770-6774 · 2019-05-12
Abstract: Data augmentation is crucial to improving the performance of deep neural networks by helping the model avoid overfitting and improving its generalization. In automatic speech recognition, previous work proposed several approaches to augment data by performing speed perturbation or spectral transformation. Since data augmented in this manner has acoustic representations similar to the original data, its advantage in improving the generalization of the acoustic model is limited. To avoid generating data with limited diversity, we propose a voice conversion approach using a generative model (WaveNet), which generates a new utterance by transforming an utterance to a given target voice. Our method synthesizes speech with diverse pitch patterns by minimizing the use of acoustic features. On the Wall Street Journal dataset, we verify that our method leads to better generalization than other data augmentation techniques such as speed perturbation and WORLD-based voice conversion. In addition, when combined with the speed perturbation technique, the two methods complement each other to further improve the performance of the acoustic model.
Citations: 10
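The speed-perturbation baseline mentioned in the abstract is simple enough to sketch directly: resample the waveform by a factor, which changes both duration and pitch. Toolkits typically use proper polyphase resampling; the linear interpolation below is a minimal stand-in, and the signal and factors are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 16000
x = rng.standard_normal(fs)  # one second of synthetic "audio"

def speed_perturb(x, factor):
    """Resample by linear interpolation; factor > 1 speeds up (shorter output)."""
    n_out = int(round(len(x) / factor))
    t_out = np.arange(n_out) * factor   # fractional read positions in the input
    return np.interp(t_out, np.arange(len(x)), x)

# The commonly used 0.9x / 1.1x perturbations
fast = speed_perturb(x, 1.1)
slow = speed_perturb(x, 0.9)
```

Because the perturbed signal is a time-warped copy of the original, its acoustic content stays highly correlated with the source, which is exactly the limited-diversity issue the paper's WaveNet-based voice conversion is meant to address.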
A Hybrid Method for Blind Estimation of Frequency Dependent Reverberation Time Using Speech Signals
Song Li, Roman Schlieper, J. Peissig
DOI: 10.1109/ICASSP.2019.8682661 · pp. 211-215 · 2019-05-12
Abstract: Reverberation time is an important room-acoustical parameter that can be used to identify the acoustic environment, predict speech intelligibility, and model late reverberation for binaural rendering, among other applications. Several algorithms for blind estimation of reverberation time from recorded speech signals have been proposed. Unfortunately, the estimation accuracy for frequency-dependent reverberation time is lower than for full-band reverberation time due to the lower signal energy in sub-band filters. This study presents a novel approach for blind estimation of reverberation time over the full frequency range. The maximum likelihood method is applied to estimate the reverberation time from low to mid frequencies, and the reverberation time from mid to high frequencies is predicted by our proposed model, based on an analysis of reverberation times calculated from room impulse responses in different rooms. The proposed method is validated in two experiments and shows good performance.
Citations: 4
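The non-blind reference against which such estimators are judged is the RT60 computed from a measured room impulse response via Schroeder backward integration. A self-contained sketch on a synthetic exponentially decaying impulse response (all parameters invented for illustration):

```python
import numpy as np

fs = 8000
rt60_true = 0.5  # seconds
t = np.arange(0, 1.0, 1 / fs)

# Synthetic impulse-response envelope reaching -60 dB at t = rt60_true
h = np.exp(-6.9078 * t / rt60_true)  # 6.9078 ~= ln(10**3)

# Schroeder backward integration yields the energy decay curve (EDC)
edc = np.cumsum((h ** 2)[::-1])[::-1]
edc_db = 10 * np.log10(edc / edc[0])

# Fit the decay between -5 dB and -25 dB (a T20-style range) and
# extrapolate the fitted slope to -60 dB
i5 = np.argmax(edc_db <= -5.0)
i25 = np.argmax(edc_db <= -25.0)
slope, _ = np.polyfit(t[i5:i25], edc_db[i5:i25], 1)
rt60_est = -60.0 / slope
```

Running the same procedure on band-pass filtered impulse responses gives the frequency-dependent reverberation times that the paper's high-frequency prediction model is trained against.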