ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)最新文献_第10页

Joint Separation and Dereverberation of Reverberant Mixtures with Multichannel Variational Autoencoder 多声道变分自编码器混响混响的联合分离与去噪

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683497

S. Inoue, H. Kameoka, Li Li, Shogo Seki, S. Makino

{"title":"Joint Separation and Dereverberation of Reverberant Mixtures with Multichannel Variational Autoencoder","authors":"S. Inoue, H. Kameoka, Li Li, Shogo Seki, S. Makino","doi":"10.1109/ICASSP.2019.8683497","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683497","url":null,"abstract":"In this paper, we deal with a multichannel source separation problem under a highly reverberant condition. The multichan- nel variational autoencoder (MVAE) is a recently proposed source separation method that employs the decoder distribu- tion of a conditional VAE (CVAE) as the generative model for the complex spectrograms of the underlying source sig- nals. Although MVAE is notable in that it can significantly improve the source separation performance compared with conventional methods, its capability to separate highly rever- berant mixtures is still limited since MVAE uses an instan- taneous mixture model. To overcome this limitation, in this paper we propose extending MVAE to simultaneously solve source separation and dereverberation problems by formulat- ing the separation system as a frequency-domain convolutive mixture model. A convergence-guaranteed algorithm based on the coordinate descent method is derived for the optimiza- tion. Experimental results revealed that the proposed method outperformed the conventional methods in terms of all the source separation criteria in highly reverberant environments.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"16 1","pages":"96-100"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84952009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

Single-channel Speech Extraction Using Speaker Inventory and Attention Network 基于说话人清单和注意网络的单通道语音提取

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8682245

Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Hakan Erdogan, Changliang Liu, D. Dimitriadis, J. Droppo, Y. Gong

{"title":"Single-channel Speech Extraction Using Speaker Inventory and Attention Network","authors":"Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Hakan Erdogan, Changliang Liu, D. Dimitriadis, J. Droppo, Y. Gong","doi":"10.1109/ICASSP.2019.8682245","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682245","url":null,"abstract":"Neural network-based speech separation has received a surge of interest in recent years. Previously proposed methods either are speaker independent or extract a target speaker’s voice by using his or her voice snippet. In applications such as home devices or office meeting transcriptions, a possible speaker list is available, which can be leveraged for speech separation. This paper proposes a novel speech extraction method that utilizes an inventory of voice snippets of possible interfering speakers, or speaker enrollment data, in addition to that of the target speaker. Furthermore, an attention-based network architecture is proposed to form time-varying masks for both the target and other speakers during the separation process. This architecture does not reduce the enrollment audio of each speaker into a single vector, thereby allowing each short time frame of the input mixture signal to be aligned and accurately compared with the enrollment signals. We evaluate the proposed system on a speaker extraction task derived from the Libri corpus and show the effectiveness of the method.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"124 1","pages":"86-90"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83544170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 59

Non-local Self-attention Structure for Function Approximation in Deep Reinforcement Learning 深度强化学习中函数逼近的非局部自注意结构

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8682832

Z. Wang, Xi Xiao, Guangwu Hu, Yao Yao, Dianyan Zhang, Zhendong Peng, Qing Li, Shutao Xia

引用次数: 0

A Spiking Neural Network Approach to Auditory Source Lateralisation 听觉源侧化的脉冲神经网络方法

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683767

R. Luke, D. McAlpine

引用次数: 3

Neural Variational Identification and Filtering for Stochastic Non-linear Dynamical Systems with Application to Non-intrusive Load Monitoring 随机非线性动力系统的神经变分辨识与滤波及其在非侵入式负荷监测中的应用

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683552

Henning Lange, M. Berges, J. Z. Kolter

引用次数: 6

The Geometry of Equality-constrained Global Consensus Problems 等式约束的全局共识问题的几何

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8682568

Qiuwei Li, Zhihui Zhu, Gongguo Tang, M. Wakin

引用次数: 6

Information Theoretic Lower Bound of Restricted Isometry Property Constant 受限等距性质常数的信息论下界

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683742

Gen Li, Jingkai Yan, Yuantao Gu

引用次数: 1

Context Modelling Using Hierarchical Attention Networks for Sentiment and Self-assessed Emotion Detection in Spoken Narratives 基于层次注意网络的语境建模在口语叙事中的情绪和自我评估情绪检测

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683801

Lukas Stappen, N. Cummins, Eva-Maria Messner, H. Baumeister, J. Dineley, Björn Schuller

{"title":"Context Modelling Using Hierarchical Attention Networks for Sentiment and Self-assessed Emotion Detection in Spoken Narratives","authors":"Lukas Stappen, N. Cummins, Eva-Maria Messner, H. Baumeister, J. Dineley, Björn Schuller","doi":"10.1109/ICASSP.2019.8683801","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683801","url":null,"abstract":"Automatic detection of sentiment and affect in personal narratives through word usage has the potential to assist in the automated detection of change in psychotherapy. Such a tool could, for instance, provide an efficient, objective measure of the time a person has been in a positive or negative state-of-mind. Towards this goal, we propose and develop a hierarchical attention model for the tasks of sentiment (positive and negative) and self-assessed affect detection in transcripts of personal narratives. We also perform a qualitative analysis of the word attentions learnt by our sentiment analysis model. In a key result, our attention model achieved an un-weighted average recall (UAR) of 91.0 % in a binary sentiment detection task on the test partition of the Ulm State-of-Mind in Speech (USoMS) corpus. We also achieved UARs of 73.7 % and 68.6 % in the 3-class tasks of arousal and valence detection respectively. Finally, our qualitative analysis associates colloquial reinforcements with positive sentiments, and uncertain phrasing with negative sentiments.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"47 14","pages":"6680-6684"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91435895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

Speech Augmentation Using Wavenet in Speech Recognition 基于小波网络的语音增强语音识别

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683388

Jisung Wang, Sangki Kim, Yeha Lee

{"title":"Speech Augmentation Using Wavenet in Speech Recognition","authors":"Jisung Wang, Sangki Kim, Yeha Lee","doi":"10.1109/ICASSP.2019.8683388","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683388","url":null,"abstract":"Data augmentation is crucial to improving the performance of deep neural networks by helping the model avoid overfitting and improve its generalization. In automatic speech recognition, previous work proposed several approaches to augment data by performing speed perturbation or spectral transformation. Since data augmented in this manner has similar acoustic representations as the original data, it has limited advantage in improving generalization of the acoustic model. In order to avoid generating data with limited diversity, we propose a voice conversion approach using a generative model (WaveNet), which generates a new utterance by transforming an utterance to a given target voice. Our method synthesizes speech with diverse pitch patterns by minimizing the use of acoustic features. With the Wall Street Journal dataset, we verify that our method led to better generalization compared to other data augmentation techniques such as speed perturbation and WORLD-based voice conversion. In addition, when combined with the speed perturbation technique, the two methods complement each other to further improve performance of the acoustic model.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"24 1","pages":"6770-6774"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80783918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

A Hybrid Method for Blind Estimation of Frequency Dependent Reverberation Time Using Speech Signals 利用语音信号盲估计频率相关混响时间的混合方法

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8682661

Song Li, Roman Schlieper, J. Peissig

引用次数: 4