ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)最新文献

Universal Acoustic Modeling Using Neural Mixture Models 使用神经混合模型的通用声学建模

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-09-01 DOI: 10.1109/ICASSP.2019.8682403

Amit Das, Jinyu Li, Changliang Liu, Y. Gong

{"title":"Universal Acoustic Modeling Using Neural Mixture Models","authors":"Amit Das, Jinyu Li, Changliang Liu, Y. Gong","doi":"10.1109/ICASSP.2019.8682403","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682403","url":null,"abstract":"Acoustic models are domain dependent and do not perform well if there is a mismatch between training and test conditions. As an alternative, the Mixture of Experts (MoE) model was introduced for multi-domain modeling. It combines the outputs of several domain specific models (or experts) using a gating network. However, one drawback is that the gating network directly uses raw features and is unaware of the state of the experts. In this work, we propose several alternatives to improve the MoE model. First, to make our MoE model state-aware, we use outputs of experts as inputs to the gating network. Then we show that vector based interpolation of the mixture weights is more effective than scalar interpolation. Second, we show that directly learning the mixture weights without using any complex gating is still effective. Finally, we introduce a hybrid attention model that uses the logits and mixture weights from the previous time step to generate the mixture weights at the current time. Our best proposed model outperforms a baseline model using LSTM based gating achieving about 20.48% relative reduction in word error rate (WER). Moreover, it beats an oracle model which picks the best expert for a given test condition.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"30 1","pages":"5681-5685"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88462698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Speech Landmark Bigrams for Depression Detection from Naturalistic Smartphone Speech 基于自然智能手机语音的抑郁检测语音地标双元图

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-06-06 DOI: 10.1109/ICASSP.2019.8682916

Zhaocheng Huang, J. Epps, Dale Joachim

{"title":"Speech Landmark Bigrams for Depression Detection from Naturalistic Smartphone Speech","authors":"Zhaocheng Huang, J. Epps, Dale Joachim","doi":"10.1109/ICASSP.2019.8682916","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682916","url":null,"abstract":"Detection of depression from speech has attracted significant research attention in recent years but remains a challenge, particularly for speech from diverse smartphones in natural environments. This paper proposes two sets of novel features based on speech landmark bigrams associated with abrupt speech articulatory events for depression detection from smartphone audio recordings. Combined with techniques adapted from natural language text processing, the proposed features further exploit landmark bigrams by discovering latent articulatory events. Experimental results on a large, naturalistic corpus containing various spoken tasks recorded from diverse smartphones suggest that speech landmark bigram features provide a 30.1% relative improvement in F1 (depressed) relative to an acoustic feature baseline system. As might be expected, a key finding was the importance of tailoring the choice of landmark bigrams to each elicitation task, revealing that different aspects of speech articulation are elicited by different tasks, which can be effectively captured by the landmark approaches.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"3 10 1","pages":"5856-5860"},"PeriodicalIF":0.0,"publicationDate":"2019-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81377779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 19

Robust M-estimation Based Matrix Completion 基于稳健m估计的矩阵补全

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-27 DOI: 10.1109/ICASSP.2019.8682657

Michael Muma, W. Zeng, A. Zoubir

引用次数: 10

When Can a System of Subnetworks Be Registered Uniquely? 一个子网系统何时可以唯一注册?

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-27 DOI: 10.1109/ICASSP.2019.8682680

A. V. Singh, K. Chaudhury

引用次数: 0

Learning Search Path for Region-level Image Matching 区域级图像匹配的学习搜索路径

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-20 DOI: 10.1109/ICASSP.2019.8682714

Onkar Krishna, Go Irie, Xiaomeng Wu, T. Kawanishi, K. Kashino

{"title":"Learning Search Path for Region-level Image Matching","authors":"Onkar Krishna, Go Irie, Xiaomeng Wu, T. Kawanishi, K. Kashino","doi":"10.1109/ICASSP.2019.8682714","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682714","url":null,"abstract":"Finding a region of an image which matches to a query from a large number of candidates is a fundamental problem in image processing. The exhaustive nature of the sliding window approach has encouraged works that can reduce the run time by skipping unnecessary windows or pixels that do not play a substantial role in search results. However, such a pruning-based approach still needs to evaluate the non-ignorable number of candidates, which leads to a limited efficiency improvement. We propose an approach to learn efficient search paths from data. Our model is based on a CNN-LSTM architecture which is designed to sequentially determine a prospective location to be searched next based on the history of the locations attended. We propose a reinforcement learning algorithm to train the model in an end-to-end manner, which allows to jointly learn the search paths and deep image features for matching. These properties together significantly reduce the number of windows to be evaluated and makes it robust to background clutters. Our model gives remarkable matching accuracy with the reduced number of windows and run time on MNIST and FlickrLogos-32 datasets.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"1967-1971"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88674951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Improving Children Speech Recognition through Feature Learning from Raw Speech Signal 基于原始语音信号特征学习的儿童语音识别研究

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-18 DOI: 10.1109/ICASSP.2019.8682826

Selen Hande Kabil, Mathew Magimai Doss

引用次数: 9

Beamformer Design under Time-correlated Interference and Online Implementation: Brain-activity Reconstruction from EEG 时间相关干扰下波束形成器设计与在线实现:脑电脑活动重建

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-17 DOI: 10.1109/ICASSP.2019.8682614

Takehiro Kono, M. Yukawa, Tomasz Piotrowski

引用次数: 1

Event-driven Pipeline for Low-latency Low-compute Keyword Spotting and Speaker Verification System 低延迟低计算关键字识别与说话人验证系统的事件驱动管道

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-17 DOI: 10.1109/ICASSP.2019.8683669

Enea Ceolini, Jithendar Anumula, Stefan Braun, Shih-Chii Liu

引用次数: 12

Importance of Analytic Phase of the Speech Signal for Detecting Replay Attacks in Automatic Speaker Verification Systems 语音信号分析相位在自动说话人验证系统中检测重放攻击中的重要性

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-16 DOI: 10.1109/ICASSP.2019.8683500

B. M. Rafi, K. Murty

{"title":"Importance of Analytic Phase of the Speech Signal for Detecting Replay Attacks in Automatic Speaker Verification Systems","authors":"B. M. Rafi, K. Murty","doi":"10.1109/ICASSP.2019.8683500","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683500","url":null,"abstract":"In this paper, the importance of analytic phase of the speech signal in automatic speaker verification systems is demonstrated in the context of replay spoof attacks. In order to accurately detect the replay spoof attacks, effective feature representations of speech signals are required to capture the distortion introduced due to the intermediate playback/recording devices, which is convolutive in nature. Since the convolutional distortion in time-domain translates to additive distortion in the phase-domain, we propose to use IFCC features extracted from the analytic phase of the speech signal. The IFCC features contain information from both clean speech and distortion components. The clean speech component has to be subtracted in order to highlight the distortion component introduced by the playback/recording devices. In this work, a dictionary learned from the IFCCs extracted from clean speech data is used to remove the clean speech component. The residual distortion component is used as a feature to build binary classifier for replay spoof detection. The proposed phase-based features delivered a 9% absolute improvement over the baseline system built using magnitude-based CQCC features.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"16 1","pages":"6306-6310"},"PeriodicalIF":0.0,"publicationDate":"2019-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85227808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Maximally Smooth Dirichlet Interpolation from Complete and Incomplete Sample Points on the Unit Circle 单位圆上完全和不完全采样点的最光滑狄利克雷插值

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-16 DOI: 10.1109/ICASSP.2019.8683366

Stephan Weiss, M. Macleod

引用次数: 8