2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers, Page 8

Analysis of singing voice for epoch extraction using Zero Frequency Filtering method

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7178774

Sudarsana Reddy Kadiri, B. Yegnanarayana

Abstract: An epoch is the instant of significant excitation of the vocal tract system during the production of voiced speech. Estimation of epochs, or glottal closure instants (GCIs), is a well-studied topic in speech analysis. Recent studies on GCI detection from singing voice, using state-of-the-art methods proposed for speech, show a clear gap in accuracy between speech and singing voice. This is because of source-filter interaction in singing voice compared to speech. The performance of existing algorithms deteriorates because most of the techniques depend on the ability to model the vocal tract system in order to emphasize the excitation characteristics in the residual. The objective of this paper is to analyze the singing voice for the estimation of epochs by studying the characteristics of the source-filter interaction and the effect of a wider range of pitch using the Zero Frequency Filtering (ZFF) method. It is observed that high source-filter interaction can be captured in the form of impulse-like excitation by passing the signal through three ideal digital resonators having poles at zero frequency, and that the effect of a wider range of pitch can be controlled by processing short segments (0.4-0.5 sec) of the signal.
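The filtering step described in this abstract can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Only the three zero-frequency resonators and the short (0.4 s) segment come from the abstract. The function names, the initial differencing step, the trend-removal window of about 1.5 average pitch periods, the three trend-removal passes, and the use of positive-going zero crossings as epoch candidates are assumptions borrowed from common ZFF practice.

```python
import numpy as np

def zero_frequency_filter(s, fs, avg_pitch_s, n_resonators=3, n_trend_passes=3):
    """Return (zff_signal, margin). Samples within `margin` of either end are unreliable."""
    # First difference removes any constant offset (assumed pre-processing step).
    x = np.diff(np.asarray(s, dtype=np.float64), prepend=0.0)
    # An ideal resonator with a double pole at z = 1 obeys
    # y[n] = 2*y[n-1] - y[n-2] + x[n], which equals two cumulative sums.
    y = x
    for _ in range(2 * n_resonators):
        y = np.cumsum(y)
    # The output grows polynomially, so subtract a local mean repeatedly.
    # Window of about 1.5 average pitch periods (assumed, not from the abstract).
    half = int(round(0.75 * avg_pitch_s * fs))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(n_trend_passes):
        y = y - np.convolve(y, kernel, mode="same")
    # Zero padding in the convolution corrupts `half` more samples per pass.
    margin = n_trend_passes * half
    return y, margin

def epochs_from_zff(z, margin):
    """Indices of positive-going zero crossings inside the reliable interior."""
    core = z[margin:len(z) - margin]
    idx = np.flatnonzero((core[:-1] < 0) & (core[1:] >= 0)) + 1
    return idx + margin

# Synthetic stand-in for a voiced segment: a 100 Hz impulse train, 0.4 s at 8 kHz.
fs = 8000
period = 80                                  # samples per pitch period (hypothetical)
s = np.zeros(int(round(0.4 * fs)))
s[::period] = 1.0

z, margin = zero_frequency_filter(s, fs, avg_pitch_s=period / fs)
epochs = epochs_from_zff(z, margin)
intervals = np.diff(epochs)                  # spacing between epoch candidates
print(len(epochs), intervals.min(), intervals.max())
```

On this synthetic input, the spacing between detected crossings in the interior should stay close to the 80-sample pitch period. Keeping the segment short also limits the polynomial growth of the resonator output, which is one practical reason a sketch like this stays numerically well behaved; real singing voice would additionally need a pitch estimate per segment to set the window.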

Citations: 25

Information extraction from large multi-layer social networks

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7179013

Brandon Oselio, Alex Kulesza, A. Hero

Citations: 12

Objective quality prediction for haptic texture signal compression

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7178366

R. Chaudhari, Yongjae Yoo, Clemens Schuwerk, Seungmoon Choi, E. Steinbach

Citations: 1

Fast and efficient intra coding techniques for smooth regions in screen content coding based on boundary prediction samples

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7178202

Sik-Ho Tsang, Yui-Lam Chan, W. Siu

Citations: 27

Optically visualized sound field reconstruction based on sparse selection of point sound sources

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7178020

K. Yatabe, Yasuhiro Oikawa

Citations: 11

Multivariate lattices for encrypted image processing

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7178262

A. Pedrouzo-Ulloa, J. Troncoso-Pastoriza, F. Pérez-González

Citations: 16

Multiple target track-before-detect in compound Gaussian clutter

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7178429

S. P. Ebenezer, A. Papandreou-Suppappola

Citations: 8

Multi-task deep neural network acoustic models with model adaptation using discriminative speaker identity for whisper recognition

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7178916

Jingjie Li, I. McLoughlin, Cong Liu, Shaofei Xue, Si Wei

Abstract: This paper presents a study on large vocabulary continuous whisper automatic recognition (wLVCSR). wLVCSR provides the ability to use ASR equipment in public places without concern for disturbing others or leaking private information. However, the task of wLVCSR is much more challenging than normal LVCSR due to the absence of pitch, which not only causes the signal-to-noise ratio (SNR) of whispers to be much lower than that of normal speech but also leads to flatness and formant shifts in whisper spectra. Furthermore, the amount of whisper data available for training is much less than for normal speech. In this paper, multi-task deep neural network (DNN) acoustic models are deployed to solve these problems. Moreover, model adaptation is performed on the multi-task DNN to normalize speaker and environmental variability in whispers based on discriminative speaker identity information. On a Mandarin whisper dictation task, with 55 hours of whisper data, the proposed SI multi-task DNN model can achieve 56.7% character error rate (CER) improvement over a baseline Gaussian Mixture Model (GMM), discriminatively trained only using the whisper data. Besides, the CER of the proposed model for normal speech can reach 15.2%, which is close to the performance of a state-of-the-art DNN trained with one thousand hours of speech data. From this baseline, the model-adapted DNN gains a further 10.9% CER reduction over the generic model.
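For readers unfamiliar with the metric quoted in this abstract, the sketch below shows one common way to compute a character error rate and a relative reduction between two systems. It is an illustration only: the function names and the example values are hypothetical, and the abstract does not state whether its 56.7% and 10.9% figures are relative or absolute reductions, so the relative form used here is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))          # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]                             # distance to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline

# Hypothetical values, not taken from the paper.
print(cer("abcd", "abxd"))
print(relative_reduction(0.20, 0.15))
```

Under this reading, a system that lowers CER from 20% to 15% would be reported as a 25% relative reduction, which is why such percentages can look larger than the absolute change in error rate.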

Citations: 3

DOA estimation by covariance matrix sparse reconstruction of coprime array

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7178395

Chengwei Zhou, Zhiguo Shi, Yujie Gu, N. Goodman

Citations: 35

Assistive listening headsets for high noise environments: Protection and communication

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2015-04-19 DOI: 10.1109/ICASSP.2015.7179074

S. Nordholm, A. Davis, Pei Chee Yong, H. H. Dam

Citations: 3