Transient modeling for overlap-add sinusoidal model of speech

2013 IEEE International Conference on Acoustics, Speech and Signal Processing; Pub Date: 2013-05-26; DOI: 10.1109/ICASSP.2013.6639261

Slava Shechtman

Citations: 2

Abstract

Sinusoidal modeling of speech has been successfully applied to a broad range of speech analysis, synthesis, and modification tasks. For the most part it reproduces high-quality speech; for speech transients (e.g., plosives, glottal stops), however, fidelity is reduced because irregularities within a frame are not modeled. Various extensions of the stationary sinusoidal model have been proposed to cope with this problem. One simple and well-known approach is to incorporate an intra-frame magnitude envelope into the sinusoidal model, which has previously been done with an iterative analysis-by-synthesis procedure. In this paper we derive an optimal analytic solution to this problem and show that it yields a significantly better model fit than the known analysis-by-synthesis approach.
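To make the fidelity problem concrete, the following is a minimal sketch, not the paper's analytic solution. It assumes a synthetic decaying harmonic frame as a stand-in for a transient, and it assumes a per-harmonic linear amplitude term as the intra-frame extension, chosen only because it keeps the fit a plain linear least-squares problem. The function name `fit_residual` and all parameter values are hypothetical.

```python
import numpy as np

def fit_residual(signal, freqs, with_envelope_terms):
    """Least-squares fit of one frame; returns the residual energy.

    Stationary basis: cos/sin at each frequency (constant amplitude).
    Extended basis (an assumption of this sketch): adds t*cos and t*sin,
    i.e. a linear amplitude change inside the frame for each sinusoid.
    """
    # Time axis centred on the middle of the frame.
    n = np.arange(len(signal)) - (len(signal) - 1) / 2.0
    phase = np.outer(n, freqs)
    basis = [np.cos(phase), np.sin(phase)]
    if with_envelope_terms:
        t = (n / np.max(np.abs(n)))[:, None]  # normalised to [-1, 1]
        basis += [t * np.cos(phase), t * np.sin(phase)]
    A = np.hstack(basis)
    coeffs, *_ = np.linalg.lstsq(A, signal, rcond=None)
    return float(np.sum((signal - A @ coeffs) ** 2))

# Synthetic "transient": five harmonics of 200 Hz with a fast decay.
fs = 16000                      # sampling rate in Hz (assumed)
N = 320                         # 20 ms frame
freqs = 2 * np.pi * 200.0 * np.arange(1, 6) / fs
n = np.arange(N)
envelope = np.exp(-n / 80.0)
frame = envelope * sum(np.cos(w * n) / (k + 1) for k, w in enumerate(freqs))

err_stationary = fit_residual(frame, freqs, with_envelope_terms=False)
err_envelope = fit_residual(frame, freqs, with_envelope_terms=True)
print(err_stationary, err_envelope)
```

Because the extended basis contains the stationary one, its residual can never be larger; on a frame whose amplitude changes quickly it is smaller, which is the kind of gap in model fit the abstract refers to.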


