Multi-Dimensional Spectral Process for Cepstral Feature Engineering & Formant Coding

Q4 Engineering

Journal of Electrical and Electronics Engineering. Pub Date: 2022-08-15. DOI: 10.33140/jeee.01.01.01



Abstract

The fundamental frequency feature is essential for Automatic Speech Recognition because its patterns convey a paralanguage and its tuning normalizes other speech features. Human speech is multidimensional because it is minimally represented by three variables: the intonation (or pitch), the formants (or timbre), and the speech resolution (or depth). These variables represent the hidden states of the local glottal variation, the vocal tract response, and the frequency scale, respectively. Computing them jointly is more efficient than computing them one by one, so this article introduces a new speech feature extraction approach. The article is introductory; it focuses on the basic concepts of the new approach and does not elaborate on all applications. It demonstrates that the unit of a cepstral value, which is a spectral value of spectra, is a unit of acceleration, since its discrete variable, the quefrency, can be expressed in Hertz per microsecond. The article shows how to produce refined voice analysis from robust estimates and how to reconstruct speech signals from feature spaces. It concludes that the pitch track of the new approach is as good as those of two open-source pitch extractors. Combining multiple processes, attenuating background noise, and enabling distant-speech recognition, the Speech Quefrency Transform (SQT) approach is introduced together with multiple quefrency scales. SQT is a set of frequency transforms whose spectral leakages are controlled according to a frequency-modulation model. SQT captures the stationarity of time series onto a hyperspace that resembles the cepstrogram when it is reduced for pitch track extraction.
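For readers unfamiliar with the "spectral value of spectra" idea, the following is a minimal sketch of the classical real cepstrum and a cepstral-peak pitch estimate. It is not the SQT described in the abstract, whose details are not given here. The function names, the 60 to 400 Hz search range, the Hamming window, and the impulse-train test signal are all illustrative assumptions, and the quefrency axis is indexed in samples, as in the conventional formulation.

```python
import numpy as np


def real_cepstrum(frame: np.ndarray) -> np.ndarray:
    """Classical real cepstrum: inverse FFT of the log-magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    # A small floor avoids log(0) in empty bins (illustrative value).
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.irfft(log_magnitude, n=len(frame))


def cepstral_pitch(frame: np.ndarray, sample_rate: float,
                   f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Estimate the fundamental frequency from the largest cepstral peak.

    The search is limited to quefrencies (in samples) whose periods
    correspond to f_max..f_min; this range is an assumption.
    """
    windowed = frame * np.hamming(len(frame))
    cepstrum = real_cepstrum(windowed)
    q_lo = int(sample_rate / f_max)   # shortest period considered
    q_hi = int(sample_rate / f_min)   # longest period considered
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    return sample_rate / peak


def impulse_train(f0: float, sample_rate: float, n_samples: int) -> np.ndarray:
    """Hypothetical test signal: one unit pulse per pitch period."""
    period = int(round(sample_rate / f0))
    frame = np.zeros(n_samples)
    frame[::period] = 1.0
    return frame


if __name__ == "__main__":
    fs = 16000
    frame = impulse_train(125.0, fs, 2048)
    print(f"estimated f0: {cepstral_pitch(frame, fs):.1f} Hz")
```

Applying `cepstral_pitch` frame by frame over a recording would give a basic pitch track of the kind the abstract compares against; real speech would also need a voicing decision, which this sketch omits.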


Source journal

Journal of Electrical and Electronics Engineering (Engineering: Electrical and Electronic Engineering)

CiteScore

0.90

Self-citation rate

0.00%

Articles published

Review time

16 weeks

About the journal: Journal of Electrical and Electronics Engineering is a scientific, interdisciplinary, application-oriented publication that offers researchers and PhD students the possibility to disseminate their novel and original scientific and research contributions in the field of electrical and electronics engineering. The articles are reviewed by professionals, and the selection of papers is based only on the quality of their content and on the following criteria: the papers present the research results of the authors; the papers and their content have not been submitted or published elsewhere; the papers must be written in English; and the reference list should include papers published in recent years in the Journal of Electrical and Electronics Engineering that present similar research results. The topics and instructions for authors of this journal can be found in the appropriate sections.