Speech signals analysis using a frequency detector and smoothing first and second derivatives

Serge E. Miheev, P. Morozov
DOI: 10.1109/SCP.2015.7342209 (https://doi.org/10.1109/SCP.2015.7342209)
Published in: 2015 International Conference "Stability and Control Processes" in Memory of V.I. Zubov (SCP)
Publication date: 2015-12-03
Citations: 1

Abstract

This paper presents an amplitude and frequency analysis of continuous digital audio stored in a WAV file of arbitrary length, followed by playback based on amplitude-frequency characteristics expressed as functions of time. Unlike the wavelets traditionally used for this purpose, the method does not try to approximate the shape of the original sound wave, because the human ear does not distinguish the shape of the wave, only the set of harmonic amplitudes that compose it. Small leads or lags in the phases of the harmonic amplitude time functions are therefore acceptable, but the result of the analysis must be in a form that allows the playback rate to be changed while preserving the frequency characteristics. To obtain high-quality synthesized speech, a frequency-phase detector is developed that identifies the main frequency of the input digitized speech. The criterion of current quality underlying the detector augments the standard deviation with penalty functions. This eliminates the gaps in harmonic amplitude caused by local extrema of the pure standard deviation. Smoothing these gaps eliminates the "pulsebeat" effect. A lowpass filter provides additional smoothing. The result is high-quality digital synthesized speech, even though its waveform differs significantly from that of the input.
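The abstract does not give the detector's formulas, so the following is only a rough sketch of the general idea it describes: choose the period that minimizes a deviation criterion, add a penalty term to discourage sudden jumps, and lowpass-filter the resulting track. The function names, the form of the penalty, the one-pole filter, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import math

def rms_deviation(frame, lag):
    """RMS difference between the frame and a copy of itself shifted by `lag` samples."""
    n = len(frame) - lag
    return math.sqrt(sum((frame[i] - frame[i + lag]) ** 2 for i in range(n)) / n)

def detect_period(frame, min_lag, max_lag, prev_lag=None, penalty_weight=0.0):
    """Return the lag with the lowest cost.

    Cost = RMS deviation + (assumed) penalty proportional to the relative
    jump from the previously detected lag, if one is given.
    """
    best_lag, best_cost = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        cost = rms_deviation(frame, lag)
        if prev_lag is not None:
            cost += penalty_weight * abs(lag - prev_lag) / prev_lag
        if cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag

def lowpass(values, alpha=0.3):
    """One-pole lowpass (exponential smoothing); a stand-in for the paper's filter."""
    out, state = [], values[0]
    for v in values:
        state += alpha * (v - state)
        out.append(state)
    return out

# Synthetic test signal: 200 Hz fundamental plus its second harmonic at 8 kHz sampling.
SAMPLE_RATE = 8000
frame = [math.sin(math.pi * i / 20) + 0.5 * math.sin(math.pi * i / 10)
         for i in range(400)]

period = detect_period(frame, min_lag=20, max_lag=60)
print(SAMPLE_RATE / period)  # prints 200.0
```

The search range here is deliberately limited to less than two periods. With a wider range, integer multiples of the true period score equally well under the pure deviation criterion, which is one kind of ambiguity a penalty term can help resolve.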