Speech signals analysis using a frequency detector and smoothing first and second derivatives
Serge E. Miheev, P. Morozov
2015 International Conference "Stability and Control Processes" in Memory of V.I. Zubov (SCP), published 2015-12-03
DOI: 10.1109/SCP.2015.7342209 (https://doi.org/10.1109/SCP.2015.7342209)
Citations: 1
Abstract
The paper performs amplitude and frequency analysis of continuous digital audio stored in a WAV file of arbitrary length, and then plays the audio back from amplitude-frequency characteristics expressed as functions of time. Unlike the wavelets traditionally used for this purpose, the method does not try to approximate the shape of the original sound wave, because the human ear cannot distinguish the shape of a sound wave, only the set of harmonic amplitudes that compose it. A negligible lead or lag in the phases of the harmonic amplitude time functions is therefore acceptable, but the result of the analysis must be in a form that allows the playback rate to be changed while the frequency characteristics are preserved. To obtain high-quality synthesized speech, a frequency-phase detector was developed that identifies the main frequency of the input digitized speech. The criterion of current quality on which the detector is based combines the standard deviation with penalty functions. This eliminates the gaps in harmonic amplitude caused by local extrema of the pure standard deviation. Smoothing these gaps eliminates the “pulsebeat” effect. A lowpass filter provides additional smoothing. As a result, the digital synthesized speech output is of high quality even though its waveform differs significantly from that of the input.
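
The abstract does not give the detector's criterion or its penalty functions, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a normalised mean-square mismatch between a frame and its lagged copy as the base criterion, adds two hypothetical penalty terms (one favouring the shortest period, one discouraging jumps from the previous frame's period), and uses a one-pole recursive filter as a stand-in for the lowpass smoothing. All names, weights, and the test signal are illustrative assumptions.

```python
import numpy as np


def detect_period(frame, min_lag, max_lag, prev_lag=None,
                  lag_weight=0.01, jump_weight=0.1):
    """Return the lag in samples that minimises a penalised mean-square criterion.

    Assumes 1 <= min_lag <= max_lag < len(frame). The weights are hypothetical.
    """
    frame = np.asarray(frame, dtype=float)
    energy = np.mean(frame ** 2) + 1e-12  # guards against division by zero on silence
    best_lag, best_cost = min_lag, np.inf
    for lag in range(min_lag, max_lag + 1):
        diff = frame[lag:] - frame[:-lag]
        cost = np.mean(diff ** 2) / energy       # base criterion: normalised mismatch
        cost += lag_weight * lag / max_lag       # penalty: prefer the shortest period
        if prev_lag is not None:                 # penalty: discourage frame-to-frame jumps
            cost += jump_weight * ((lag - prev_lag) / prev_lag) ** 2
        if cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag


def smooth_track(values, alpha):
    """One-pole lowpass filter over a non-empty track (0 < alpha <= 1)."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    out[0] = values[0]
    for n in range(1, len(values)):
        out[n] = out[n - 1] + alpha * (values[n] - out[n - 1])
    return out


if __name__ == "__main__":
    fs = 8000                                    # assumed sampling rate, Hz
    t = np.arange(400) / fs
    # Synthetic voiced frame: 200 Hz fundamental plus its second harmonic.
    frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
    lag = detect_period(frame, min_lag=20, max_lag=160)
    print(fs / lag)                              # estimated main frequency in Hz
```

Without the period-length penalty, lags of one, two, or three periods give an almost identical mismatch for a periodic frame, so the minimum can land on any of them. This is one way a pure deviation criterion can pick a local extremum and make the estimate jump between frames.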