Novel EMD-Based Technological Procedure for Speech Signal Processing
A. Alimuradov, A. Tychkov, P. Churakov, Bogdan A. Porezanov, Ilya O. Steshkin, Kirill E. Platonov, A. Baranova, D. S. Dudnikov
2022 24th International Conference on Digital Signal Processing and its Applications (DSPA), 30 March 2022
DOI: 10.1109/dspa53304.2022.9790747
Citations: 1
Abstract
The article presents a novel technological procedure for speech signal processing based on empirical mode decomposition (EMD), an adaptive time-frequency analysis method. The procedure uniformly splits the original speech signal into fragments, decomposes each fragment into empirical modes, and forms new mode speech signals from them. Its goal is to expand the space of informative amplitude, time, frequency, and energy characteristics of the original speech signal. Several types of empirical mode decomposition are briefly described, together with their advantages and disadvantages. The operation of the proposed procedure is detailed, and the research results are reported. The analysis shows that a set of mode speech signals is formed fastest when 300–1000 ms fragments are analyzed, and that the formation error is smallest when the fragments are decomposed into 8–10 empirical modes, with the difference between the original and reconstructed signals below 0.001 V (0.1 %). The authors conclude that the proposed procedure does expand the space of informative amplitude, time, frequency, and energy characteristics by forming a set of new mode speech signals. It can therefore be used efficiently to form an optimal set of speech parameters relevant to naturally expressed human emotions.
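The splitting, decomposition, and mode-signal formation steps summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`emd`, `mode_speech_signals`) are hypothetical. The sifting uses linear-interpolation envelopes and a fixed number of sifting passes in place of the cubic-spline envelopes and stopping criteria of standard EMD. A "mode speech signal" is assumed to be the concatenation of the k-th mode of every fragment, with the residue kept as the last mode.

```python
import numpy as np


def _extrema(x):
    # Interior local maxima/minima found from sign changes of the first difference.
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(x):
    # Mean of upper and lower envelopes, or None if x has too few extrema.
    # Assumption: linear interpolation (np.interp) instead of cubic splines,
    # with both envelopes pinned to the end samples as a crude boundary rule.
    n = len(x)
    maxima, minima = _extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(n)
    imax = np.concatenate(([0], maxima, [n - 1]))
    imin = np.concatenate(([0], minima, [n - 1]))
    upper = np.interp(t, imax, x[imax])
    lower = np.interp(t, imin, x[imin])
    return (upper + lower) / 2.0


def emd(x, n_modes, n_sift=10):
    # Hypothetical fixed-count EMD: returns an (n_modes, len(x)) array whose
    # first n_modes - 1 rows are modes and whose last row is the residue.
    # Assumption: a fixed number of sifting passes replaces a stopping criterion.
    residue = np.asarray(x, dtype=float).copy()
    modes = []
    for _ in range(n_modes - 1):
        h = residue.copy()
        for _ in range(n_sift):
            m = _envelope_mean(h)
            if m is None:  # no oscillation left to sift out
                break
            h = h - m
        modes.append(h)
        residue = residue - h  # keeps sum of all rows equal to x up to rounding
    modes.append(residue)
    return np.vstack(modes)


def mode_speech_signals(signal, fs, fragment_ms=500, n_modes=8):
    # Hypothetical helper: uniform splitting, per-fragment EMD, then row k of
    # the result concatenates mode k of every fragment (assumed definition of
    # a "mode speech signal"). The last fragment may be shorter than the rest.
    signal = np.asarray(signal, dtype=float)
    frag_len = int(round(fs * fragment_ms / 1000.0))
    parts = [emd(signal[s:s + frag_len], n_modes)
             for s in range(0, len(signal), frag_len)]
    return np.hstack(parts)


# Usage on a synthetic 1 s test signal (arbitrary units, not real speech).
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
x = (0.5 * np.sin(2 * np.pi * 150 * t)
     + 0.2 * np.sin(2 * np.pi * 1200 * t)
     + 0.01 * rng.standard_normal(fs))
modes = mode_speech_signals(x, fs, fragment_ms=500, n_modes=8)
max_error = np.max(np.abs(x - modes.sum(axis=0)))
print(modes.shape, max_error)
```

Because every mode is subtracted from the running residue and the residue is kept as the last row, this plain EMD reconstructs each fragment exactly up to floating-point rounding. The abstract does not state which EMD variant produced its reported error below 0.001 V, so the sketch should not be read as reproducing that figure or the reported timing results.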