EMD-Based Method to Improve the Efficiency of Speech/Pause Segmentation
A. Alimuradov, A. Tychkov
2021 International Siberian Conference on Control and Communications (SIBCON), 2021-05-13
DOI: 10.1109/SIBCON50419.2021.9438905
Citations: 1
Abstract
Speech/pause segmentation is the classification of informative sections of a speech signal into voiced speech, unvoiced speech, and pauses. Accurately detecting where informative sections of a speech signal begin and end is one of the most important tasks in speech applications. This article presents a method, based on empirical mode decomposition (EMD), for increasing the efficiency of speech/pause segmentation. The method applies decomposition when preprocessing the original speech signals, forming a set of new signals for analysis that contain the most reliable information about the start and end boundaries of voiced speech, unvoiced speech, and pauses. A study was carried out to evaluate how the decomposition method and the duration of the analyzed signal fragments affect segmentation efficiency. Segmentation was performed with methods based on zero-crossing rate, short-time energy, and analysis of the one-dimensional Mahalanobis distance. The results show that the efficiency of speech/pause segmentation increased by an average of 11.44% with respect to errors of the first and second kind.
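The three segmentation features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no frame length, threshold, or decision rule, so the function names, the default parameters, and the assumption that the first few frames contain only a pause are all hypothetical. The EMD preprocessing step is omitted entirely.

```python
from statistics import mean, pstdev


def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def mahalanobis_1d(x, mu, sigma):
    """One-dimensional Mahalanobis distance: |x - mu| / sigma."""
    return abs(x - mu) / sigma


def segment(signal, feature=short_time_energy, frame_len=160, hop=80,
            pause_frames=5, threshold=3.0):
    """Label each frame True (speech) or False (pause).

    Assumption (not from the paper): the first `pause_frames` frames hold
    only a pause, so their feature statistics model the pause class. A frame
    is labelled speech when its feature lies more than `threshold` standard
    deviations from the pause mean.
    """
    values = [feature(f) for f in frames(signal, frame_len, hop)]
    mu = mean(values[:pause_frames])
    # Guard against a zero standard deviation on perfectly uniform pauses.
    sigma = pstdev(values[:pause_frames]) or 1e-12
    return [mahalanobis_1d(v, mu, sigma) > threshold for v in values]
```

Passing `feature=zero_crossing_rate` switches the same decision rule to the zero-crossing feature. Estimating pause statistics from so few frames makes the threshold sensitive to noise, so a practical system would likely need a longer calibration interval or an adaptive estimate.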