A Novel Approach to Speech Signal Segmentation Based on Time-Frequency Analysis

A. Alimuradov, A. Tychkov, P. Churakov, D. S. Dudnikov
Published in: 2022 6th Scientific School Dynamics of Complex Networks and their Applications (DCNA)
Publication date: 2022-09-14
DOI: 10.1109/DCNA56428.2022.9923223 (https://doi.org/10.1109/DCNA56428.2022.9923223)

Abstract

The accuracy of speech signal segmentation depends directly on the parameters used to locate the beginning and end boundaries of informative fragments in a continuous speech stream. The purpose of this work is to increase the efficiency of speech/pause segmentation through time-frequency analysis of speech signals. A novel approach to speech/pause segmentation is proposed, based on analysis of the mean frequency (in the frequency domain) and the short-term energy of the Teager operator function (in the time domain). The approach is distinguished by an auxiliary algorithm that corrects speech/pause segmentation errors, developed on the basis of the physiological functioning of the respiratory organs during the formation of a continuous speech stream. A brief overview of the informative speech signal parameters used for speech/pause segmentation is presented, and the performance of the proposed approach is detailed. The approach is compared with known speech/pause segmentation methods on clean and noisy speech signals. The findings show that (1) methods based on the proposed approach achieve the best speech/pause segmentation results for both clean and noisy speech signals; (2) the ratio of the short-term energy of the Teager operator function to the mean frequency is the informative parameter most relevant to the segmentation problem; and (3) the auxiliary algorithm for correcting false states improves segmentation efficiency.
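
The informative parameter named in the abstract, the ratio of the short-term energy of the Teager operator function to the mean frequency, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives no frame length, hop size, window, energy definition, or decision rule, so the following are all assumptions made for illustration: the 400-sample frame, the 200-sample hop, the Hann window, the frame energy taken as the sum of squared operator outputs, the mean frequency taken as the power-weighted spectral centroid, and the hypothetical relative threshold `rel_threshold`. The auxiliary correction algorithm is not sketched because the abstract does not describe how it works.

```python
import numpy as np


def teager(x):
    """Discrete Teager-Kaiser operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def frame_features(x, fs, frame_len=400, hop=200):
    """Per-frame Teager energy and mean frequency (frame sizes are assumed)."""
    x = np.asarray(x, dtype=float)
    psi = teager(x)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    window = np.hanning(frame_len)
    energy, mean_freq = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = x[start:start + frame_len]
        # Assumed definition: short-term energy = sum of squared operator output.
        energy.append(np.sum(psi[start:start + frame_len] ** 2))
        # Assumed definition: mean frequency = power-weighted spectral centroid.
        power = np.abs(np.fft.rfft(seg * window)) ** 2
        total = power.sum()
        mean_freq.append((freqs * power).sum() / total if total > 0 else 0.0)
    return np.array(energy), np.array(mean_freq)


def speech_mask(x, fs, rel_threshold=0.05, eps=1e-12):
    """Mark a frame as speech when energy / mean frequency exceeds a
    hypothetical threshold relative to the largest ratio in the signal."""
    energy, mean_freq = frame_features(x, fs)
    ratio = energy / (mean_freq + eps)
    if ratio.size == 0:
        return np.zeros(0, dtype=bool)
    return ratio > rel_threshold * ratio.max()
```

The intuition behind the ratio is that voiced speech frames tend to combine high Teager energy with a low mean frequency, while noise-like pause frames combine low energy with a higher mean frequency, so dividing one by the other widens the gap between the two classes. A fixed relative threshold is the simplest possible decision rule; a practical system would likely need an adaptive one.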