Approximate Entropy and Empirical Mode Decomposition for Improved Speaker Recognition

Impact factor: 0.9. JCR quartile: Q4. Category: Mathematics, Interdisciplinary Applications

Advances in Data Science and Adaptive Analysis. Pub Date: 2020-12-02. DOI: 10.1142/s2424922x20500114

R. A. Metzger, J. Doherty, D. Jenkins, D. L. Hall

Journal Article. Volume: 20 1. Pages: 2050011:1-2050011:24. Open access: no. Source: Semantic Scholar.

Citations: 0

Abstract

When processing real-world recordings of speech, it is highly probable that noise will be present at some point in the signal. The problem is compounded when the noise occurs in short, impulsive bursts at random intervals. Traditional signal processing methods for detecting speech rely on the spectral energy of the incoming signal to decide whether a segment contains speech. However, when noise is present, this simple energy detection is prone to falsely flagging noise as speech. This paper demonstrates an alternative way of processing a noisy speech signal that combines information-theoretic and signal processing principles to differentiate speech segments from noise. This preprocessing technique allows a speaker recognition system to train statistical speaker models on noise-corrupted speech files and to construct models statistically similar to those constructed from noise-free data. The method is shown to outperform traditional spectrum-based methods for both low-entropy and high-entropy noise in low signal-to-noise ratio environments, reducing feature space distortion as measured by the Cauchy–Schwarz (CS) distance metric.
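To make the contrast between energy detection and an information-theoretic measure concrete, the following is a minimal sketch of approximate entropy using its standard textbook definition (template matching under the Chebyshev distance, self-matches included). It is not the authors' implementation: the abstract does not give their parameters or explain how the measure is combined with empirical mode decomposition, so the function name `approximate_entropy`, the embedding dimension `m=2`, and the tolerance `r_factor=0.2` (a fraction of the frame's standard deviation) are assumptions chosen for illustration.

```python
import numpy as np


def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy of a 1-D frame (illustrative defaults, not from the paper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < m + 2:
        raise ValueError("frame too short for the chosen embedding dimension")
    # Tolerance scales with the frame's spread, so the result ignores overall gain.
    r = r_factor * np.std(x)

    def phi(dim):
        # All overlapping template vectors of length `dim`.
        count = n - dim + 1
        templates = np.array([x[i:i + dim] for i in range(count)])
        # Chebyshev distance between every pair of templates.
        diff = np.abs(templates[:, None, :] - templates[None, :, :])
        dist = diff.max(axis=2)
        # Fraction of templates within r of each template.
        # Self-matches are counted, so the fraction is never zero.
        c = np.mean(dist <= r, axis=1)
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)


# Two synthetic frames with identical energy: a regular tone and white noise.
rng = np.random.default_rng(0)
t = np.arange(400)
tone = np.sin(2 * np.pi * t / 50)
noise = rng.standard_normal(t.size)
noise *= np.sqrt(np.mean(tone**2) / np.mean(noise**2))  # match mean energy

print("energy:", np.mean(tone**2), np.mean(noise**2))
print("ApEn:  ", approximate_entropy(tone), approximate_entropy(noise))
```

An energy threshold cannot separate these two frames because their mean energies are equal by construction, whereas their approximate entropy values differ. The pairwise distance matrix makes this sketch quadratic in frame length, which is acceptable for short analysis frames but not for whole recordings.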


Source journal

Advances in Data Science and Adaptive Analysis (Mathematics, Interdisciplinary Applications)

Self-citation rate

0.00%

Articles published