Identification of Speaker from Disguised Voice Using MFCC Feature Extraction, Chi-Square and Classification Technique

IF 2.2 | Zone 4, Computer Science | JCR Q3, Telecommunications

Wireless Personal Communications | Pub Date: 2024-09-10 | DOI: 10.1007/s11277-024-11542-0

Mahesh K. Singh


Citations: 0

Abstract



The purpose of this manuscript is to show that certain acoustic features can be used to recognize the disguised speech of unknown speakers. As the name implies, forensic speaker identification entails the use of scientific techniques to ascertain an unknown speaker’s identity during an inquiry. This study aims to provide a voice recognition method that works well. Chi-square tests are used to distinguish between speech and background noise in each frame. To achieve this, the estimated background noise is continuously updated. Once background noise has initially been reduced, chi-square noise estimates are obtained. The observed signal distribution and the estimated noise distribution are then compared using a second chi-square test, this time with a different approach. For a frame to be labelled as noise, the chi-square test scores must be close together. Mel-frequency cepstral coefficient (MFCC) features are grouped as three-dimensional features. The correlation coefficient characteristics of speech are coupled with the different MFCC feature extraction techniques. The feature-based classification is done with support vector machine (SVM) classifiers and the k-nearest neighbor (k-NN) classification technique. Classification results show that applying these unique features in an SVM classifier boosts classification accuracy.
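The abstract does not spell out the statistic or the update rule, so the following is only a minimal sketch of one possible reading of the frame-labelling step, not the paper's method. It assumes each frame is summarised by a histogram of sample magnitudes, compares that histogram with a running noise estimate using Pearson's chi-square statistic, labels the frame as noise when the statistic is small, and then nudges the noise estimate toward that frame. The bin edges, the threshold, the update rate, the assumption that the first frame contains only noise, and all function names are illustrative choices that do not come from the paper.

```python
import math
import random

# Illustrative magnitude bins (an assumption); the last bin is open-ended.
BIN_EDGES = [0.0, 0.05, 0.1, 0.2, 0.4]


def amplitude_histogram(frame):
    """Count how many samples of the frame fall into each magnitude bin."""
    counts = [0] * len(BIN_EDGES)
    for sample in frame:
        magnitude = abs(sample)
        index = 0
        while index + 1 < len(BIN_EDGES) and magnitude >= BIN_EDGES[index + 1]:
            index += 1
        counts[index] += 1
    return counts


def smoothed_probabilities(counts):
    """Turn counts into probabilities with add-one smoothing (no empty bins)."""
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]


def chi_square_statistic(observed_counts, expected_probs):
    """Pearson chi-square statistic of observed counts against expected probabilities."""
    n = sum(observed_counts)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed_counts, expected_probs))


def label_frames(frames, threshold=20.0, update_rate=0.1):
    """Label each frame 'noise' or 'speech'; assumes frames[0] holds noise only."""
    noise_probs = smoothed_probabilities(amplitude_histogram(frames[0]))
    labels = []
    for frame in frames:
        counts = amplitude_histogram(frame)
        if chi_square_statistic(counts, noise_probs) < threshold:
            labels.append("noise")
            # Continuously adapt the noise estimate using frames judged to be noise.
            frame_probs = smoothed_probabilities(counts)
            noise_probs = [(1 - update_rate) * p + update_rate * q
                           for p, q in zip(noise_probs, frame_probs)]
        else:
            labels.append("speech")
    return labels


def make_demo_frames(frame_len=256, seed=0):
    """Synthetic frames: low-level uniform noise and a louder sine plus noise."""
    rng = random.Random(seed)

    def noise():
        return [rng.uniform(-0.04, 0.04) for _ in range(frame_len)]

    def voiced():
        return [0.5 * math.sin(2 * math.pi * 8 * k / frame_len)
                + rng.uniform(-0.04, 0.04) for k in range(frame_len)]

    return [noise(), noise(), voiced(), voiced(), noise()]


if __name__ == "__main__":
    print(label_frames(make_demo_frames()))
```

On the synthetic frames above, the two leading frames and the final frame are labelled noise and the two sine frames are labelled speech. Real recordings would need a threshold tuned to the frame length and bin layout, and the surviving speech frames would then go on to MFCC extraction and SVM or k-NN classification as the abstract describes.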


Source journal

Wireless Personal Communications (Engineering & Technology: Telecommunications)

CiteScore: 5.80

Self-citation rate: 9.10%

Articles published: 663

Review time: 6.8 months

About the journal: The Journal on Mobile Communication and Computing ... Publishes tutorial, survey, and original research papers addressing mobile communications and computing; Investigates theoretical, engineering, and experimental aspects of radio communications, voice, data, images, and multimedia; Explores propagation, system models, speech and image coding, multiple access techniques, protocols, performance evaluation, radio local area networks, and networking and architectures, etc.; 98% of authors who answered a survey reported that they would definitely publish or probably publish in the journal again. Wireless Personal Communications is an archival, peer-reviewed, scientific and technical journal addressing mobile communications and computing. It investigates theoretical, engineering, and experimental aspects of radio communications, voice, data, images, and multimedia. A partial list of topics included in the journal is: propagation, system models, speech and image coding, multiple access techniques, protocols, performance evaluation, radio local area networks, and networking and architectures. In addition to the above-mentioned areas, the journal also accepts papers that deal with interdisciplinary aspects of wireless communications along with: big data and analytics, business and economy, society, and the environment. The journal features five principal types of papers: full technical papers, short papers, technical aspects of policy and standardization, letters offering new research thoughts and experimental ideas, and invited papers on important and emerging topics authored by renowned experts.