基于高斯混合模型和Mel倒谱系数技术的录音性别识别研究

IF 1.1 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Innovative Computing Information and Control Pub Date : 2021-10-31 DOI:10.11113/ijic.v11n2.343

Thurgeaswary Rokanatnam, Hazinah Kutty Mammi

{"title":"基于高斯混合模型和Mel倒谱系数技术的录音性别识别研究","authors":"Thurgeaswary Rokanatnam, Hazinah Kutty Mammi","doi":"10.11113/ijic.v11n2.343","DOIUrl":null,"url":null,"abstract":"Speaker recognition is an ability to identify speaker’s characteristics based from spoken language. The purpose of this study is to identify gender of speakers based on audio recordings. The objective of this study is to evaluate the accuracy rate of this technique to differentiate the gender and also to determine the performance rate to classify even when using self-acquired recordings. Audio forensics uses voice recordings as part of evidence to solve cases. This study is mainly conducted to provide an easier technique to identify the unknown speaker characteristics in forensic field. This experiment is fulfilled by training the pattern classifier using gender dependent data. In order to train the model, a speech database is obtained from an online speech corpus comprising of both male and female speakers. During the testing phase, apart from the data from speech corpus, audio recordings of UTM students will too be used to determine the accuracy rate of this speaker identification experiment. As for the technique to run this experiment, Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract the features from speech data while Gaussian Mixture Model (GMM) is used to model the gender identifier. Noise removal was not used for any speech data in this experiment. Python software is used to extract using MFCC coefficients and model the behavior using GMM technique. Experiment results show that GMM-MFCC technique can identify gender regardless of language but with varying accuracy rate.","PeriodicalId":50314,"journal":{"name":"International Journal of Innovative Computing Information and Control","volume":"43 1","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2021-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique\",\"authors\":\"Thurgeaswary Rokanatnam, Hazinah Kutty Mammi\",\"doi\":\"10.11113/ijic.v11n2.343\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speaker recognition is an ability to identify speaker’s characteristics based from spoken language. The purpose of this study is to identify gender of speakers based on audio recordings. The objective of this study is to evaluate the accuracy rate of this technique to differentiate the gender and also to determine the performance rate to classify even when using self-acquired recordings. Audio forensics uses voice recordings as part of evidence to solve cases. This study is mainly conducted to provide an easier technique to identify the unknown speaker characteristics in forensic field. This experiment is fulfilled by training the pattern classifier using gender dependent data. In order to train the model, a speech database is obtained from an online speech corpus comprising of both male and female speakers. During the testing phase, apart from the data from speech corpus, audio recordings of UTM students will too be used to determine the accuracy rate of this speaker identification experiment. As for the technique to run this experiment, Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract the features from speech data while Gaussian Mixture Model (GMM) is used to model the gender identifier. Noise removal was not used for any speech data in this experiment. Python software is used to extract using MFCC coefficients and model the behavior using GMM technique. Experiment results show that GMM-MFCC technique can identify gender regardless of language but with varying accuracy rate.\",\"PeriodicalId\":50314,\"journal\":{\"name\":\"International Journal of Innovative Computing Information and Control\",\"volume\":\"43 1\",\"pages\":\"\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2021-10-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Innovative Computing Information and Control\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.11113/ijic.v11n2.343\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Innovative Computing Information and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.11113/ijic.v11n2.343","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

说话人识别是一种基于口语识别说话人特征的能力。本研究的目的是根据录音来确定说话人的性别。本研究的目的是评估该技术区分性别的准确率，并确定即使使用自获得录音也能进行分类的表现率。音频取证使用录音作为破案证据的一部分。本研究主要是为了在法医领域提供一种更容易识别未知说话人特征的技术。该实验通过使用性别相关数据训练模式分类器来完成。为了训练模型，从在线语音语料库中获得一个由男性和女性演讲者组成的语音数据库。在测试阶段，除了语音语料库中的数据外，还将使用UTM学生的录音来确定本次说话人识别实验的准确率。在实验技术上，采用Mel Frequency倒频谱系数(MFCC)算法提取语音数据中的特征，采用高斯混合模型(GMM)对性别标识符进行建模。本实验未对任何语音数据进行去噪处理。使用Python软件使用MFCC系数提取，并使用GMM技术对行为进行建模。实验结果表明，GMM-MFCC技术可以在不同语言下识别性别，但准确率存在差异。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique

Speaker recognition is an ability to identify speaker’s characteristics based from spoken language. The purpose of this study is to identify gender of speakers based on audio recordings. The objective of this study is to evaluate the accuracy rate of this technique to differentiate the gender and also to determine the performance rate to classify even when using self-acquired recordings. Audio forensics uses voice recordings as part of evidence to solve cases. This study is mainly conducted to provide an easier technique to identify the unknown speaker characteristics in forensic field. This experiment is fulfilled by training the pattern classifier using gender dependent data. In order to train the model, a speech database is obtained from an online speech corpus comprising of both male and female speakers. During the testing phase, apart from the data from speech corpus, audio recordings of UTM students will too be used to determine the accuracy rate of this speaker identification experiment. As for the technique to run this experiment, Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract the features from speech data while Gaussian Mixture Model (GMM) is used to model the gender identifier. Noise removal was not used for any speech data in this experiment. Python software is used to extract using MFCC coefficients and model the behavior using GMM technique. Experiment results show that GMM-MFCC technique can identify gender regardless of language but with varying accuracy rate.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Innovative Computing Information and Control 工程技术-计算机：人工智能

CiteScore

3.20

自引率

20.00%

发文量

审稿时长

4.3 months

期刊介绍： The primary aim of the International Journal of Innovative Computing, Information and Control (IJICIC) is to publish high-quality papers of new developments and trends, novel techniques and approaches, innovative methodologies and technologies on the theory and applications of intelligent systems, information and control. The IJICIC is a peer-reviewed English language journal and is published bimonthly