Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique

IF 1.3 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Thurgeaswary Rokanatnam, Hazinah Kutty Mammi
{"title":"Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique","authors":"Thurgeaswary Rokanatnam, Hazinah Kutty Mammi","doi":"10.11113/ijic.v11n2.343","DOIUrl":null,"url":null,"abstract":"Speaker recognition is an ability to identify speaker’s characteristics based from spoken language. The purpose of this study is to identify gender of speakers based on audio recordings. The objective of this study is to evaluate the accuracy rate of this technique to differentiate the gender and also to determine the performance rate to classify even when using self-acquired recordings. Audio forensics uses voice recordings as part of evidence to solve cases. This study is mainly conducted to provide an easier technique to identify the unknown speaker characteristics in forensic field. This experiment is fulfilled by training the pattern classifier using gender dependent data. In order to train the model, a speech database is obtained from an online speech corpus comprising of both male and female speakers. During the testing phase, apart from the data from speech corpus, audio recordings of UTM students will too be used to determine the accuracy rate of this speaker identification experiment. As for the technique to run this experiment, Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract the features from speech data while Gaussian Mixture Model (GMM) is used to model the gender identifier. Noise removal was not used for any speech data in this experiment. Python software is used to extract using MFCC coefficients and model the behavior using GMM technique. Experiment results show that GMM-MFCC technique can identify gender regardless of language but with varying accuracy rate.","PeriodicalId":50314,"journal":{"name":"International Journal of Innovative Computing Information and Control","volume":"43 1","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2021-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Innovative Computing Information and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.11113/ijic.v11n2.343","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Speaker recognition is an ability to identify speaker’s characteristics based from spoken language. The purpose of this study is to identify gender of speakers based on audio recordings. The objective of this study is to evaluate the accuracy rate of this technique to differentiate the gender and also to determine the performance rate to classify even when using self-acquired recordings. Audio forensics uses voice recordings as part of evidence to solve cases. This study is mainly conducted to provide an easier technique to identify the unknown speaker characteristics in forensic field. This experiment is fulfilled by training the pattern classifier using gender dependent data. In order to train the model, a speech database is obtained from an online speech corpus comprising of both male and female speakers. During the testing phase, apart from the data from speech corpus, audio recordings of UTM students will too be used to determine the accuracy rate of this speaker identification experiment. As for the technique to run this experiment, Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract the features from speech data while Gaussian Mixture Model (GMM) is used to model the gender identifier. Noise removal was not used for any speech data in this experiment. Python software is used to extract using MFCC coefficients and model the behavior using GMM technique. Experiment results show that GMM-MFCC technique can identify gender regardless of language but with varying accuracy rate.
基于高斯混合模型和Mel倒谱系数技术的录音性别识别研究
说话人识别是一种基于口语识别说话人特征的能力。本研究的目的是根据录音来确定说话人的性别。本研究的目的是评估该技术区分性别的准确率,并确定即使使用自获得录音也能进行分类的表现率。音频取证使用录音作为破案证据的一部分。本研究主要是为了在法医领域提供一种更容易识别未知说话人特征的技术。该实验通过使用性别相关数据训练模式分类器来完成。为了训练模型,从在线语音语料库中获得一个由男性和女性演讲者组成的语音数据库。在测试阶段,除了语音语料库中的数据外,还将使用UTM学生的录音来确定本次说话人识别实验的准确率。在实验技术上,采用Mel Frequency倒频谱系数(MFCC)算法提取语音数据中的特征,采用高斯混合模型(GMM)对性别标识符进行建模。本实验未对任何语音数据进行去噪处理。使用Python软件使用MFCC系数提取,并使用GMM技术对行为进行建模。实验结果表明,GMM-MFCC技术可以在不同语言下识别性别,但准确率存在差异。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
3.20
自引率
20.00%
发文量
0
审稿时长
4.3 months
期刊介绍: The primary aim of the International Journal of Innovative Computing, Information and Control (IJICIC) is to publish high-quality papers of new developments and trends, novel techniques and approaches, innovative methodologies and technologies on the theory and applications of intelligent systems, information and control. The IJICIC is a peer-reviewed English language journal and is published bimonthly
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信