Improving Robustness of Age and Gender Prediction based on Custom Speech Data

Veerapandiyan Kandasamy, Anup Bera
DOI: 10.5121/csit.2022.122005
Published in: Signal, Image Processing and Embedded Systems Trends (Journal Article), 2022-11-19
Citations: 2

Abstract

With the growing use of human-machine interaction through voice-enabled smart devices, there is increasing demand for more accurate speech analytics systems. Several studies show that speech analytics systems exhibit bias with respect to speaker demographics such as age, gender, race, and accent. To mitigate such bias, speaker demographic information can be used when preparing training datasets for speech analytics models. Speaker demographic information is also useful for targeted advertising, recommendation, and forensic science. In this research, we demonstrate several algorithms for age and gender prediction from speech data using our custom dataset, which covers speakers from around the world with varying accents. To extract speaker age and gender from speech data, we also include a method for determining the appropriate length of audio to feed into the system, which reduces computational time. This study also identifies the most effective padding and cropping mechanism for obtaining the best results from the input audio. We investigated how various parameters affect the performance and the end-to-end implementation of a real-time system for extracting speaker age and gender. Our best model achieves an RMSE of 4.1 for age prediction and 99.5% accuracy for gender prediction on our custom test dataset.
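The abstract mentions choosing an input length and a padding/cropping strategy but does not describe them, nor say which one performed best. The sketch below is a minimal illustration of common ways to bring a variable-length waveform to a fixed number of samples. It also includes the RMSE metric used for age prediction. It is not the authors' method: the helper names `fix_length` and `rmse`, the pad/crop modes, the 16 kHz sample rate, and the 3-second window are all hypothetical assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def fix_length(
    samples: Sequence[float],
    target_len: int,
    pad_mode: str = "zero",
    crop_mode: str = "center",
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical helper (not the authors' code): pad or crop a mono
    waveform to exactly target_len samples.

    pad_mode:  "zero"   -> append silence (0.0) at the end
               "repeat" -> tile the clip until it is long enough
    crop_mode: "start"  -> keep the first target_len samples
               "center" -> keep the middle target_len samples
               "random" -> keep a random window (training-time augmentation)
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    x = list(samples)
    n = len(x)

    if n < target_len:  # clip too short: pad it
        if pad_mode == "zero":
            return x + [0.0] * (target_len - n)
        if pad_mode == "repeat":
            if n == 0:
                return [0.0] * target_len  # nothing to repeat; fall back to silence
            reps = math.ceil(target_len / n)
            return (x * reps)[:target_len]
        raise ValueError(f"unknown pad_mode: {pad_mode}")

    if n > target_len:  # clip too long: crop a window out of it
        if crop_mode == "start":
            start = 0
        elif crop_mode == "center":
            start = (n - target_len) // 2
        elif crop_mode == "random":
            start = (rng or random).randint(0, n - target_len)
        else:
            raise ValueError(f"unknown crop_mode: {crop_mode}")
        return x[start:start + target_len]

    return x  # already the requested length


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, the metric reported for age prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    sr = 16_000                    # assumed sample rate (Hz)
    target = 3 * sr                # hypothetical 3-second input window
    short_clip = [0.1] * (2 * sr)  # 2 s clip -> padded
    long_clip = [0.1] * (5 * sr)   # 5 s clip -> cropped
    print(len(fix_length(short_clip, target, pad_mode="repeat")))  # 48000
    print(len(fix_length(long_clip, target, crop_mode="center")))  # 48000
    print(rmse([30, 40, 50], [33, 38, 50]))
```

As a design note, zero padding leaves the speech untouched but adds silence, while repeat padding avoids long silent stretches at the cost of duplicating content. Center cropping is deterministic, which suits evaluation. Random cropping is typically reserved for training-time augmentation.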