Improving Robustness of Age and Gender Prediction based on Custom Speech Data

Signal, Image Processing and Embedded Systems Trends. Published: 2022-11-19. DOI: 10.5121/csit.2022.122005

Veerapandiyan Kandasamy, Anup Bera


Citations: 2

Abstract

With the increased use of human-machine interaction via voice-enabled smart devices over the years, there is a growing demand for more accurate speech analytics systems. Several studies show that speech analytics systems exhibit bias with respect to speaker demographics such as age, gender, race, and accent. To mitigate such bias, speaker demographic information can be used to prepare training datasets for speech analytics models. Speaker demographic information can also be used for targeted advertising, recommendation, and forensic science. In this research, we demonstrate several algorithms for age and gender prediction from speech data using our custom dataset, which covers speakers from around the world with varying accents. To extract speaker age and gender from speech data, we also present a method for determining the appropriate length of audio to ingest into the system, which reduces computational time. This study also identifies the most effective padding and cropping mechanism for obtaining the best results from the input audio file. We investigated the impact of various parameters on the performance and end-to-end implementation of a real-time speaker age and gender information extraction system. Our best model achieves an RMSE of 4.1 for age prediction and 99.5% accuracy for gender prediction on our custom test dataset.
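The abstract describes fixing each input clip to a chosen length by padding or cropping, and it reports RMSE for age and a percentage for gender. It does not say which padding or cropping variants were compared, so the Python sketch below is a minimal illustration under stated assumptions, not the authors' method. The helper names `fix_length`, `rmse`, and `accuracy` are hypothetical. The `"zero"`/`"repeat"` padding modes and `"start"`/`"center"`/`"end"` cropping modes are illustrative choices, and the gender figure is assumed to be classification accuracy. `target_len` stands in for the audio length the paper selects; how that length is determined is not reproduced here.

```python
import math
from typing import List, Sequence


# Hypothetical helper (assumption): the paper does not specify its padding
# or cropping variants; these modes are illustrative only.
def fix_length(signal: Sequence[float], target_len: int,
               pad_mode: str = "zero", crop_mode: str = "center") -> List[float]:
    """Pad or crop a 1-D waveform to exactly target_len samples."""
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    n = len(signal)
    if n >= target_len:
        # Cropping: choose which window of the longer clip to keep.
        if crop_mode == "start":
            start = 0
        elif crop_mode == "center":
            start = (n - target_len) // 2
        elif crop_mode == "end":
            start = n - target_len
        else:
            raise ValueError(f"unknown crop_mode: {crop_mode}")
        return list(signal[start:start + target_len])
    # Padding: the clip is shorter than the fixed model input length.
    if pad_mode == "zero":
        # Append silence (zeros) after the speech.
        return list(signal) + [0.0] * (target_len - n)
    if pad_mode == "repeat":
        if n == 0:
            return [0.0] * target_len
        # Tile the clip until it fills the window, then truncate.
        reps = -(-target_len // n)  # ceiling division
        return (list(signal) * reps)[:target_len]
    raise ValueError(f"unknown pad_mode: {pad_mode}")


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error, the metric the abstract reports for age."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true, y_pred) -> float:
    """Fraction of correct labels (assumed metric for gender prediction)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    clip = [0.1, 0.2, 0.3]
    print(fix_length(clip, 5, pad_mode="zero"))    # [0.1, 0.2, 0.3, 0.0, 0.0]
    print(fix_length(clip, 5, pad_mode="repeat"))  # [0.1, 0.2, 0.3, 0.1, 0.2]
    print(fix_length([1, 2, 3, 4, 5], 3, crop_mode="center"))  # [2, 3, 4]
    print(rmse([25, 40], [28, 37]))  # 3.0
    print(accuracy(["f", "m", "m", "f"], ["f", "m", "f", "f"]))  # 0.75
```

Shortening `target_len` reduces the number of samples each model call must process, which is one plausible reading of how choosing an appropriate length cuts computational time. Repeat padding avoids long runs of silence in very short clips, at the cost of duplicating speech content.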

