利用有限的数据集探索错配条件、噪声背景和说话人健康状况对基于卷积自动编码器的说话人识别系统的影响

Arundhati Niwatkar, Y. Kanse, Ajay Kumar Kushwaha
{"title":"利用有限的数据集探索错配条件、噪声背景和说话人健康状况对基于卷积自动编码器的说话人识别系统的影响","authors":"Arundhati Niwatkar, Y. Kanse, Ajay Kumar Kushwaha","doi":"10.4108/eetsis.5697","DOIUrl":null,"url":null,"abstract":"This paper presents a novel approach to enhance the success rate and accuracy of speaker recognition and identification systems. The methodology involves employing data augmentation techniques to enrich a small dataset with audio recordings from five speakers, covering both male and female voices. Python programming language is utilized for data processing, and a convolutional autoencoder is chosen as the model. Spectrograms are used to convert speech signals into images, serving as input for training the autoencoder. The developed speaker recognition system is compared against traditional systems relying on the MFCC feature extraction technique. In addition to addressing the challenges of a small dataset, the paper explores the impact of a \"mismatch condition\" by using different time durations of the audio signal during both training and testing phases. Through experiments involving various activation and loss functions, the optimal pair for the small dataset is identified, resulting in a high success rate of 92.4% in matched conditions. Traditionally, Mel-Frequency Cepstral Coefficients (MFCC) have been widely used for this purpose. However, the COVID-19 pandemic has drawn attention to the virus's impact on the human body, particularly on areas relevant to speech, such as the chest, throat, vocal cords, and related regions. COVID-19 symptoms, such as coughing, breathing difficulties, and throat swelling, raise questions about the influence of the virus on MFCC, pitch, jitter, and shimmer features. Therefore, this research aims to investigate and understand the potential effects of COVID-19 on these crucial features, contributing valuable insights to the development of robust speaker recognition systems.","PeriodicalId":155438,"journal":{"name":"ICST Transactions on Scalable Information Systems","volume":"56 3","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Exploring the Impact of Mismatch Conditions, Noisy Backgrounds, and Speaker Health on Convolutional Autoencoder-Based Speaker Recognition System with Limited Dataset\",\"authors\":\"Arundhati Niwatkar, Y. Kanse, Ajay Kumar Kushwaha\",\"doi\":\"10.4108/eetsis.5697\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a novel approach to enhance the success rate and accuracy of speaker recognition and identification systems. The methodology involves employing data augmentation techniques to enrich a small dataset with audio recordings from five speakers, covering both male and female voices. Python programming language is utilized for data processing, and a convolutional autoencoder is chosen as the model. Spectrograms are used to convert speech signals into images, serving as input for training the autoencoder. The developed speaker recognition system is compared against traditional systems relying on the MFCC feature extraction technique. In addition to addressing the challenges of a small dataset, the paper explores the impact of a \\\"mismatch condition\\\" by using different time durations of the audio signal during both training and testing phases. Through experiments involving various activation and loss functions, the optimal pair for the small dataset is identified, resulting in a high success rate of 92.4% in matched conditions. Traditionally, Mel-Frequency Cepstral Coefficients (MFCC) have been widely used for this purpose. However, the COVID-19 pandemic has drawn attention to the virus's impact on the human body, particularly on areas relevant to speech, such as the chest, throat, vocal cords, and related regions. COVID-19 symptoms, such as coughing, breathing difficulties, and throat swelling, raise questions about the influence of the virus on MFCC, pitch, jitter, and shimmer features. Therefore, this research aims to investigate and understand the potential effects of COVID-19 on these crucial features, contributing valuable insights to the development of robust speaker recognition systems.\",\"PeriodicalId\":155438,\"journal\":{\"name\":\"ICST Transactions on Scalable Information Systems\",\"volume\":\"56 3\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICST Transactions on Scalable Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4108/eetsis.5697\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICST Transactions on Scalable Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4108/eetsis.5697","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

本文介绍了一种提高扬声器识别和鉴定系统成功率和准确性的新方法。该方法涉及采用数据增强技术来丰富一个小型数据集,该数据集包含来自五位扬声器的录音,其中既有男声也有女声。数据处理使用 Python 编程语言,并选择卷积自动编码器作为模型。使用频谱图将语音信号转换成图像,作为训练自动编码器的输入。所开发的说话人识别系统与传统的依靠 MFCC 特征提取技术的系统进行了比较。除了应对小数据集带来的挑战外,本文还通过在训练和测试阶段使用不同时间长度的音频信号,探讨了 "不匹配条件 "的影响。通过涉及各种激活和损失函数的实验,确定了小数据集的最佳配对,从而使匹配条件下的成功率高达 92.4%。传统上,Mel-Frequency Cepstral Coefficients(MFCC)被广泛用于此目的。然而,COVID-19 大流行引起了人们对病毒对人体影响的关注,特别是对胸部、喉咙、声带和相关区域等与语言有关的部位的影响。COVID-19 的症状,如咳嗽、呼吸困难和喉咙肿胀,提出了病毒对 MFCC、音高、抖动和闪烁特征的影响问题。因此,本研究旨在调查和了解 COVID-19 对这些关键特征的潜在影响,为开发稳健的说话者识别系统提供有价值的见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Exploring the Impact of Mismatch Conditions, Noisy Backgrounds, and Speaker Health on Convolutional Autoencoder-Based Speaker Recognition System with Limited Dataset
This paper presents a novel approach to enhance the success rate and accuracy of speaker recognition and identification systems. The methodology involves employing data augmentation techniques to enrich a small dataset with audio recordings from five speakers, covering both male and female voices. Python programming language is utilized for data processing, and a convolutional autoencoder is chosen as the model. Spectrograms are used to convert speech signals into images, serving as input for training the autoencoder. The developed speaker recognition system is compared against traditional systems relying on the MFCC feature extraction technique. In addition to addressing the challenges of a small dataset, the paper explores the impact of a "mismatch condition" by using different time durations of the audio signal during both training and testing phases. Through experiments involving various activation and loss functions, the optimal pair for the small dataset is identified, resulting in a high success rate of 92.4% in matched conditions. Traditionally, Mel-Frequency Cepstral Coefficients (MFCC) have been widely used for this purpose. However, the COVID-19 pandemic has drawn attention to the virus's impact on the human body, particularly on areas relevant to speech, such as the chest, throat, vocal cords, and related regions. COVID-19 symptoms, such as coughing, breathing difficulties, and throat swelling, raise questions about the influence of the virus on MFCC, pitch, jitter, and shimmer features. Therefore, this research aims to investigate and understand the potential effects of COVID-19 on these crucial features, contributing valuable insights to the development of robust speaker recognition systems.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信