Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks

Frontiers in Artificial Intelligence. Pub Date: 2024-02-08. DOI: 10.3389/frai.2024.1287877

Julio Cesar Cavalcanti, Ronaldo Rodrigues da Silva, Anders Eriksson, P. Barbosa



Abstract



This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system built with the SpeechBrain toolkit. The dataset comprised recordings from 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance was evaluated under three comparison conditions: identical twins only, all speakers in the dataset (including twin pairs), and all speakers excluding twin pairs. Speech samples ranging from 5 to 30 s were assessed using equal error rates (EER) and the log-likelihood-ratio cost (Cllr). The results highlight the substantial challenge that identical twins pose to the ASR system, which lowers overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those based on shorter samples. As sample length increased, the standard deviations of both intra- and inter-speaker similarity scores decreased, indicating less variability in estimates of speaker similarity and dissimilarity for longer speech stretches than for shorter ones. The study also uncovered varying degrees of likeness among identical twins, with certain pairs presenting a greater challenge for ASR systems. These outcomes align with prior research and are discussed within the context of relevant literature.
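The two metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the threshold sweep, and the toy scores are assumptions made for illustration. The Cllr function also assumes its inputs are calibrated natural-log likelihood ratios, so raw similarity scores from a speaker-embedding model would need a calibration step first.

```python
import numpy as np


def equal_error_rate(same_speaker, different_speaker):
    """Approximate EER by sweeping every observed score as a threshold."""
    same = np.asarray(same_speaker, dtype=float)
    diff = np.asarray(different_speaker, dtype=float)
    thresholds = np.sort(np.concatenate([same, diff]))
    # False rejection: a same-speaker pair scores below the threshold.
    frr = np.array([(same < t).mean() for t in thresholds])
    # False acceptance: a different-speaker pair scores at or above it.
    far = np.array([(diff >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0


def cllr(same_llr, diff_llr):
    """Log-likelihood-ratio cost; inputs are assumed to be natural-log LRs."""
    same = np.asarray(same_llr, dtype=float)
    diff = np.asarray(diff_llr, dtype=float)
    # logaddexp(0, x) = ln(1 + e^x), computed in a numerically stable way.
    cost_same = np.logaddexp(0.0, -same).mean()
    cost_diff = np.logaddexp(0.0, diff).mean()
    # Dividing by ln(2) converts the natural-log costs to bits.
    return (cost_same + cost_diff) / (2.0 * np.log(2.0))


if __name__ == "__main__":
    # Hypothetical scores, not data from the study.
    same_scores = [1.0, 2.0, 3.0, 4.0]
    diff_scores = [-2.0, -1.0, 0.0, 2.5]
    print("EER:", equal_error_rate(same_scores, diff_scores))
    print("Cllr:", cllr(same_scores, diff_scores))
```

Lower values are better for both metrics. A system whose log likelihood ratios are all zero carries no information and yields a Cllr of 1, which makes that value a useful reference point when comparing conditions such as twin-only and non-twin comparisons.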
