On the influence of the quality of pseudo-labels on the self-supervised speaker verification task: a thorough analysis
A. Fathan, J. Alam
2023 11th International Workshop on Biometrics and Forensics (IWBF), 2023-04-19
https://doi.org/10.1109/IWBF57495.2023.10157651
One of the most widely used approaches to training self-supervised (SS) speaker verification (SV) systems is to optimize the speaker embedding network discriminatively using pseudo-labels (PLs) generated by a clustering algorithm (CA). Although PL-based SS training has shown impressive performance, recent studies have shown that label noise can significantly degrade it. In this paper, we explore PLs generated by different CAs and conduct a fine-grained analysis of the relationship between PL quality and SV performance. Experimentally, we shed light on several previously overlooked aspects of the PLs that can affect SV performance. Moreover, we observe that SS-SV performance depends heavily on multiple qualitative aspects of the CA used to generate the PLs. Furthermore, we show that SV performance can be severely degraded by overfitting to noisy PLs and that the mixup strategy can mitigate the memorization effects of label noise.
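To make the mixup idea concrete, here is a minimal NumPy sketch of mixing a batch of inputs together with their cluster-derived pseudo-labels. It is an illustration under stated assumptions, not the authors' implementation: the function name `mixup_batch`, the parameters `num_clusters` and `alpha`, and the choice of a Beta(alpha, alpha) mixing coefficient (the usual formulation of mixup) are all hypothetical here, since the abstract does not specify them.

```python
import numpy as np


def mixup_batch(features, pseudo_labels, num_clusters, alpha=0.2, rng=None):
    """Mix each example with a randomly chosen partner from the same batch.

    features:      float array of shape (batch, dim), e.g. acoustic features
    pseudo_labels: int array of shape (batch,), cluster indices used as labels
    num_clusters:  number of clusters produced by the clustering algorithm
    alpha:         Beta distribution parameter (assumed value, not from the paper)
    """
    rng = np.random.default_rng() if rng is None else rng

    # One mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)

    # Random pairing of examples within the batch.
    perm = rng.permutation(len(features))

    # One-hot encode pseudo-labels so they can be interpolated.
    one_hot = np.eye(num_clusters)[pseudo_labels]

    # Convex combination of inputs and of their (possibly noisy) targets.
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed_x, mixed_y, lam
```

The soft targets in `mixed_y` would then be used with a cross-entropy style loss in place of hard pseudo-labels. Because every training target becomes a blend of two pseudo-labels, a single wrong cluster assignment contributes less to the gradient, which is one common explanation for why mixup reduces memorization of noisy labels.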