Detection and Calibration of Whisper for Speaker Recognition
Authors: Finnian Kelly, J. Hansen
Published in: 2018 IEEE Spoken Language Technology Workshop (SLT), 2018-12-01
DOI: https://doi.org/10.1109/SLT.2018.8639595
Whisper is a commonly encountered form of speech that differs significantly from modal speech. As speaker recognition technology becomes more ubiquitous, it is important to assess the abilities and limitations of systems in the presence of variability such as whisper. In this paper, a comparative evaluation of whispered speaker recognition performance across two independent datasets is presented. Whisper-neutral speech comparisons are observed to consistently degrade performance relative to both neutral-neutral and whisper-whisper comparisons. An i-vector-based approach to whisper detection is introduced, and is shown to perform accurately across datasets even at short durations. The output of the whisper detector is subsequently used to select score calibration parameters for whispered speech comparisons, leading to a reduction in global calibration and discrimination error.
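The final step described above, using the whisper detector's output to choose calibration parameters, can be illustrated with a minimal sketch. The abstract does not state the form of the calibration, so an affine score mapping (scale and offset) is assumed here. The parameter values, the detector threshold, and every name in the code (`AffineCalibration`, `CALIBRATIONS`, `is_whisper`, `condition`, `calibrate`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineCalibration:
    """Maps a raw comparison score s to a calibrated score scale * s + offset."""
    scale: float
    offset: float

    def apply(self, score: float) -> float:
        return self.scale * score + self.offset


# Hypothetical parameters, one set per comparison condition. In a real system
# they would be fitted on held-out scores for each condition; these values are
# placeholders chosen only to make the example easy to follow.
CALIBRATIONS = {
    "neutral-neutral": AffineCalibration(scale=1.0, offset=0.0),
    "whisper-neutral": AffineCalibration(scale=0.5, offset=2.0),
    "whisper-whisper": AffineCalibration(scale=0.75, offset=1.0),
}


def is_whisper(detector_score: float, threshold: float = 0.0) -> bool:
    """Hard whisper/neutral decision; the threshold of 0.0 is an assumption."""
    return detector_score > threshold


def condition(enrol_detector_score: float, test_detector_score: float) -> str:
    """Name the comparison condition from the two detector outputs."""
    n_whisper = int(is_whisper(enrol_detector_score)) + int(is_whisper(test_detector_score))
    return ("neutral-neutral", "whisper-neutral", "whisper-whisper")[n_whisper]


def calibrate(raw_score: float, enrol_detector_score: float,
              test_detector_score: float) -> float:
    """Calibrate a raw score with the parameters selected by the detector."""
    selected = CALIBRATIONS[condition(enrol_detector_score, test_detector_score)]
    return selected.apply(raw_score)
```

For example, `calibrate(2.0, 1.5, -0.3)` treats the enrolment recording as whispered and the test recording as neutral, so the whisper-neutral parameters are applied. Keeping one parameter set per condition lets a mismatched comparison be shifted and scaled differently from a matched one, which is the general idea behind condition-dependent calibration.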