Speaker verification with TIMIT corpus - some remarks on classical methods
A. Dustor
2020 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2020-09-23
https://doi.org/10.23919/spa50552.2020.9241298
Abstract
This paper presents research on a speaker verification system based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM) approach. All tests were carried out on the TIMIT corpus. Performance is reported for standard Mel-Frequency Cepstral Coefficients (MFCC) and for dynamic delta features. The influence of feature dimensionality and model complexity on the Equal Error Rate (EER) is presented. The impact of Voice Activity Detection (VAD) and of normalization techniques such as Cepstral Mean and Variance Normalization (CMVN) and RelAtive SpecTrA (RASTA) filtering is also covered. Every combination of these factors was examined. The results show that a careful selection of traditional techniques can yield very satisfactory EER values.
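Three of the building blocks named in the abstract (CMVN, delta features, and the EER metric) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `cmvn`, `delta`, and `equal_error_rate` are hypothetical, the delta window of two frames on each side is an assumed default, and the EER is approximated by a simple threshold sweep because the abstract does not state how it was computed.

```python
import numpy as np


def cmvn(features):
    """Cepstral mean and variance normalization over one utterance.

    features: array of shape (n_frames, n_coeffs), e.g. MFCC vectors.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Guard against constant coefficients (zero variance).
    std = np.where(std > 0, std, 1.0)
    return (features - mean) / std


def delta(features, width=2):
    """Regression-based delta features over +/- `width` frames.

    The window width is an assumption; edge frames are handled by
    repeating the first and last frame.
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n_frames]    # frames t + k
        behind = padded[width - k : width - k + n_frames]   # frames t - k
        out += k * (ahead - behind)
    return out / denom


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping a threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    # False rejection: target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance: impostor trials scored at or above the threshold.
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2
```

In a GMM-UBM system the scores passed to `equal_error_rate` would typically be log-likelihood ratios between the speaker model and the background model; here any real-valued scores work, with higher values meaning "more likely the target speaker".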