Vietnamese Speaker Authentication Using Deep Models
Son T. Nguyen, Viet Dac Lai, Quyen Dam-Ba, Anh Nguyen-Xuan, Cuong Pham
Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. DOI: 10.1145/3287921.3287954. Citations: 4.
Speaker authentication is the verification of a user's identity from voice biometrics and has a wide range of applications such as banking security, human-computer interaction, and ambient authentication. In this work, we investigate the effectiveness of acoustic features such as Mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), and Linear Predictive Coding (LPC), extracted from audio streams, for constructing spectral feature images. In addition, we propose to use deep Residual Network (ResNet) models for user verification from these spectral feature images. We evaluate our proposed method under two settings on a dataset collected from 20 Vietnamese speakers. The results, with an Equal Error Rate (EER) of around 4%, demonstrate the feasibility of Vietnamese speaker authentication using deep Residual Network models trained on GFCC spectral feature images.
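The abstract reports performance as an Equal Error Rate, the operating point where the false accept rate equals the false reject rate. The paper does not include code, so the sketch below is only a minimal NumPy illustration of how EER can be computed from genuine (same-speaker) and impostor (different-speaker) similarity scores; the function name and the higher-score-means-same-speaker convention are assumptions for illustration.

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Approximate the Equal Error Rate (EER) from verification scores.

    Assumes higher scores indicate the same speaker. Sweeps every observed
    score as a decision threshold, then returns the average of the false
    accept rate (FAR) and false reject rate (FRR) at the threshold where
    they are closest -- a common discrete approximation of the EER.
    """
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    # FAR: fraction of impostor trials wrongly accepted at each threshold.
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    # FRR: fraction of genuine trials wrongly rejected at each threshold.
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

With well-separated score distributions the EER approaches 0; a reported EER around 4% means roughly 1 in 25 trials of each kind is misclassified at the balanced operating point.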