Short Utterance Speaker Recognition by Reservoir with Self-Organized Mapping
Authors: Narumitsu Ikeda, Yoshinao Sato, Hirokazu Takahashi
Venue: 2018 IEEE Spoken Language Technology Workshop (SLT)
Publication date: 2018-12-01
DOI: 10.1109/SLT.2018.8639570 (https://doi.org/10.1109/SLT.2018.8639570)
Short utterances degrade the performance of conventional speaker recognition systems based on i-vectors, which rely on the statistics of spectral features. To overcome this difficulty, we propose a novel method that utilizes the dynamics of the spectral features as well as their distribution. Our model integrates an echo state network (ESN), a type of reservoir computing architecture, and a self-organizing map (SOM), a competitive learning network. The ESN consists of a single-hidden-layer recurrent neural network with randomly fixed weights, which extracts temporal patterns of the spectral features. In the original ESN the input weights are fixed randomly; in our model they are instead trained before enrollment, using the unsupervised competitive learning algorithm of the SOM, to extract the intrinsic structure of the spectral features. During enrollment, the output weights are trained in a supervised manner to recognize an individual in a group of speakers. Our experiment demonstrates that the proposed method outperforms or is comparable to a baseline i-vector system for text-independent speaker identification on short utterances.
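The three stages described in the abstract (SOM-trained input weights, a fixed random reservoir, and a supervised readout) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the one-dimensional SOM grid, the direct use of the SOM codebook as the input weight matrix, the leaky tanh state update, the time-averaged reservoir state as the utterance-level feature, the ridge-regression readout, all hyperparameters, and the synthetic "speakers". The function names (`train_som`, `make_reservoir`, `embed`, `train_readout`) are hypothetical.

```python
import numpy as np

def train_som(frames, n_units, epochs=5, lr0=0.5, seed=0):
    """Fit a 1-D SOM; returns a codebook of shape (n_units, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialise prototypes from randomly chosen training frames.
    codebook = frames[rng.integers(0, len(frames), n_units)].astype(float)
    grid = np.arange(n_units)
    n_steps = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        for x in frames[rng.permutation(len(frames))]:
            decay = 1.0 - step / n_steps
            sigma = max(0.5 * n_units * decay, 0.5)  # shrinking neighbourhood
            bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            codebook += lr0 * decay * h[:, None] * (x - codebook)
            step += 1
    return codebook

def make_reservoir(n_units, spectral_radius=0.9, seed=0):
    """Random recurrent weights, rescaled to the given spectral radius, then kept fixed."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, (n_units, n_units))
    return w * (spectral_radius / np.max(np.abs(np.linalg.eigvals(w))))

def embed(frames, w_in, w_res, leak=0.3):
    """Run the reservoir over one utterance; return the mean state plus a bias term."""
    state = np.zeros(w_res.shape[0])
    total = np.zeros_like(state)
    for x in frames:
        state = (1.0 - leak) * state + leak * np.tanh(w_in @ x + w_res @ state)
        total += state
    return np.append(total / len(frames), 1.0)

def train_readout(features, labels, n_speakers, ridge=1e-2):
    """Ridge regression from utterance features to one-hot speaker targets."""
    targets = np.eye(n_speakers)[labels]
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)

# Synthetic stand-in for spectral features: two "speakers", 10 utterances each,
# 30 frames of 4-dimensional features per utterance.
rng = np.random.default_rng(1)
means = np.array([[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]])
labels = np.repeat([0, 1], 10)
utterances = [means[s] + 0.3 * rng.standard_normal((30, 4)) for s in labels]

# Stage 1 (before enrollment): unsupervised SOM training of the input weights.
w_in = train_som(np.vstack(utterances), n_units=16)
# Stage 2: the recurrent reservoir weights stay random and fixed.
w_res = make_reservoir(16)
# Stage 3 (enrollment): supervised training of the output weights only.
features = np.array([embed(u, w_in, w_res) for u in utterances])
w_out = train_readout(features, labels, n_speakers=2)

pred = np.argmax(features @ w_out, axis=1)
print("training accuracy:", np.mean(pred == labels))
```

Only `w_out` depends on speaker labels in this sketch, which mirrors the split in the abstract between unsupervised training before enrollment and supervised training during enrollment. The toy data are far easier than real short utterances, so the printed accuracy says nothing about the results reported in the paper.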