T. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy
{"title":"无监督声学建模的深度学习方法- Leap提交给2017年零语音挑战","authors":"T. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy","doi":"10.1109/ASRU.2017.8269013","DOIUrl":null,"url":null,"abstract":"In this paper, we present our system submission to the ZeroSpeech 2017 Challenge. The track1 of this challenge is intended to develop language independent speech representations that provide the least pairwise ABX distance computed for within speaker and across speaker pairs of spoken words. We investigate two approaches based on deep learning methods for unsupervised modeling. In the first approach, a deep neural network (DNN) is trained on the posteriors of mixture component indices obtained from training a Gaussian mixture model (GMM)-UBM. In the second approach, we develop a similar hidden Markov model (HMM) based DNN model to learn the unsupervised acoustic units provided by HMM state alignments. In addition, we also develop a deep autoencoder which learns language independent embeddings of speech to train the HMM-DNN model. Both the approaches do not use any labeled training data or require any supervision. We perform several experiments using the ZeroSpeech 2017 corpus with the minimal pair ABX error measure. In these experiments, we find that the two proposed approaches significantly improve over the baseline system using MFCC features (average relative improvements of 30–40%). Furthermore, the system combination of the two proposed approaches improves the performance over the best individual system.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Deep learning methods for unsupervised acoustic modeling — Leap submission to ZeroSpeech challenge 2017\",\"authors\":\"T. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy\",\"doi\":\"10.1109/ASRU.2017.8269013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present our system submission to the ZeroSpeech 2017 Challenge. The track1 of this challenge is intended to develop language independent speech representations that provide the least pairwise ABX distance computed for within speaker and across speaker pairs of spoken words. We investigate two approaches based on deep learning methods for unsupervised modeling. In the first approach, a deep neural network (DNN) is trained on the posteriors of mixture component indices obtained from training a Gaussian mixture model (GMM)-UBM. In the second approach, we develop a similar hidden Markov model (HMM) based DNN model to learn the unsupervised acoustic units provided by HMM state alignments. In addition, we also develop a deep autoencoder which learns language independent embeddings of speech to train the HMM-DNN model. Both the approaches do not use any labeled training data or require any supervision. We perform several experiments using the ZeroSpeech 2017 corpus with the minimal pair ABX error measure. In these experiments, we find that the two proposed approaches significantly improve over the baseline system using MFCC features (average relative improvements of 30–40%). Furthermore, the system combination of the two proposed approaches improves the performance over the best individual system.\",\"PeriodicalId\":290868,\"journal\":{\"name\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"128 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2017.8269013\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8269013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Deep learning methods for unsupervised acoustic modeling — Leap submission to ZeroSpeech challenge 2017
In this paper, we present our system submission to the ZeroSpeech 2017 Challenge. The track1 of this challenge is intended to develop language independent speech representations that provide the least pairwise ABX distance computed for within speaker and across speaker pairs of spoken words. We investigate two approaches based on deep learning methods for unsupervised modeling. In the first approach, a deep neural network (DNN) is trained on the posteriors of mixture component indices obtained from training a Gaussian mixture model (GMM)-UBM. In the second approach, we develop a similar hidden Markov model (HMM) based DNN model to learn the unsupervised acoustic units provided by HMM state alignments. In addition, we also develop a deep autoencoder which learns language independent embeddings of speech to train the HMM-DNN model. Both the approaches do not use any labeled training data or require any supervision. We perform several experiments using the ZeroSpeech 2017 corpus with the minimal pair ABX error measure. In these experiments, we find that the two proposed approaches significantly improve over the baseline system using MFCC features (average relative improvements of 30–40%). Furthermore, the system combination of the two proposed approaches improves the performance over the best individual system.