理论学习保证了声学建模的应用

Journal of the Brazilian Computer Society Pub Date : 2019-01-04 DOI:10.1186/s13173-018-0081-3

Christopher D. Shulby, Martha D. Ferreira, Rodrigo F. de Mello, Sandra M. Aluisio

{"title":"理论学习保证了声学建模的应用","authors":"Christopher D. Shulby, Martha D. Ferreira, Rodrigo F. de Mello, Sandra M. Aluisio","doi":"10.1186/s13173-018-0081-3","DOIUrl":null,"url":null,"abstract":"In low-resource scenarios, for example, small datasets or a lack in computational resources available, state-of-the-art deep learning methods for speech recognition have been known to fail. It is possible to achieve more robust models if care is taken to ensure the learning guarantees provided by the statistical learning theory. This work presents a shallow and hybrid approach using a convolutional neural network feature extractor fed into a hierarchical tree of support vector machines for classification. Here, we show that gross errors present even in state-of-the-art systems can be avoided and that an accurate acoustic model can be built in a hierarchical fashion. Furthermore, we present proof that our algorithm does adhere to the learning guarantees provided by the statistical learning theory. The acoustic model produced in this work outperforms traditional hidden Markov models, and the hierarchical support vector machine tree outperforms a multi-class multilayer perceptron classifier using the same features. More importantly, we isolate the performance of the acoustic model and provide results on both the frame and phoneme level, considering the true robustness of the model. We show that even with a small amount of data, accurate and robust recognition rates can be obtained.","PeriodicalId":39760,"journal":{"name":"Journal of the Brazilian Computer Society","volume":"20 10","pages":"1-12"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Theoretical learning guarantees applied to acoustic modeling\",\"authors\":\"Christopher D. Shulby, Martha D. Ferreira, Rodrigo F. de Mello, Sandra M. Aluisio\",\"doi\":\"10.1186/s13173-018-0081-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In low-resource scenarios, for example, small datasets or a lack in computational resources available, state-of-the-art deep learning methods for speech recognition have been known to fail. It is possible to achieve more robust models if care is taken to ensure the learning guarantees provided by the statistical learning theory. This work presents a shallow and hybrid approach using a convolutional neural network feature extractor fed into a hierarchical tree of support vector machines for classification. Here, we show that gross errors present even in state-of-the-art systems can be avoided and that an accurate acoustic model can be built in a hierarchical fashion. Furthermore, we present proof that our algorithm does adhere to the learning guarantees provided by the statistical learning theory. The acoustic model produced in this work outperforms traditional hidden Markov models, and the hierarchical support vector machine tree outperforms a multi-class multilayer perceptron classifier using the same features. More importantly, we isolate the performance of the acoustic model and provide results on both the frame and phoneme level, considering the true robustness of the model. We show that even with a small amount of data, accurate and robust recognition rates can be obtained.\",\"PeriodicalId\":39760,\"journal\":{\"name\":\"Journal of the Brazilian Computer Society\",\"volume\":\"20 10\",\"pages\":\"1-12\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-01-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Brazilian Computer Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s13173-018-0081-3\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Brazilian Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s13173-018-0081-3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

在资源匮乏的情况下，例如，小数据集或缺乏可用的计算资源，最先进的语音识别深度学习方法已经失败。如果注意确保统计学习理论提供的学习保证，则有可能实现更健壮的模型。这项工作提出了一种浅层和混合的方法，使用卷积神经网络特征提取器馈送到支持向量机的分层树中进行分类。在这里，我们展示了即使在最先进的系统中也可以避免出现的严重误差，并且可以以分层方式建立准确的声学模型。此外，我们还证明了我们的算法确实遵循了统计学习理论提供的学习保证。在这项工作中产生的声学模型优于传统的隐马尔可夫模型，分层支持向量机树优于使用相同特征的多类多层感知器分类器。更重要的是，考虑到模型的真正鲁棒性，我们分离了声学模型的性能，并在框架和音素水平上提供了结果。我们表明，即使在少量数据下，也可以获得准确和鲁棒的识别率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Theoretical learning guarantees applied to acoustic modeling

In low-resource scenarios, for example, small datasets or a lack in computational resources available, state-of-the-art deep learning methods for speech recognition have been known to fail. It is possible to achieve more robust models if care is taken to ensure the learning guarantees provided by the statistical learning theory. This work presents a shallow and hybrid approach using a convolutional neural network feature extractor fed into a hierarchical tree of support vector machines for classification. Here, we show that gross errors present even in state-of-the-art systems can be avoided and that an accurate acoustic model can be built in a hierarchical fashion. Furthermore, we present proof that our algorithm does adhere to the learning guarantees provided by the statistical learning theory. The acoustic model produced in this work outperforms traditional hidden Markov models, and the hierarchical support vector machine tree outperforms a multi-class multilayer perceptron classifier using the same features. More importantly, we isolate the performance of the acoustic model and provide results on both the frame and phoneme level, considering the true robustness of the model. We show that even with a small amount of data, accurate and robust recognition rates can be obtained.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of the Brazilian Computer Society Computer Science-Computer Science (all)

CiteScore

2.40

自引率

0.00%

发文量

期刊介绍： JBCS is a formal quarterly publication of the Brazilian Computer Society. It is a peer-reviewed international journal which aims to serve as a forum to disseminate innovative research in all fields of computer science and related subjects. Theoretical, practical and experimental papers reporting original research contributions are welcome, as well as high quality survey papers. The journal is open to contributions in all computer science topics, computer systems development or in formal and theoretical aspects of computing, as the list of topics below is not exhaustive. Contributions will be considered for publication in JBCS if they have not been published previously and are not under consideration for publication elsewhere.