On CNN Applied to Speech-to-Text – Comparative Analysis of Different Gradient Based Optimizers

2021 IEEE 15th International Symposium on Applied Computational Intelligence and Informatics (SACI). Pub Date: 2021-05-19. DOI: 10.1109/SACI51354.2021.9465635

Theodora Gaiceanu, O. Pastravanu


Citations: 1

Abstract

In this paper, the authors develop a Convolutional Neural Network (CNN) architecture adapted to the Speech-to-Text research field. This type of network was chosen for its capacity to extract relevant features and its popularity in classification problems. A particular model for a Speech-to-Text application has been designed. The parameters of the model (i.e., the sizes of the filters and kernels) and the number of layers were chosen by conducting appropriate experiments, and the model that ensured the highest accuracy was selected. The model takes raw waveforms of spoken digits as input and outputs a text with the predicted digit. The network is capable of providing the right digit regardless of the gender or age of the speaker. Overfitting was avoided by using Dropout layers and an early stopping function. To select the best model, the authors took into account two basic criteria: the accuracy of the model and its execution time. Considering the computational time, the first-order cost function was chosen. The best optimizer was selected by testing different gradient descent optimization algorithms. The application was developed in the Python programming language.
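The abstract does not name the optimizers that were tested or give the network details, so the following is only a minimal sketch of what a comparison by final loss and execution time can look like. Everything in it is an assumption made for illustration: the three optimizers (plain gradient descent, momentum, Adam), their hyperparameters, and the toy quadratic loss that stands in for the CNN's training loss. It is not the authors' experiment.

```python
import math
import time

# Hypothetical stand-in for a training loss: an ill-conditioned quadratic
# bowl with its minimum at TARGET. The paper trains a CNN instead.
SCALES = [1.0, 10.0]
TARGET = [3.0, -2.0]


def loss(w):
    return sum(s * (wi - t) ** 2 for s, wi, t in zip(SCALES, w, TARGET))


def grad(w):
    return [2.0 * s * (wi - t) for s, wi, t in zip(SCALES, w, TARGET)]


def sgd(lr=0.02):
    """Plain gradient descent: w <- w - lr * g."""
    def step(w, g, t):
        return [wi - lr * gi for wi, gi in zip(w, g)]
    return step


def momentum(lr=0.02, beta=0.9):
    """Heavy-ball momentum: v <- beta * v + g, then w <- w - lr * v."""
    v = [0.0] * len(TARGET)

    def step(w, g, t):
        for i, gi in enumerate(g):
            v[i] = beta * v[i] + gi
        return [wi - lr * vi for wi, vi in zip(w, v)]
    return step


def adam(lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam with bias-corrected first and second moment estimates."""
    m = [0.0] * len(TARGET)
    s = [0.0] * len(TARGET)

    def step(w, g, t):
        out = []
        for i, gi in enumerate(g):
            m[i] = b1 * m[i] + (1.0 - b1) * gi
            s[i] = b2 * s[i] + (1.0 - b2) * gi * gi
            m_hat = m[i] / (1.0 - b1 ** t)
            s_hat = s[i] / (1.0 - b2 ** t)
            out.append(w[i] - lr * m_hat / (math.sqrt(s_hat) + eps))
        return out
    return step


def train(make_step, steps=1000):
    """Run a fixed step budget and report final loss and wall-clock time."""
    step = make_step()
    w = [0.0] * len(TARGET)
    start = time.perf_counter()
    for t in range(1, steps + 1):
        w = step(w, grad(w), t)
    return {"loss": loss(w), "seconds": time.perf_counter() - start}


def compare():
    optimizers = {"sgd": sgd, "momentum": momentum, "adam": adam}
    return {name: train(factory) for name, factory in optimizers.items()}


if __name__ == "__main__":
    for name, result in compare().items():
        print(f"{name:9s} loss={result['loss']:.3e} time={result['seconds']:.4f}s")
```

On a real model the two criteria would be measured on held-out data and over full training runs, and the timings here are too small to be meaningful beyond showing where the measurement goes.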


Source journal

2021 IEEE 15th International Symposium on Applied Computational Intelligence and Informatics (SACI)

Self-citation rate

0.00%

Number of publications