On CNN Applied to Speech-to-Text – Comparative Analysis of Different Gradient Based Optimizers

Theodora Gaiceanu, O. Pastravanu
Published in: 2021 IEEE 15th International Symposium on Applied Computational Intelligence and Informatics (SACI)
Publication date: 2021-05-19
DOI: 10.1109/SACI51354.2021.9465635 (https://doi.org/10.1109/SACI51354.2021.9465635)
Citations: 1

Abstract

On CNN Applied to Speech-to-Text – Comparative Analysis of Different Gradient Based Optimizers
In this paper, the authors develop a Convolutional Neural Network (CNN) architecture adapted to the Speech-to-Text research field. This type of network was chosen for its capacity to extract relevant features and its popularity in classification problems. A particular model for a Speech-to-Text application has been designed. The parameters of the model (i.e., the sizes of the filters and kernels) and the number of layers were chosen experimentally, and the model that achieved the highest accuracy was selected. The model takes raw waveforms of spoken digits as input and outputs the predicted digit as text. The network is reported to provide the right digit regardless of the gender or age of the speaker. Overfitting was avoided by using Dropout layers and early stopping. To select the best model, the authors took two basic criteria into account: the accuracy of the model and its execution time. In view of the computational time, a first-order cost function was chosen. The best optimizer was then selected by testing different gradient descent optimization algorithms. The application was developed in the Python programming language.
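The abstract does not list which optimizers were compared or how the comparison was run, so the following is only a minimal sketch of what such a comparison can look like. It assumes a toy two-parameter quadratic loss in place of the CNN's loss, three common first-order optimizers (plain gradient descent, momentum, Adam), and hypothetical learning rates, tolerance, and step budget. It is not the authors' model or experimental setup. Each run stops as soon as the loss drops below the tolerance, loosely mirroring early stopping, and records both the number of steps and the wall-clock time.

```python
import math
import time

def loss(w):
    # Toy ill-conditioned quadratic (assumption: stands in for the CNN loss).
    return 0.5 * (1.0 * w[0] ** 2 + 25.0 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [1.0 * w[0], 25.0 * w[1]]

def sgd(lr=0.03):
    def step(w, g, state):
        return [wi - lr * gi for wi, gi in zip(w, g)]
    return step

def momentum(lr=0.03, beta=0.9):
    def step(w, g, state):
        v = state.setdefault("v", [0.0] * len(w))
        for i, gi in enumerate(g):
            v[i] = beta * v[i] + gi          # accumulate a velocity term
        return [wi - lr * vi for wi, vi in zip(w, v)]
    return step

def adam(lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    def step(w, g, state):
        t = state["t"] = state.get("t", 0) + 1
        m = state.setdefault("m", [0.0] * len(w))
        v = state.setdefault("v", [0.0] * len(w))
        out = []
        for i, gi in enumerate(g):
            m[i] = b1 * m[i] + (1 - b1) * gi       # first-moment estimate
            v[i] = b2 * v[i] + (1 - b2) * gi * gi  # second-moment estimate
            m_hat = m[i] / (1 - b1 ** t)           # bias correction
            v_hat = v[i] / (1 - b2 ** t)
            out.append(w[i] - lr * m_hat / (math.sqrt(v_hat) + eps))
        return out
    return step

def run(step, w0, max_steps=500, tol=1e-6):
    # Iterate until the loss falls below tol or the step budget runs out.
    w, state = list(w0), {}
    start = time.perf_counter()
    for n in range(1, max_steps + 1):
        w = step(w, grad(w), state)
        if loss(w) < tol:
            break
    return {"steps": n, "loss": loss(w), "seconds": time.perf_counter() - start}

optimizers = {"sgd": sgd, "momentum": momentum, "adam": adam}
results = {name: run(make(), [2.0, 2.0]) for name, make in optimizers.items()}

# Rank by whether the target loss was reached, then by number of steps.
best = min(results, key=lambda k: (results[k]["loss"] >= 1e-6, results[k]["steps"]))

for name, r in results.items():
    print(f"{name:9s} steps={r['steps']:4d} loss={r['loss']:.2e} time={r['seconds']:.6f}s")
print("selected:", best)
```

Ranking on reached quality first and cost second follows the two criteria named in the abstract (accuracy and execution time). For a real network, the same loop would compare validation accuracy and training time per optimizer rather than a toy loss and step count.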