基于注意的音素识别方法研究

Yupei Zhang
{"title":"基于注意的音素识别方法研究","authors":"Yupei Zhang","doi":"10.1145/3581807.3581866","DOIUrl":null,"url":null,"abstract":"A phoneme is the smallest sound unit of a language. Every language has its corresponding phonemes. Phoneme recognition can be used in speech-based applications such as auto speech recognition and lip sync. This paper proposes an end-to-end deep learning model called Connectionist Temporal Classification (CTC) and attention-based seq2seq network that consists of one bi-GRU layer in the encoder and one GRU layer in the decoder, for recognizing the phonemes in speech. Experiments on the TIMIT dataset demonstrate its advantages on some other seq2seq networks, with over 50% improvements after applying the attention mechanism.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Research on Phoneme Recognition using Attention-based Methods\",\"authors\":\"Yupei Zhang\",\"doi\":\"10.1145/3581807.3581866\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A phoneme is the smallest sound unit of a language. Every language has its corresponding phonemes. Phoneme recognition can be used in speech-based applications such as auto speech recognition and lip sync. This paper proposes an end-to-end deep learning model called Connectionist Temporal Classification (CTC) and attention-based seq2seq network that consists of one bi-GRU layer in the encoder and one GRU layer in the decoder, for recognizing the phonemes in speech. Experiments on the TIMIT dataset demonstrate its advantages on some other seq2seq networks, with over 50% improvements after applying the attention mechanism.\",\"PeriodicalId\":292813,\"journal\":{\"name\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3581807.3581866\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581807.3581866","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

音素是语言中最小的声音单位。每种语言都有相应的音素。音素识别可以用于基于语音的应用程序,如自动语音识别和口型同步。本文提出了一个端到端的深度学习模型,称为连接时间分类(CTC)和基于注意力的seq2seq网络,该网络由编码器中的一个双GRU层和解码器中的一个GRU层组成,用于识别语音中的音素。在TIMIT数据集上的实验证明了它在其他一些seq2seq网络上的优势,在应用注意机制后,提高了50%以上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Research on Phoneme Recognition using Attention-based Methods
A phoneme is the smallest sound unit of a language. Every language has its corresponding phonemes. Phoneme recognition can be used in speech-based applications such as auto speech recognition and lip sync. This paper proposes an end-to-end deep learning model called Connectionist Temporal Classification (CTC) and attention-based seq2seq network that consists of one bi-GRU layer in the encoder and one GRU layer in the decoder, for recognizing the phonemes in speech. Experiments on the TIMIT dataset demonstrate its advantages on some other seq2seq networks, with over 50% improvements after applying the attention mechanism.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信