端到端日语多方言语音识别与多任务学习的方言识别

IF 3.2 Q1 Computer Science
Ryo Imaizumi, Ryo Masumura, Sayaka Shiota, H. Kiya
{"title":"端到端日语多方言语音识别与多任务学习的方言识别","authors":"Ryo Imaizumi, Ryo Masumura, Sayaka Shiota, H. Kiya","doi":"10.1561/116.00000045","DOIUrl":null,"url":null,"abstract":"End-to-end systems have demonstrated state-of-the-art performance on many tasks related to automatic speech recognition (ASR) and dialect identification (DID). In this paper, we propose multi-task learning of Japanese DID and multi-dialect ASR (MD-ASR) systems with end-to-end models. Since Japanese dialects have variety in both linguistic and acoustic aspects of each dialect, Japanese DID requires simultaneously considering linguistic and acoustic features. One solution realizing Japanese DID using these features is to use transcriptions from ASR when performing DID. However, transcribing Japanese multi-dialect speech into text is regarded as a challenging task in ASR because there are big gaps in linguistic and acoustic features between a dialect and standard Japanese. One solution is dialect-aware ASR modeling, which means DID is performed with ASR. Therefore, the multi-task learning framework of Japanese DID and ASR is proposed to represent the dependency of them. We explore three systems as part of the proposed framework, changing the order in which DID and ASR are performed. In the experiments, Japanese multi-dialect ASR and DID tests were conducted on our home-made Japanese multi-dialect database and a standard Japanese database. The proposed transformer-based systems outperformed the conventional single task systems on both DID and ASR tests.","PeriodicalId":44812,"journal":{"name":"APSIPA Transactions on Signal and Information Processing","volume":"1 1","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning\",\"authors\":\"Ryo Imaizumi, Ryo Masumura, Sayaka Shiota, H. Kiya\",\"doi\":\"10.1561/116.00000045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"End-to-end systems have demonstrated state-of-the-art performance on many tasks related to automatic speech recognition (ASR) and dialect identification (DID). In this paper, we propose multi-task learning of Japanese DID and multi-dialect ASR (MD-ASR) systems with end-to-end models. Since Japanese dialects have variety in both linguistic and acoustic aspects of each dialect, Japanese DID requires simultaneously considering linguistic and acoustic features. One solution realizing Japanese DID using these features is to use transcriptions from ASR when performing DID. However, transcribing Japanese multi-dialect speech into text is regarded as a challenging task in ASR because there are big gaps in linguistic and acoustic features between a dialect and standard Japanese. One solution is dialect-aware ASR modeling, which means DID is performed with ASR. Therefore, the multi-task learning framework of Japanese DID and ASR is proposed to represent the dependency of them. We explore three systems as part of the proposed framework, changing the order in which DID and ASR are performed. In the experiments, Japanese multi-dialect ASR and DID tests were conducted on our home-made Japanese multi-dialect database and a standard Japanese database. The proposed transformer-based systems outperformed the conventional single task systems on both DID and ASR tests.\",\"PeriodicalId\":44812,\"journal\":{\"name\":\"APSIPA Transactions on Signal and Information Processing\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"APSIPA Transactions on Signal and Information Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1561/116.00000045\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"APSIPA Transactions on Signal and Information Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1561/116.00000045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 8

摘要

端到端系统已经在许多与自动语音识别(ASR)和方言识别(DID)相关的任务中展示了最先进的性能。本文提出了基于端到端模型的日语DID和多方言ASR (MD-ASR)系统的多任务学习。由于日语方言在语言和声学方面各不相同,因此日语DID需要同时考虑语言和声学特征。使用这些功能实现日语DID的一个解决方案是在执行DID时使用来自ASR的转录。然而,由于日语多方言语音与标准日语在语言和声学特征上存在很大差异,将日语多方言语音转录成文本在ASR中被认为是一项具有挑战性的任务。一种解决方案是方言感知的ASR建模,这意味着DID是用ASR执行的。因此,本文提出了日语DID和ASR的多任务学习框架来表示它们之间的依赖关系。我们探索了三个系统作为拟议框架的一部分,改变了DID和ASR的执行顺序。实验中,在自制的日语多方言数据库和标准日语数据库上进行了日语多方言ASR和DID测试。所提出的基于变压器的系统在DID和ASR测试中都优于传统的单任务系统。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
End-to-end systems have demonstrated state-of-the-art performance on many tasks related to automatic speech recognition (ASR) and dialect identification (DID). In this paper, we propose multi-task learning of Japanese DID and multi-dialect ASR (MD-ASR) systems with end-to-end models. Since Japanese dialects have variety in both linguistic and acoustic aspects of each dialect, Japanese DID requires simultaneously considering linguistic and acoustic features. One solution realizing Japanese DID using these features is to use transcriptions from ASR when performing DID. However, transcribing Japanese multi-dialect speech into text is regarded as a challenging task in ASR because there are big gaps in linguistic and acoustic features between a dialect and standard Japanese. One solution is dialect-aware ASR modeling, which means DID is performed with ASR. Therefore, the multi-task learning framework of Japanese DID and ASR is proposed to represent the dependency of them. We explore three systems as part of the proposed framework, changing the order in which DID and ASR are performed. In the experiments, Japanese multi-dialect ASR and DID tests were conducted on our home-made Japanese multi-dialect database and a standard Japanese database. The proposed transformer-based systems outperformed the conventional single task systems on both DID and ASR tests.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
APSIPA Transactions on Signal and Information Processing
APSIPA Transactions on Signal and Information Processing ENGINEERING, ELECTRICAL & ELECTRONIC-
CiteScore
8.60
自引率
6.20%
发文量
30
审稿时长
40 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信