A method of phonemic annotation for Chinese dialects based on a deep learning model with adaptive temporal attention and a feature disentangling structure

IF 3.1; Zone 3 (Computer Science); JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Bowen Jiang, Qianhui Dong, Guojin Liu
DOI: 10.1016/j.csl.2024.101624
Journal: Computer Speech and Language
Publication date: 2024-02-05
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S088523082400007X
Citations: 0

Abstract

Phonemic annotation assigns phonemic symbols to a speech fragment. Because the phonetic features of speech vary greatly across languages and their dialects, phonemic symbols are an important means of describing and recording the phonetic system of a language, and an automatic, effective method for this task is valuable. In this paper, we first establish a Chinese dataset in which each sample consists of an original speech signal and the corresponding manually annotated phonemic characters. We then propose a deep learning model that performs automatic phonemic annotation of speech fragments spoken in diverse Chinese dialects. The model is a many-to-many deep bi-directional gated recurrent unit (GRU) network in which an adaptive temporal attention mechanism connects the encoder and decoder modules to prevent loss of features. In addition, a feature disentangling structure based on a generative adversarial network (GAN) attenuates the interference that unrelated tone features in the original speech signal cause for the phonemic annotation task, further improving annotation performance. Extensive experiments on this dataset verify the superiority of the model and the proposed strategies.
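To make the role of temporal attention between an encoder and a decoder more concrete, here is a minimal sketch in plain Python. It is not the authors' adaptive mechanism, which the abstract does not specify. It shows only generic dot-product attention, in which one decoder state weights the encoder states over time and forms a context vector. The function names `softmax` and `temporal_attention` and the toy vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(encoder_states, decoder_state):
    """Generic dot-product attention for a single decoder step.

    encoder_states: list of T vectors, one per input time step
    decoder_state:  one vector with the same dimension
    Returns (weights, context): T attention weights that sum to 1 and
    the weighted sum of the encoder states.
    """
    # Score each time step by its dot product with the decoder state.
    scores = [
        sum(h_d * s_d for h_d, s_d in zip(h, decoder_state))
        for h in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Context vector: attention-weighted average of the encoder states.
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical toy input: three time steps, two features each.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = temporal_attention(encoder_states, decoder_state)
```

In this toy input the second and third time steps align equally well with the decoder state, so they receive equal weight and the first step receives less. In a trained model the states would come from the bi-directional GRU encoder and the scoring function would be learned.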

Source journal
Computer Speech and Language
Category: Engineering and Technology, Computer Science: Artificial Intelligence
CiteScore: 11.30
Self-citation rate: 4.70%
Articles published: 80
Review time: 22.9 weeks
Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but only relatively recently has it become feasible to implement and experiment with complex models of speech and language processing on a large scale. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.