Issues in Sub-Utterance Level Language Identification in a Code Switched Bilingual Scenario

Jagabandhu Mishra, Joshitha Gandra, Vaishnavi Patil, S. Prasanna
{"title":"Issues in Sub-Utterance Level Language Identification in a Code Switched Bilingual Scenario","authors":"Jagabandhu Mishra, Joshitha Gandra, Vaishnavi Patil, S. Prasanna","doi":"10.1109/SPCOM55316.2022.9840813","DOIUrl":null,"url":null,"abstract":"Sub-utterance level language identification (SLID) is an automatic process of recognizing the spoken language in a code switched (CS) utterance at the sub-utterance level. The nature of CS utterances suggest the primary language has a significant duration of occurrence over the secondary. In a CS utterance, a single speaker speaks both the languages. Hence the phoneme-level acoustic characteristic (sub-segmental and segmental evidence) of the secondary language is mostly biased towards the primary. This hypothesizes that the acoustic-based language identification system using CS training data may end with a biased performance towards the primary language. This study proves the hypothesis by observing the performance in terms of the confusion matrix of the earlier proposed approaches. At the same time, language discrimination also can be done at the suprasegmental-level, by capturing language-specific phonemic temporal evidence. Hence, to resolve the biasing issue, this study proposes a wav2vec2-based approach, which captures suprasegmental phonemic temporal patterns in the pre-training stage and merges it to capture language-specific suprasegmental evidence in the finetuning stage. The experimental results show the proposed approach is able to resolve the issue to some extent. As the fine-tuning stage uses a discriminative approach, the weighted loss and secondary language augmentation methods can be explored in the future for further performance improvement. 
Index Terms: Code switched (CS) bilingual speech, Sub-utterance level language identification (SLID), wav2vec2, Deepspeech2.","PeriodicalId":246982,"journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPCOM55316.2022.9840813","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Sub-utterance level language identification (SLID) is the automatic process of recognizing the spoken language in a code-switched (CS) utterance at the sub-utterance level. The nature of CS utterances suggests that the primary language occupies a significantly larger share of the duration than the secondary. In a CS utterance, a single speaker speaks both languages, so the phoneme-level acoustic characteristics (sub-segmental and segmental evidence) of the secondary language are largely biased towards the primary. This leads to the hypothesis that an acoustic-based language identification system trained on CS data may perform with a bias towards the primary language. This study confirms the hypothesis by examining the confusion matrices of previously proposed approaches. At the same time, languages can also be discriminated at the suprasegmental level, by capturing language-specific phonemic temporal evidence. Hence, to resolve the biasing issue, this study proposes a wav2vec2-based approach, which captures suprasegmental phonemic temporal patterns in the pre-training stage and combines them to capture language-specific suprasegmental evidence in the fine-tuning stage. The experimental results show that the proposed approach resolves the issue to some extent. As the fine-tuning stage uses a discriminative approach, weighted loss and secondary-language augmentation methods can be explored in the future for further performance improvement.

Index Terms: Code-switched (CS) bilingual speech, Sub-utterance level language identification (SLID), wav2vec2, DeepSpeech2.
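The duration imbalance described above — the primary language dominating each CS utterance — is exactly the situation a class-weighted loss (named by the authors as future work) is meant to counter. The following is a minimal sketch, not from the paper, of inverse-frequency class weighting applied to a per-frame cross-entropy over language logits; the function names and the toy 90/10 frame split are illustrative assumptions.

```python
import numpy as np

def inverse_frequency_weights(frame_labels, num_classes):
    """Per-class weights inversely proportional to frame counts, so the
    under-represented secondary language is not drowned out by the primary."""
    counts = np.bincount(frame_labels, minlength=num_classes).astype(float)
    return counts.sum() / (num_classes * np.maximum(counts, 1.0))

def weighted_cross_entropy(logits, frame_labels, weights):
    """Class-weighted cross-entropy over per-frame language logits.
    logits: (T, C) array of scores; frame_labels: (T,) int array."""
    # numerically stable log-softmax over the class axis
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    per_frame = -log_probs[np.arange(len(frame_labels)), frame_labels]
    w = weights[frame_labels]
    return (w * per_frame).sum() / w.sum()

# Toy CS utterance: 90 frames of the primary language (0), 10 of the secondary (1).
labels = np.array([0] * 90 + [1] * 10)
w = inverse_frequency_weights(labels, num_classes=2)
print(w)  # secondary-language frames weighted 9x the primary
```

With unweighted cross-entropy, a classifier that always predicts the primary language is already 90% correct on this utterance; the weights make each secondary-language frame contribute as much to the loss as nine primary-language frames, so the gradient no longer favors collapsing to the primary class.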