Issues in Sub-Utterance Level Language Identification in a Code Switched Bilingual Scenario
Jagabandhu Mishra, Joshitha Gandra, Vaishnavi Patil, S. Prasanna
2022 IEEE International Conference on Signal Processing and Communications (SPCOM), 2022-07-11
DOI: 10.1109/SPCOM55316.2022.9840813
Citations: 4
Abstract
Sub-utterance level language identification (SLID) is the automatic recognition of the spoken language within a code switched (CS) utterance at the sub-utterance level. The nature of CS utterances suggests that the primary language occupies a significantly larger share of the duration than the secondary language. Moreover, in a CS utterance a single speaker speaks both languages, so the phoneme-level acoustic characteristics (sub-segmental and segmental evidence) of the secondary language are largely biased towards the primary. This leads to the hypothesis that an acoustic-based language identification system trained on CS data may perform in a manner biased towards the primary language. This study confirms the hypothesis by examining the confusion matrices of earlier proposed approaches. At the same time, languages can also be discriminated at the suprasegmental level by capturing language-specific phonemic temporal evidence. Hence, to mitigate the biasing issue, this study proposes a wav2vec2-based approach that captures suprasegmental phonemic temporal patterns in the pre-training stage and exploits them to capture language-specific suprasegmental evidence in the fine-tuning stage. Experimental results show that the proposed approach resolves the issue to some extent. As the fine-tuning stage uses a discriminative approach, weighted loss and secondary-language augmentation methods can be explored in future work for further performance improvement.

Index Terms: Code switched (CS) bilingual speech, Sub-utterance level language identification (SLID), wav2vec2, Deepspeech2.
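The abstract's central diagnostic is the frame-level confusion matrix: a system biased towards the primary language shows high recall for primary-language frames and low recall for secondary-language frames. The sketch below is a minimal, hypothetical illustration (not code from the paper) of how such a confusion matrix and per-language recall can be computed from frame-level reference and hypothesis labels; the labels `"pri"`/`"sec"` and the toy utterance are invented for illustration.

```python
# Hypothetical sketch: exposing primary-language bias in sub-utterance LID
# via a frame-level confusion matrix. Labels and data are illustrative,
# not from the paper.
from collections import Counter

def confusion_matrix(ref, hyp, labels):
    """Count frame-level (reference, hypothesis) label pairs.

    Rows are reference labels, columns are hypothesis labels.
    """
    pairs = Counter(zip(ref, hyp))
    return [[pairs[(r, h)] for h in labels] for r in labels]

def per_language_recall(matrix, labels):
    """Recall per language: correctly labelled frames / reference frames."""
    return {
        lab: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (lab, row) in enumerate(zip(labels, matrix))
    }

# Toy frame labels for one code-switched utterance: the primary language
# ("pri") dominates the duration. A primary-biased system tends to label
# secondary-language ("sec") frames as the primary language.
ref = ["pri"] * 8 + ["sec"] * 2
hyp = ["pri"] * 8 + ["pri", "sec"]  # one secondary frame misread as primary

m = confusion_matrix(ref, hyp, ["pri", "sec"])
print(per_language_recall(m, ["pri", "sec"]))  # {'pri': 1.0, 'sec': 0.5}
```

Even with 90% overall frame accuracy, the secondary language here reaches only 50% recall, which is exactly the asymmetry the study reads off the confusion matrices of earlier approaches.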