Spoken Language Diarization Using an Attention based Neural Network

2021 National Conference on Communications (NCC), Pub Date: 2021-07-27, DOI: 10.1109/NCC52529.2021.9530035

Jagabandhu Mishra, Ayush Agarwal, S. Prasanna

Citations: 8

Abstract

Spoken language diarization (SLD) is the task of automatically segmenting a code-switched speech utterance and labeling the language of each segment. Inspired by the way humans perform SLD (i.e., by capturing language-specific long-term information), this work proposes an acoustic-phonetic approach to SLD. The approach consists of an attention-based neural network that captures language-specific information and a Gaussian smoothing step that locates the language change points. The experimental study shows that the proposed approach performs better on code-switched utterances containing monolingual segments of longer duration. However, its performance decreases as the monolingual segment duration decreases, which poses a challenge for further exploration of the approach.
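
The role of Gaussian smoothing in locating language change points can be illustrated with a minimal sketch. The abstract does not give the smoothing parameters or the exact decision rule, so everything below is an assumption made for illustration: the input is taken to be a per-frame score in [0, 1] for one of two languages, and the function names, `sigma`, and `threshold` are hypothetical rather than taken from the paper.

```python
import math


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(scores, sigma):
    """Convolve frame-level scores with a Gaussian, replicating the edge values."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the edges
            acc += w * scores[j]
        smoothed.append(acc)
    return smoothed


def change_points(scores, sigma=2.0, threshold=0.5):
    """Return the frame indices where the thresholded, smoothed label flips.

    Assumption: two languages, with scores[i] being the (hypothetical)
    frame-level score for the first language.
    """
    labels = [1 if s >= threshold else 0 for s in smooth(scores, sigma)]
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Toy input: 20 frames of language A, then 20 frames of language B,
# with one spurious frame inside each monolingual segment.
scores = [0.9] * 20 + [0.1] * 20
scores[5] = 0.1
scores[30] = 0.9
print(change_points(scores))
```

In this toy input the isolated outlier frames are averaged away and only the boundary between the two long segments remains. The same averaging also suggests why short monolingual segments are harder: a segment that is short relative to the smoothing width can be suppressed along with the noise.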

