SVM Based Language Diarization for Code-Switched Bilingual Indian Speech Using Bottleneck Features

Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-28

V. Spoorthy, Veena Thenkanidiyoor, Dileep Aroor Dinesh

Citations: 8

Abstract

This paper proposes an SVM-based language diarizer for code-switched bilingual Indian speech. Code-switching is the use of more than one language within a single utterance. Language diarization involves identifying the code-switch points in an utterance and segmenting it into homogeneous language segments. This is very important in the Indian context because every Indian is at least bilingual and code-switching is inevitable. To build an effective language diarizer, it is helpful to consider phonotactic features. In this work, we propose using bottleneck features for language diarization. Bottleneck features are the output of a narrow hidden layer of a multilayer neural network trained to perform phone state classification. Studies conducted on standard datasets show the effectiveness of the proposed approach.
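The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes a frame-level classifier (for example, an SVM applied to bottleneck features) has already assigned one language label to each frame. The language codes `"hi"` and `"en"`, the majority-vote smoothing, and all function names are hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import Counter
from itertools import groupby


def smooth_labels(labels, window=5):
    """Majority-vote smoothing over a sliding window.

    This is an assumed post-processing step, not taken from the paper;
    it suppresses isolated frame-level misclassifications.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        context = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(context).most_common(1)[0][0])
    return smoothed


def segments_from_labels(labels):
    """Collapse runs of identical labels into (language, start, end) tuples.

    Frame indices are zero-based and `end` is exclusive.
    """
    segments = []
    start = 0
    for language, run in groupby(labels):
        length = len(list(run))
        segments.append((language, start, start + length))
        start += length
    return segments


def switch_points(segments):
    """Return the frame index at which each new language segment begins."""
    return [start for _, start, _ in segments[1:]]


# Hypothetical per-frame predictions with one spurious "en" frame at index 6.
frame_labels = ["hi"] * 6 + ["en"] + ["hi"] * 3 + ["en"] * 8

raw_segments = segments_from_labels(frame_labels)
clean_segments = segments_from_labels(smooth_labels(frame_labels))
print(raw_segments)
print(clean_segments)
print(switch_points(clean_segments))
```

With smoothing, the single-frame "en" run is absorbed into the surrounding "hi" segment, so only one code-switch point remains. The window length trades off robustness to noisy frames against the shortest language segment that can be detected.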
