V. Spoorthy, Veena Thenkanidiyoor, Dileep Aroor Dinesh
SVM Based Language Diarization for Code-Switched Bilingual Indian Speech Using Bottleneck Features
Workshop on Spoken Language Technologies for Under-resourced Languages, 2018-08-29
DOI: 10.21437/SLTU.2018-28
Citation count: 8
Abstract
This paper proposes an SVM-based language diarizer for code-switched bilingual Indian speech. Code-switching is the use of more than one language within a single utterance. Language diarization involves identifying the code-switch points in an utterance and segmenting it into homogeneous language segments. This is particularly important in the Indian context, where speakers are typically at least bilingual and code-switching is pervasive. Phonotactic features are helpful for building an effective language diarizer. In this work, we propose using bottleneck features for language diarization. Bottleneck features are the outputs of a narrow hidden layer of a multilayer neural network trained to perform phone state classification. Studies conducted on standard datasets show the effectiveness of the proposed approach.
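The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes that some frame-level classifier (for example, an SVM applied to bottleneck features) has already assigned a language label to every frame; the abstract does not specify this interface, so the function names `diarize` and `switch_points`, the label strings, and the frame-index representation are all hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

# (language, start_frame, end_frame), with end_frame exclusive.
Segment = Tuple[str, int, int]


def diarize(frame_labels: List[str]) -> List[Segment]:
    """Collapse per-frame language labels into homogeneous language segments.

    Assumption: frame_labels come from an upstream classifier; this function
    only groups consecutive identical labels and does no smoothing.
    """
    segments: List[Segment] = []
    start = 0
    for lang, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((lang, start, start + length))
        start += length
    return segments


def switch_points(segments: List[Segment]) -> List[int]:
    """Return the frame indices at which the language changes.

    A code-switch point is the boundary between two consecutive segments,
    so the end of the final segment is excluded.
    """
    return [end for _, _, end in segments[:-1]]


if __name__ == "__main__":
    # Hypothetical labels: three Hindi frames, two English frames, one Hindi frame.
    labels = ["hi", "hi", "hi", "en", "en", "hi"]
    segs = diarize(labels)
    print(segs)
    print(switch_points(segs))
```

Frame-level decisions are often noisy, so a real system would probably smooth the label sequence before grouping; that step is omitted here because the abstract does not describe it.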