用于语言识别的无监督音节聚类

S. Dey, H. Murthy
{"title":"用于语言识别的无监督音节聚类","authors":"S. Dey, H. Murthy","doi":"10.5281/ZENODO.52577","DOIUrl":null,"url":null,"abstract":"Automatic Language Recognition makes extensive use of phonotactics for identifying a language. The accuracy of phonotactic information depends upon the amount of data available for training. The state of the art approaches capture the phonotactics in terms of cross-lingual GMM tokens. The accuracy of such tokenisers crucially depends upon the availability of specific corpora. In this paper, we suggest an alternative to GMM tokens, namely, syllable based tokens. Syllables implicitly capture the phonotactics across phonemes in a language. Unsupervised Syllable tokenisation for language identification requires a) segmentation of speech into syllable-like units syllable level, and b) unsupervised modeling of the syllable tokens by Hidden Markov Models. The first issue is addressed by segmenting the wavform into syllable-like units using a well-established group delay based segmentation algorithm. To address the second issue, two different solutions are proposed, namely, (i) a top down clustering approach, which does not require significant parameter tuning, and is also robust, and (ii) a universal syllable approach. In this syllable models for every language are obtained from adapted universal syllable models. Experimental results on the OGI 1992 multilingual corpus and NIST 2003 LRE corpus show that the proposed approaches donot require significant tuning of parameters and the performance is comparable to that of a well-tuned baseline syllable tokenisation system.","PeriodicalId":201182,"journal":{"name":"2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Unsupervised clustering of syllables for language identification\",\"authors\":\"S. Dey, H. Murthy\",\"doi\":\"10.5281/ZENODO.52577\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic Language Recognition makes extensive use of phonotactics for identifying a language. The accuracy of phonotactic information depends upon the amount of data available for training. The state of the art approaches capture the phonotactics in terms of cross-lingual GMM tokens. The accuracy of such tokenisers crucially depends upon the availability of specific corpora. In this paper, we suggest an alternative to GMM tokens, namely, syllable based tokens. Syllables implicitly capture the phonotactics across phonemes in a language. Unsupervised Syllable tokenisation for language identification requires a) segmentation of speech into syllable-like units syllable level, and b) unsupervised modeling of the syllable tokens by Hidden Markov Models. The first issue is addressed by segmenting the wavform into syllable-like units using a well-established group delay based segmentation algorithm. To address the second issue, two different solutions are proposed, namely, (i) a top down clustering approach, which does not require significant parameter tuning, and is also robust, and (ii) a universal syllable approach. In this syllable models for every language are obtained from adapted universal syllable models. Experimental results on the OGI 1992 multilingual corpus and NIST 2003 LRE corpus show that the proposed approaches donot require significant tuning of parameters and the performance is comparable to that of a well-tuned baseline syllable tokenisation system.\",\"PeriodicalId\":201182,\"journal\":{\"name\":\"2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO)\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5281/ZENODO.52577\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5281/ZENODO.52577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

自动语言识别广泛使用语音策略来识别语言。语音识别信息的准确性取决于可用于训练的数据量。最先进的方法以跨语言GMM标记的形式捕捉语音战术。这种标记器的准确性关键取决于特定语料库的可用性。在本文中,我们提出了一种替代GMM标记的方法,即基于音节的标记。音节隐含地捕捉语言中跨音素的语音策略。用于语言识别的无监督音节标记化需要a)将语音分割成音节级别的类音节单位,b)通过隐马尔可夫模型对音节标记进行无监督建模。第一个问题是通过使用一种完善的基于群延迟的分割算法将波形分割成音节状单元来解决的。为了解决第二个问题,提出了两种不同的解决方案,即(i)自上而下的聚类方法,该方法不需要显著的参数调整,并且也是鲁棒的,以及(ii)通用音节方法。在这个音节模型中,每种语言的音节模型都是从适应的通用音节模型中得到的。在OGI 1992多语言语料库和NIST 2003 LRE语料库上的实验结果表明,所提出的方法不需要对参数进行重大调整,性能与调优的基线音节标记系统相当。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Unsupervised clustering of syllables for language identification
Automatic Language Recognition makes extensive use of phonotactics for identifying a language. The accuracy of phonotactic information depends upon the amount of data available for training. The state of the art approaches capture the phonotactics in terms of cross-lingual GMM tokens. The accuracy of such tokenisers crucially depends upon the availability of specific corpora. In this paper, we suggest an alternative to GMM tokens, namely, syllable based tokens. Syllables implicitly capture the phonotactics across phonemes in a language. Unsupervised Syllable tokenisation for language identification requires a) segmentation of speech into syllable-like units syllable level, and b) unsupervised modeling of the syllable tokens by Hidden Markov Models. The first issue is addressed by segmenting the wavform into syllable-like units using a well-established group delay based segmentation algorithm. To address the second issue, two different solutions are proposed, namely, (i) a top down clustering approach, which does not require significant parameter tuning, and is also robust, and (ii) a universal syllable approach. In this syllable models for every language are obtained from adapted universal syllable models. Experimental results on the OGI 1992 multilingual corpus and NIST 2003 LRE corpus show that the proposed approaches donot require significant tuning of parameters and the performance is comparable to that of a well-tuned baseline syllable tokenisation system.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信