Towards improving the performance of language identification system for Indian languages

2014 First International Conference on Computational Systems and Communications (ICCSC). Pub Date: 2014-12-01. DOI: 10.1109/COMPSC.2014.7032618

Abitha Anto, K. T. Sreekumar, C. S. Kumar, P. Raj


Citations: 2

Abstract




In this paper, we present the details of a phonotactic language identification (LID) system developed for five Indian languages: English (Indian), Hindi, Malayalam, Tamil and Kannada. Since no publicly available speech databases exist for English, Malayalam and Kannada, we developed a database for each target language by downloading audio from YouTube videos and manually removing the non-speech portions. The system was tested on a data set of 40 utterances of 30 s, 10 s and 3 s duration in each of the five target languages. Following the NIST benchmarking sessions, performance was evaluated separately for the 30 s, 10 s and 3 s segments. With a 3-gram language model, the baseline system achieved an overall EER of 10.41%, 19.56% and 31.45% for the 30 s, 10 s and 3 s segments, respectively. Using a 4-gram language model reduced the EER to 9.81% and 19.38% for the 30 s and 10 s segments, while the EER for the 3 s segments rose to 32.77%. Applying n-gram smoothing lowered the EER further, to 9.02%, 18.70% and 29.24% with 3-gram language models and to 8.88%, 16.46% and 32.03% with 4-gram language models, for the 30 s, 10 s and 3 s test segments, respectively. The study shows that 4-gram language models can help improve the performance of LID systems for Indian languages.
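The abstract describes the phonotactic pipeline only at a high level. The following minimal sketch shows per-language phone n-gram scoring and EER evaluation. It assumes a phone recognizer has already turned each utterance into a phone sequence, which is not shown here. The paper does not say which n-gram smoothing it used, so this sketch substitutes add-k (additive) smoothing for illustration only. The score normalisation, the names `PhoneNgramLM`, `detection_scores` and `equal_error_rate`, and the toy phone data are all hypothetical and are not taken from the authors' system.

```python
"""Sketch: phonotactic LID scoring with smoothed phone n-gram LMs plus EER.

Assumptions (not from the paper): phone sequences are already decoded,
add-k smoothing stands in for the unspecified n-gram smoothing, and all
names and data below are hypothetical.
"""
import math
from collections import Counter


class PhoneNgramLM:
    """Phone n-gram language model with add-k (additive) smoothing."""

    def __init__(self, order, k=0.5):
        self.order = order
        self.k = k
        self.ngrams = Counter()    # counts of (context..., phone)
        self.contexts = Counter()  # counts of context tuples
        self.vocab = set()

    def _pad(self, phones):
        # Start-of-utterance padding plus an end-of-utterance token.
        return ["<s>"] * (self.order - 1) + list(phones) + ["</s>"]

    def train(self, sequences):
        for seq in sequences:
            padded = self._pad(seq)
            self.vocab.update(padded[self.order - 1:])
            for i in range(self.order - 1, len(padded)):
                ctx = tuple(padded[i - self.order + 1:i])
                self.ngrams[ctx + (padded[i],)] += 1
                self.contexts[ctx] += 1
        return self

    def log_prob(self, phones):
        padded = self._pad(phones)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen phones
        total = 0.0
        for i in range(self.order - 1, len(padded)):
            ctx = tuple(padded[i - self.order + 1:i])
            num = self.ngrams[ctx + (padded[i],)] + self.k
            den = self.contexts[ctx] + self.k * v
            total += math.log(num / den)
        return total


def detection_scores(lms, phones):
    """Per-language score: mean per-token log-likelihood under the target LM
    minus the average over competing LMs (an assumed normalisation)."""
    n = len(phones) + 1  # predicted tokens, including </s>
    ll = {lang: lm.log_prob(phones) / n for lang, lm in lms.items()}
    scores = {}
    for lang, value in ll.items():
        others = [v for other, v in ll.items() if other != lang]
        scores[lang] = value - sum(others) / len(others)
    return scores


def equal_error_rate(target_scores, nontarget_scores):
    """Sweep each observed score as a threshold (accept if score >= t) and
    return the mean of miss and false-alarm rates where they are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(p_miss - p_fa) < best_gap:
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
    return eer


if __name__ == "__main__":
    # Toy phone sequences (hypothetical, not from the paper's data).
    train = {
        "lang_a": [["a", "b", "a", "b"], ["a", "b", "a"]],
        "lang_b": [["b", "b", "c", "c"], ["c", "b", "b"]],
    }
    lms = {lang: PhoneNgramLM(order=3).train(seqs) for lang, seqs in train.items()}
    tests = [("lang_a", ["a", "b", "a", "b", "a"]), ("lang_b", ["b", "c", "c", "b"])]
    targets, nontargets = [], []
    for true_lang, phones in tests:
        for lang, score in detection_scores(lms, phones).items():
            (targets if lang == true_lang else nontargets).append(score)
    print(f"EER = {equal_error_rate(targets, nontargets):.2%}")
```

Changing `order=3` to `order=4` mirrors the 3-gram versus 4-gram comparison in the abstract. Higher orders produce sparser counts, which is one reason smoothing matters more as the model order grows.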

