半监督双语声学模型训练的语言化

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Pub Date : 2017-12-01 DOI:10.1109/ASRU.2017.8268921

Emre Yilmaz, Mitchell McLaren, H. V. D. Heuvel, D. V. Leeuwen

{"title":"半监督双语声学模型训练的语言化","authors":"Emre Yilmaz, Mitchell McLaren, H. V. D. Heuvel, D. V. Leeuwen","doi":"10.1109/ASRU.2017.8268921","DOIUrl":null,"url":null,"abstract":"In this paper, we investigate several automatic transcription schemes for using raw bilingual broadcast news data in semi-supervised bilingual acoustic model training. Specifically, we compare the transcription quality provided by a bilingual ASR system with another system performing language diarization at the front-end followed by two monolingual ASR systems chosen based on the assigned language label. Our research focuses on the Frisian-Dutch code-switching (CS) speech that is extracted from the archives of a local radio broadcaster. Using 11 hours of manually transcribed Frisian speech as a reference, we aim to increase the amount of available training data by using these automatic transcription techniques. By merging the manually and automatically transcribed data, we learn bilingual acoustic models and run ASR experiments on the development and test data of the FAME! speech corpus to quantify the quality of the automatic transcriptions. Using these acoustic models, we present speech recognition and CS detection accuracies. The results demonstrate that applying language diarization to the raw speech data to enable using the monolingual resources improves the automatic transcription quality compared to a baseline system using a bilingual ASR system.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Language diarization for semi-supervised bilingual acoustic model training\",\"authors\":\"Emre Yilmaz, Mitchell McLaren, H. V. D. Heuvel, D. V. Leeuwen\",\"doi\":\"10.1109/ASRU.2017.8268921\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we investigate several automatic transcription schemes for using raw bilingual broadcast news data in semi-supervised bilingual acoustic model training. Specifically, we compare the transcription quality provided by a bilingual ASR system with another system performing language diarization at the front-end followed by two monolingual ASR systems chosen based on the assigned language label. Our research focuses on the Frisian-Dutch code-switching (CS) speech that is extracted from the archives of a local radio broadcaster. Using 11 hours of manually transcribed Frisian speech as a reference, we aim to increase the amount of available training data by using these automatic transcription techniques. By merging the manually and automatically transcribed data, we learn bilingual acoustic models and run ASR experiments on the development and test data of the FAME! speech corpus to quantify the quality of the automatic transcriptions. Using these acoustic models, we present speech recognition and CS detection accuracies. The results demonstrate that applying language diarization to the raw speech data to enable using the monolingual resources improves the automatic transcription quality compared to a baseline system using a bilingual ASR system.\",\"PeriodicalId\":290868,\"journal\":{\"name\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2017.8268921\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268921","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 25

摘要

在本文中，我们研究了几种使用原始双语广播新闻数据进行半监督双语声学模型训练的自动转录方案。具体来说，我们比较了双语ASR系统与另一个系统提供的转录质量，该系统在前端执行语言拨号，然后根据指定的语言标签选择两个单语ASR系统。我们的研究重点是弗里斯兰-荷兰语代码转换(CS)语音，这是从当地广播电台的档案中提取的。使用11个小时的手动转录弗里西亚语语音作为参考，我们的目标是通过使用这些自动转录技术来增加可用的训练数据量。通过合并人工和自动转录的数据，我们学习双语声学模型，并在FAME的开发和测试数据上进行ASR实验!语音语料库的质量量化自动转录。利用这些声学模型，我们提出了语音识别和CS检测的精度。结果表明，与使用双语ASR系统的基线系统相比，对原始语音数据进行语言分类以启用单语资源可以提高自动转录质量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Language diarization for semi-supervised bilingual acoustic model training

In this paper, we investigate several automatic transcription schemes for using raw bilingual broadcast news data in semi-supervised bilingual acoustic model training. Specifically, we compare the transcription quality provided by a bilingual ASR system with another system performing language diarization at the front-end followed by two monolingual ASR systems chosen based on the assigned language label. Our research focuses on the Frisian-Dutch code-switching (CS) speech that is extracted from the archives of a local radio broadcaster. Using 11 hours of manually transcribed Frisian speech as a reference, we aim to increase the amount of available training data by using these automatic transcription techniques. By merging the manually and automatically transcribed data, we learn bilingual acoustic models and run ASR experiments on the development and test data of the FAME! speech corpus to quantify the quality of the automatic transcriptions. Using these acoustic models, we present speech recognition and CS detection accuracies. The results demonstrate that applying language diarization to the raw speech data to enable using the monolingual resources improves the automatic transcription quality compared to a baseline system using a bilingual ASR system.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

自引率

0.00%

发文量