Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet)
Tatsuya Kawahara
Conference on Innovative Applications of Artificial Intelligence, published 2012-07-22
DOI: https://doi.org/10.1609/aaai.v26i2.18962
This article describes a new automatic transcription system for the Japanese Parliament that deploys our automatic speech recognition (ASR) technology. To achieve high recognition performance on spontaneous meeting speech, we investigated an efficient training scheme with minimal supervision that can exploit a huge amount of real data. Specifically, we proposed a lightly supervised training scheme based on statistical language model transformation, which bridges the gap between faithful transcripts of spoken utterances and the final texts produced for documentation. Once this mapping is trained, we no longer need faithful transcripts to train either the acoustic model or the language model. Instead, we can fully exploit the speech and text data available in Parliament as they are. This scheme also yields a sustainable ASR system that evolves, i.e., updates and re-trains its models, using only the speech and text generated during system operation. The ASR system has been deployed in the Japanese Parliament since 2010 and has consistently achieved a character accuracy of nearly 90%, which is useful for streamlining the transcription process.
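The abstract reports character accuracy without stating how it is computed. A minimal sketch follows, assuming the common definition of one minus the character-level edit distance (substitutions, deletions, and insertions) divided by the reference length. The function names `edit_distance` and `character_accuracy` are hypothetical and are not taken from the system described here, whose exact scoring rules may differ.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Assumed definition: 1 - (S + D + I) / N, with N the reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character in a ten-character reference.
    print(character_accuracy("abcdefghij", "abcdefghiX"))
```

Under this definition, insertions can push the value below zero for very poor hypotheses, so it is an accuracy score rather than a simple fraction of correct characters.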