基于字素的语音和文本数据自动对齐方法

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI:10.1109/SLT.2012.6424237

Adriana Stan, P. Bell, Simon King

{"title":"基于字素的语音和文本数据自动对齐方法","authors":"Adriana Stan, P. Bell, Simon King","doi":"10.1109/SLT.2012.6424237","DOIUrl":null,"url":null,"abstract":"This paper introduces a method for automatic alignment of speech data with unsynchronised, imperfect transcripts, for a domain where no initial acoustic models are available. Using grapheme-based acoustic models, word skip networks and orthographic speech transcripts, we are able to harvest 55% of the speech with a 93% utterance-level accuracy and 99% word accuracy for the produced transcriptions. The work is based on the assumption that there is a high degree of correspondence between the speech and text, and that a full transcription of all of the speech is not required. The method is language independent and the only prior knowledge and resources required are the speech and text transcripts, and a few minor user interventions.","PeriodicalId":375378,"journal":{"name":"2012 IEEE Spoken Language Technology Workshop (SLT)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"39","resultStr":"{\"title\":\"A grapheme-based method for automatic alignment of speech and text data\",\"authors\":\"Adriana Stan, P. Bell, Simon King\",\"doi\":\"10.1109/SLT.2012.6424237\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper introduces a method for automatic alignment of speech data with unsynchronised, imperfect transcripts, for a domain where no initial acoustic models are available. Using grapheme-based acoustic models, word skip networks and orthographic speech transcripts, we are able to harvest 55% of the speech with a 93% utterance-level accuracy and 99% word accuracy for the produced transcriptions. The work is based on the assumption that there is a high degree of correspondence between the speech and text, and that a full transcription of all of the speech is not required. The method is language independent and the only prior knowledge and resources required are the speech and text transcripts, and a few minor user interventions.\",\"PeriodicalId\":375378,\"journal\":{\"name\":\"2012 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"39\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2012.6424237\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2012.6424237","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 39

摘要

本文介绍了一种自动对齐语音数据与不同步，不完美的转录本的方法，在没有初始声学模型可用的领域。使用基于字素的声学模型、单词跳过网络和正字法语音转录本，我们能够收获55%的语音，产生的转录文本具有93%的话语级精度和99%的单词精度。这项工作是基于语音和文本之间高度对应的假设，并且不需要所有语音的完整转录。该方法是语言独立的，唯一需要的先验知识和资源是语音和文本文本，以及一些次要的用户干预。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A grapheme-based method for automatic alignment of speech and text data

This paper introduces a method for automatic alignment of speech data with unsynchronised, imperfect transcripts, for a domain where no initial acoustic models are available. Using grapheme-based acoustic models, word skip networks and orthographic speech transcripts, we are able to harvest 55% of the speech with a 93% utterance-level accuracy and 99% word accuracy for the produced transcriptions. The work is based on the assumption that there is a high degree of correspondence between the speech and text, and that a full transcription of all of the speech is not required. The method is language independent and the only prior knowledge and resources required are the speech and text transcripts, and a few minor user interventions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量