A grapheme-based method for automatic alignment of speech and text data

2012 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2012-12-01. DOI: 10.1109/SLT.2012.6424237

Adriana Stan, P. Bell, Simon King

Citations: 39

Abstract

This paper introduces a method for automatic alignment of speech data with unsynchronised, imperfect transcripts, for a domain where no initial acoustic models are available. Using grapheme-based acoustic models, word skip networks and orthographic speech transcripts, we are able to harvest 55% of the speech with a 93% utterance-level accuracy and 99% word accuracy for the produced transcriptions. The work is based on the assumption that there is a high degree of correspondence between the speech and text, and that a full transcription of all of the speech is not required. The method is language independent and the only prior knowledge and resources required are the speech and text transcripts, and a few minor user interventions.
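As a rough illustration of what "grapheme-based" means here, the sketch below builds a pronunciation lexicon in which each word is spelled out as its sequence of letters, so no phonetic dictionary is needed. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `grapheme_lexicon`, the lowercasing, and the stripping of non-letter characters are all hypothetical choices made for the example.

```python
def grapheme_lexicon(transcript: str) -> dict[str, list[str]]:
    """Map each word in the transcript to its letter sequence.

    Hypothetical helper: the normalisation used here (lowercasing,
    dropping non-letter characters) is an assumption for illustration.
    """
    lexicon: dict[str, list[str]] = {}
    for token in transcript.lower().split():
        # Keep only letters; str.isalpha() also accepts non-ASCII letters,
        # so the same code works for other alphabetic scripts.
        word = "".join(ch for ch in token if ch.isalpha())
        if word:
            # Each grapheme stands in for one acoustic modelling unit.
            lexicon[word] = list(word)
    return lexicon


lexicon = grapheme_lexicon("The cat sat.")
for word, units in lexicon.items():
    print(word, " ".join(units))
```

Because the units come straight from the orthography, a lexicon of this kind can be derived from the text transcripts alone, which is consistent with the abstract's claim that no language-specific resources are required.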

