Dynamic language modeling for a daily broadcast news transcription system

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). Pub Date: 2007-12-01. DOI: 10.1109/ASRU.2007.4430103

Ciro Martins, A. Teixeira, J. Neto

Citations: 34

Abstract

When transcribing Broadcast News (BN) data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) rates. To address this problem, we propose a daily, unsupervised adaptation approach that dynamically adapts the active vocabulary and language model (LM) to the topic of the current news segment during a multi-pass speech recognition process. Based on texts available daily on the Web, a story-based vocabulary is selected using a morpho-syntactic technique. Using an Information Retrieval engine, relevant documents are extracted from a large corpus to generate a story-based LM. Experiments were carried out on a European Portuguese BN transcription system. Preliminary results show relative reductions of 65.2% in OOV rate and 6.6% in word error rate (WER).
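The two headline figures are relative reductions, so it may help to see how an OOV rate and a relative reduction are usually computed. The following is a minimal sketch, not the authors' evaluation code: the function names `oov_rate` and `relative_reduction`, the toy sentence, and both vocabularies are assumptions made for illustration, and the numbers it produces are unrelated to the results reported in the abstract.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running words in `tokens` that are missing from `vocabulary`."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def relative_reduction(baseline, adapted):
    """Relative reduction of a rate: (baseline - adapted) / baseline."""
    if baseline == 0:
        raise ValueError("baseline rate must be non-zero")
    return (baseline - adapted) / baseline


# Toy data (hypothetical, for illustration only).
tokens = "o governo anunciou novas medidas orçamentais".split()
baseline_vocab = {"o", "governo", "novas", "medidas"}
# Pretend the daily adaptation step added one inflected form to the vocabulary.
adapted_vocab = baseline_vocab | {"anunciou"}

baseline_oov = oov_rate(tokens, baseline_vocab)
adapted_oov = oov_rate(tokens, adapted_vocab)
oov_gain = relative_reduction(baseline_oov, adapted_oov)
```

The same `relative_reduction` helper applies to WER: a relative figure compares the adapted system against the baseline rate rather than reporting the absolute difference in percentage points.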



