Automatic title generation for Chinese spoken documents with a delicate scored Viterbi algorithm

2008 IEEE Spoken Language Technology Workshop | Pub Date: 2008-12-01 | DOI: 10.1109/SLT.2008.4777866

Sheng-yi Kong, Chien-Chih Wang, Ko-chien Kuo, Lin-Shan Lee

Citations: 8

Abstract

Automatic title generation for spoken documents is considered an important aid to browsing and navigating large quantities of multimedia content. This paper proposes a new framework for automatic title generation for Chinese spoken documents, in which a delicate scored Viterbi algorithm is run over automatically generated text summaries of the test spoken documents. The Viterbi beam search is guided by a delicate score computed from three sets of models: a term selection model identifies the terms most suitable for inclusion in the title, a term ordering model gives the ordering of those terms that makes the title most readable, and a title length model indicates a reasonable length for the title. The models are trained on a corpus that need not match the test spoken documents. Both objective evaluation based on the F1 measure and subjective human evaluation of relevance and readability indicate that the approach is very attractive.
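
The search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the three model scores are combined, so the sketch assumes a plain sum of log-probabilities. The function name `generate_title`, the three lookup tables, the `floor` back-off value, and the toy numbers are all hypothetical, and the summary is assumed to be already segmented into candidate terms.

```python
def generate_title(summary_terms, select_lp, order_lp, length_lp,
                   max_len=4, beam_width=3, floor=-10.0):
    """Toy beam search over term sequences drawn from a text summary.

    select_lp: term -> log-score for including the term (term selection)
    order_lp:  (previous term, term) -> log-score (term ordering, bigram-style)
    length_lp: title length -> log-score (title length)
    Combining the three by simple addition is an assumption of this sketch.
    """
    beam = [(0.0, ())]          # hypotheses are (score, terms); start empty
    finished = []               # scored candidate titles of every length
    for _ in range(max_len):
        expanded = []
        for score, terms in beam:
            prev = terms[-1] if terms else "<s>"
            for t in summary_terms:
                if t in terms:
                    continue    # use each term at most once
                new_score = (score
                             + select_lp.get(t, floor)
                             + order_lp.get((prev, t), floor))
                expanded.append((new_score, terms + (t,)))
        # Beam pruning: keep only the best partial titles.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_width]
        # Each surviving hypothesis is also a complete title candidate,
        # scored with the length model for its current length.
        for score, terms in beam:
            finished.append((score + length_lp.get(len(terms), floor), terms))
    if not finished:
        return [], float("-inf")
    best_score, best_terms = max(finished, key=lambda h: h[0])
    return list(best_terms), best_score


# Hypothetical toy models; the values are made up for illustration.
summary_terms = ["weather", "typhoon", "taiwan", "approaches"]
select_lp = {"typhoon": -0.5, "taiwan": -0.7, "approaches": -1.0, "weather": -3.0}
order_lp = {("<s>", "typhoon"): -0.3,
            ("typhoon", "approaches"): -0.4,
            ("approaches", "taiwan"): -0.5}
length_lp = {1: -4.0, 2: -2.0, 3: -0.5, 4: -2.5}

title, score = generate_title(summary_terms, select_lp, order_lp, length_lp)
print(title, round(score, 2))
```

With these toy tables the length model favors three-term titles, so the search prefers a three-term sequence over the higher-scoring single term once the length score is added. In a real system the three tables would be replaced by models trained on a corpus, as the abstract describes.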


