One story, one flow: Hidden Markov Story Models for multilingual multidocument summarization

Pascale Fung, G. Ngai
{"title":"One story, one flow: Hidden Markov Story Models for multilingual multidocument summarization","authors":"Pascale Fung, G. Ngai","doi":"10.1145/1149290.1151099","DOIUrl":null,"url":null,"abstract":"This article presents a multidocument, multilingual, theme-based summarization system based on modeling text cohesion (story flow). Conventional extractive summarization systems which pick out salient sentences to include in a summary often disregard any flow or sequence that might exist between these sentences. We argue that such inherent text cohesion exists and is (1) specific to a particular story and (2) specific to a particular language. Documents within the same story, and in the same language, share a common story flow, and this flow differs across stories, and across languages. We propose using Hidden Markov Models (HMMs) as story models. An unsupervised segmental K-means method is used to iteratively cluster multiple documents into different topics (stories) and learn the parameters of parallel Hidden Markov Story Models (HMSM), one for each story. We compare story models within and across stories and within and across languages (English and Chinese). The experimental results support our “one story, one flow” and “one language, one flow” hypotheses. We also propose a Naïve Bayes classifier for document summarization. The performance of our summarizer is superior to conventional methods that do not incorporate text cohesion information. Our HMSM method also provides a simple way to compile a single metasummary for multiple documents from individual summaries via state labeled sentences.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"63","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Trans. 
Speech Lang. Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1149290.1151099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 63

Abstract

This article presents a multidocument, multilingual, theme-based summarization system based on modeling text cohesion (story flow). Conventional extractive summarization systems, which pick out salient sentences to include in a summary, often disregard any flow or sequence that might exist between these sentences. We argue that such inherent text cohesion exists and is (1) specific to a particular story and (2) specific to a particular language. Documents within the same story, and in the same language, share a common story flow, and this flow differs across stories and across languages. We propose using Hidden Markov Models (HMMs) as story models. An unsupervised segmental K-means method is used to iteratively cluster multiple documents into different topics (stories) and learn the parameters of parallel Hidden Markov Story Models (HMSM), one for each story. We compare story models within and across stories and within and across languages (English and Chinese). The experimental results support our “one story, one flow” and “one language, one flow” hypotheses. We also propose a Naïve Bayes classifier for document summarization. The performance of our summarizer is superior to conventional methods that do not incorporate text cohesion information. Our HMSM method also provides a simple way to compile a single metasummary for multiple documents from individual summaries via state-labeled sentences.
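The training procedure the abstract names — segmental K-means (Viterbi-style) estimation of a Hidden Markov Story Model over sentence sequences — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sentence feature vectors, the number of states, and the negative-squared-distance emission score are all assumptions made here for clarity, and the outer loop that clusters documents into parallel per-story models is omitted.

```python
import numpy as np

def viterbi(doc, means, trans):
    """Most likely state path for one document, where doc is an
    (n_sentences, n_features) array. Emission score is the negative
    squared Euclidean distance to each state mean (an assumption)."""
    T, K = len(doc), len(means)
    emit = -((doc[:, None, :] - means[None, :, :]) ** 2).sum(-1)  # (T, K)
    logt = np.log(trans)
    score = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = emit[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + logt      # cand[i, j]: prev i -> state j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + emit[t]
    # Backtrack from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def segmental_kmeans(docs, n_states, n_iters=10, seed=0):
    """Segmental K-means training of one story model.
    docs: list of (n_sentences, n_features) arrays, one per document.
    Returns state means and a row-stochastic transition matrix."""
    rng = np.random.default_rng(seed)
    dim = docs[0].shape[1]
    all_sents = np.vstack(docs)
    # Initialize state means from randomly chosen sentences.
    means = all_sents[rng.choice(len(all_sents), n_states, replace=False)]
    trans = np.full((n_states, n_states), 1.0 / n_states)
    for _ in range(n_iters):
        counts = np.zeros((n_states, n_states))
        sums = np.zeros((n_states, dim))
        totals = np.zeros(n_states)
        # Segmentation step: hard-assign each sentence via Viterbi.
        for doc in docs:
            path = viterbi(doc, means, trans)
            for t, s in enumerate(path):
                sums[s] += doc[t]
                totals[s] += 1
                if t > 0:
                    counts[path[t - 1], s] += 1
        # Re-estimation step: update means and (smoothed) transitions.
        nonzero = totals > 0
        means[nonzero] = sums[nonzero] / totals[nonzero, None]
        trans = (counts + 1.0) / (counts + 1.0).sum(axis=1, keepdims=True)
    return means, trans
```

The alternation of hard Viterbi segmentation and parameter re-estimation is what distinguishes segmental K-means from Baum-Welch, which uses soft (posterior) assignments; the hard variant is cheaper and converges to a locally optimal segmentation.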