{"title":"Subtopic-based Multi-documents Summarization","authors":"Shu Gong, Y. Qu, Shengfeng Tian","doi":"10.1109/CSO.2010.239","DOIUrl":null,"url":null,"abstract":"Multi-documents summarization is an important research area of NLP. Most methods or techniques of multidocument summarization either consider the documents collection as single-topic or treat every sentence as single-topic only, but lack of a systematic analysis of the subtopic semantics hiding inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts probabilistic topic model to discover the subtopic information inside every sentence and uses a suitable hierarchical subtopic structure to describe both the whole documents collection and all sentences in it. With the sentences represented as subtopic vectors, it assesses the semantic distances of sentences from the documents collection’s main subtopics and chooses sentences which have short distance as the final summary of the documents collection. In the experiments on DUC 2007 dataset, we have found that: when training a topic’s documents collection with some other topics’ documents collections as background knowledge, our approach can achieve fairly better ROUGE scores compared to other peer systems.","PeriodicalId":427481,"journal":{"name":"2010 Third International Joint Conference on Computational Science and Optimization","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Third International Joint Conference on Computational Science and Optimization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSO.2010.239","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
Multi-document summarization is an important research area of NLP. Most multi-document summarization methods either treat the document collection as a single topic or treat each sentence as covering only a single topic, and thus lack a systematic analysis of the subtopic semantics hidden inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts a probabilistic topic model to discover the subtopic information inside every sentence and uses a hierarchical subtopic structure to describe both the whole document collection and every sentence in it. With sentences represented as subtopic vectors, it assesses the semantic distance of each sentence from the document collection's main subtopics and selects the sentences with the shortest distances as the final summary of the collection. In experiments on the DUC 2007 dataset, we found that when the documents of a topic are trained together with the document collections of other topics as background knowledge, our approach achieves better ROUGE scores than peer systems.
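
The following is a minimal sketch of the sentence-selection idea described in the abstract, not the authors' SubTMS implementation: it substitutes a flat LDA topic model for the paper's hierarchical subtopic structure, approximates the collection's main subtopics by the mean of the sentence-level subtopic vectors, and uses Jensen-Shannon distance as a stand-in for the paper's semantic-distance measure. The function name and parameters are illustrative only.

```python
# Sketch only: flat LDA topics + Jensen-Shannon distance, assumptions noted above.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation


def summarize(sentences, n_subtopics=10, n_summary_sentences=5):
    """Pick the sentences whose subtopic distribution lies closest to the
    collection-level subtopic distribution."""
    # Bag-of-words representation of every sentence in the collection.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(sentences)

    # Probabilistic topic model over the sentences; each row of `sent_topics`
    # is a sentence represented as a subtopic (probability) vector.
    lda = LatentDirichletAllocation(n_components=n_subtopics, random_state=0)
    sent_topics = lda.fit_transform(counts)

    # Approximate the collection's "main subtopics" by averaging the
    # sentence-level subtopic vectors (an assumption of this sketch).
    collection_topics = sent_topics.mean(axis=0)
    collection_topics /= collection_topics.sum()

    # Semantic distance of each sentence from the collection's main subtopics.
    distances = np.array(
        [jensenshannon(row, collection_topics) for row in sent_topics]
    )

    # The summary consists of the sentences with the shortest distances,
    # returned in their original order.
    chosen = np.argsort(distances)[:n_summary_sentences]
    return [sentences[i] for i in sorted(chosen)]
```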