Subtopic-based Multi-documents Summarization

Shu Gong, Y. Qu, Shengfeng Tian
{"title":"基于子主题的多文档摘要","authors":"Shu Gong, Y. Qu, Shengfeng Tian","doi":"10.1109/CSO.2010.239","DOIUrl":null,"url":null,"abstract":"Multi-documents summarization is an important research area of NLP. Most methods or techniques of multidocument summarization either consider the documents collection as single-topic or treat every sentence as single-topic only, but lack of a systematic analysis of the subtopic semantics hiding inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts probabilistic topic model to discover the subtopic information inside every sentence and uses a suitable hierarchical subtopic structure to describe both the whole documents collection and all sentences in it. With the sentences represented as subtopic vectors, it assesses the semantic distances of sentences from the documents collection’s main subtopics and chooses sentences which have short distance as the final summary of the documents collection. In the experiments on DUC 2007 dataset, we have found that: when training a topic’s documents collection with some other topics’ documents collections as background knowledge, our approach can achieve fairly better ROUGE scores compared to other peer systems.","PeriodicalId":427481,"journal":{"name":"2010 Third International Joint Conference on Computational Science and Optimization","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Subtopic-based Multi-documents Summarization\",\"authors\":\"Shu Gong, Y. Qu, Shengfeng Tian\",\"doi\":\"10.1109/CSO.2010.239\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-documents summarization is an important research area of NLP. 
Most methods or techniques of multidocument summarization either consider the documents collection as single-topic or treat every sentence as single-topic only, but lack of a systematic analysis of the subtopic semantics hiding inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts probabilistic topic model to discover the subtopic information inside every sentence and uses a suitable hierarchical subtopic structure to describe both the whole documents collection and all sentences in it. With the sentences represented as subtopic vectors, it assesses the semantic distances of sentences from the documents collection’s main subtopics and chooses sentences which have short distance as the final summary of the documents collection. In the experiments on DUC 2007 dataset, we have found that: when training a topic’s documents collection with some other topics’ documents collections as background knowledge, our approach can achieve fairly better ROUGE scores compared to other peer systems.\",\"PeriodicalId\":427481,\"journal\":{\"name\":\"2010 Third International Joint Conference on Computational Science and Optimization\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 Third International Joint Conference on Computational Science and Optimization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CSO.2010.239\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Third International Joint 
Conference on Computational Science and Optimization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSO.2010.239","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Multi-document summarization is an important research area of NLP. Most multi-document summarization methods either treat the document collection as a single topic or treat each sentence as having only a single topic, lacking a systematic analysis of the subtopic semantics hidden inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts a probabilistic topic model to discover the subtopic information inside each sentence and uses a hierarchical subtopic structure to describe both the whole document collection and all of its sentences. With sentences represented as subtopic vectors, it assesses the semantic distance of each sentence from the collection's main subtopics and selects the sentences with the shortest distances as the final summary of the document collection. In experiments on the DUC 2007 dataset, we found that when a topic's document collection is trained with other topics' document collections as background knowledge, our approach achieves better ROUGE scores than peer systems.
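The sentence-selection step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each sentence has already been mapped to a subtopic-probability vector by a topic model (those vectors are toy inputs here), it approximates the collection's "main subtopics" as the mean sentence vector, and it uses cosine distance as the semantic distance; the function names are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def select_summary(sentence_vectors, k):
    """Return indices of the k sentences closest (by cosine distance)
    to the collection's main-subtopic vector, approximated here as the
    mean of all sentence vectors. Indices come back in document order."""
    dims = len(sentence_vectors[0])
    main = [sum(vec[i] for vec in sentence_vectors) / len(sentence_vectors)
            for i in range(dims)]
    ranked = sorted(range(len(sentence_vectors)),
                    key=lambda i: cosine_distance(sentence_vectors[i], main))
    return sorted(ranked[:k])

# Toy example: 4 sentences described over 3 subtopics.
vectors = [
    [0.8, 0.1, 0.1],  # mostly subtopic 0
    [0.7, 0.2, 0.1],  # mostly subtopic 0
    [0.1, 0.1, 0.8],  # off-topic outlier
    [0.6, 0.3, 0.1],  # mostly subtopic 0
]
print(select_summary(vectors, 2))  # the outlier (index 2) is excluded
```

In the paper, the subtopic vectors come from a probabilistic topic model and the subtopic structure is hierarchical rather than a single mean vector; this sketch only shows the distance-and-select idea.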