基于潜在狄利克雷分配和奇异值分解的多文档摘要

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI:10.1109/ICDM.2008.55

Rachit Arora, Balaraman Ravindran

{"title":"基于潜在狄利克雷分配和奇异值分解的多文档摘要","authors":"Rachit Arora, Balaraman Ravindran","doi":"10.1109/ICDM.2008.55","DOIUrl":null,"url":null,"abstract":"Multi-Document Summarization deals with computing a summary for a set of related articles such that they give the user a general view about the events. One of the objectives is that the sentences should cover the different events in the documents with the information covered in as few sentences as possible. Latent Dirichlet Allocation can breakdown these documents into different topics or events. However to reduce the common information content the sentences of the summary need to be orthogonal to each other since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decompositions used to get the orthogonal representations of vectors and representing sentences as vectors, we can get the sentences that are orthogonal to each other in the LDA mixture model weighted term domain. Thus using LDA we find the different topics in the documents and using SVD we find the sentences that best represent these topics. Finally we present the evaluation of the algorithms on the DUC2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"136 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":"{\"title\":\"Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization\",\"authors\":\"Rachit Arora, Balaraman Ravindran\",\"doi\":\"10.1109/ICDM.2008.55\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-Document Summarization deals with computing a summary for a set of related articles such that they give the user a general view about the events. One of the objectives is that the sentences should cover the different events in the documents with the information covered in as few sentences as possible. Latent Dirichlet Allocation can breakdown these documents into different topics or events. However to reduce the common information content the sentences of the summary need to be orthogonal to each other since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decompositions used to get the orthogonal representations of vectors and representing sentences as vectors, we can get the sentences that are orthogonal to each other in the LDA mixture model weighted term domain. Thus using LDA we find the different topics in the documents and using SVD we find the sentences that best represent these topics. Finally we present the evaluation of the algorithms on the DUC2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.\",\"PeriodicalId\":252958,\"journal\":{\"name\":\"2008 Eighth IEEE International Conference on Data Mining\",\"volume\":\"136 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-12-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"58\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 Eighth IEEE International Conference on Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2008.55\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 Eighth IEEE International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2008.55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 58

摘要

多文档摘要处理的是计算一组相关文章的摘要，以便它们向用户提供关于事件的一般视图。其中一个目标是句子应该涵盖文档中的不同事件，并且尽可能少地用句子涵盖信息。潜在狄利克雷分配可以将这些文档分解为不同的主题或事件。然而，为了减少共同的信息含量，摘要的句子需要彼此正交，因为正交向量之间的相似性和相关性尽可能低。利用奇异值分解得到向量的正交表示，并将句子表示为向量，在LDA混合模型加权项域中得到彼此正交的句子。因此，使用LDA我们找到文档中的不同主题，使用SVD我们找到最能代表这些主题的句子。最后，利用ROUGE评估器对DUC2002语料库的多文档摘要任务进行了算法评估。与2002年DUC获奖者相比，我们的算法给出了更好的ROUGE-1召回措施。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization

Multi-Document Summarization deals with computing a summary for a set of related articles such that they give the user a general view about the events. One of the objectives is that the sentences should cover the different events in the documents with the information covered in as few sentences as possible. Latent Dirichlet Allocation can breakdown these documents into different topics or events. However to reduce the common information content the sentences of the summary need to be orthogonal to each other since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decompositions used to get the orthogonal representations of vectors and representing sentences as vectors, we can get the sentences that are orthogonal to each other in the LDA mixture model weighted term domain. Thus using LDA we find the different topics in the documents and using SVD we find the sentences that best represent these topics. Finally we present the evaluation of the algorithms on the DUC2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2008 Eighth IEEE International Conference on Data Mining

自引率

0.00%

发文量