A Novel Relational Learning-to-Rank Approach for Topic-Focused Multi-document Summarization

2013 IEEE 13th International Conference on Data Mining. Pub Date: 2013-12-01. DOI: 10.1109/ICDM.2013.38

Yadong Zhu, Yanyan Lan, J. Guo, Pan Du, Xueqi Cheng


Citations: 13

Abstract

Topic-focused multi-document summarization aims to produce a summary of a set of documents that conveys the most important aspects of a given topic. Most existing extractive methods treat the task as a multi-criteria ranking problem over sentences, in which relevance, salience, and diversity are three typical requirements. Diversity is challenging because it requires modeling the relationships between sentences during ranking, and traditional methods usually handle it heuristically or implicitly. In this paper, we propose a novel relational learning-to-rank approach (R-LTR) to solve this problem. Relational learning-to-rank is a new learning framework that incorporates relationships between sentences into traditional learning-to-rank. Specifically, the ranking function is defined as the combination of a content-based score for an individual sentence and a relation-based score between the current sentence and those already selected. On this basis, we propose to learn the ranking function by minimizing the likelihood loss based on the Plackett-Luce model, which naturally models the sequential ranking of candidate sentences. Stochastic gradient descent is used for learning, and the summary is predicted by a greedy selection procedure based on the learned ranking function. Finally, we conduct extensive experiments on the benchmark data sets TAC2008 and TAC2009. Experimental results show that our approach significantly outperforms state-of-the-art methods in both quantitative and qualitative terms.
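The abstract names three components: a ranking function that adds a content-based score to a relation-based score, a Plackett-Luce likelihood loss, and greedy selection at prediction time. The following is a minimal sketch of how these could fit together, not the authors' implementation. The abstract does not give the functional forms, so the linear content score, the use of the maximum similarity to already-selected sentences as the relation score, and all names (`score`, `greedy_select`, `plackett_luce_loss`, `w`, `mu`, `sim`) are assumptions made for illustration. The stochastic gradient descent step is omitted.

```python
import math


def score(w, mu, x, sims_to_selected):
    """Content-based score plus relation-based score for one candidate.

    Assumption: the content score is linear in the features x, and the
    relation score is mu times the maximum similarity to any sentence
    that has already been selected (0 if nothing is selected yet).
    """
    content = sum(wi * xi for wi, xi in zip(w, x))
    relation = mu * max(sims_to_selected) if sims_to_selected else 0.0
    return content + relation


def greedy_select(w, mu, features, sim, k):
    """Pick up to k sentences, one at a time, by highest current score."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda i: score(w, mu, features[i], [sim[i][j] for j in selected]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def plackett_luce_loss(w, mu, features, sim, ranking):
    """Negative log-likelihood of a ground-truth ranking under Plackett-Luce.

    At each step the chosen sentence competes, via a softmax over scores,
    with every sentence that has not been ranked yet. Scores depend on
    the sentences ranked so far, which mirrors the sequential procedure.
    """
    loss = 0.0
    selected = []
    remaining = list(range(len(features)))
    for chosen in ranking:
        scores = {
            i: score(w, mu, features[i], [sim[i][j] for j in selected])
            for i in remaining
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        loss += log_z - scores[chosen]
        selected.append(chosen)
        remaining.remove(chosen)
    return loss
```

With a negative `mu`, a candidate that closely resembles an already-selected sentence is penalized, which is one simple way to encode diversity. With `mu = 0` the sketch reduces to ranking by content score alone.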


