Balance learning to rank in big data

2014 22nd European Signal Processing Conference (EUSIPCO) Pub Date : 2014-11-13 DOI:10.5281/ZENODO.44026

G. Cao, I. Ahmad, Honglei Zhang, Weiyi Xie, M. Gabbouj

Citations: 2

Abstract

We propose a distributed learning-to-rank method and demonstrate its effectiveness in web-scale image retrieval. As the amount of data grows, training a centralized ranking model becomes impractical for large-scale learning problems. In distributed learning, the discrepancy between the training subsets and the whole data set when building the models is non-trivial but has been overlooked in previous work. In this paper, we first add a cost factor to boosting algorithms to balance the individual models toward the whole data. We then propose decomposing the original algorithm into multiple layers, whose aggregation forms a superior ranker that can easily be scaled up to billions of images. Extensive experiments show that the proposed method outperforms the straightforward aggregation of boosting algorithms.



