Learning Large Scale Ordinal Ranking Model via Divide-and-Conquer Technique

Lu Tang, Sougata Chaudhuri, A. Bagherjeiran, Lingzhi Zhou
{"title":"Learning Large Scale Ordinal Ranking Model via Divide-and-Conquer Technique","authors":"Lu Tang, Sougata Chaudhuri, A. Bagherjeiran, Lingzhi Zhou","doi":"10.1145/3184558.3191658","DOIUrl":null,"url":null,"abstract":"Structured prediction, where outcomes have a precedence order, lies at the heart of machine learning for information retrieval, movie recommendation, product review prediction, and digital advertising. Ordinal ranking, in particular, assumes that the structured response has a linear ranked order. Due to the extensive applicability of these models, substantial research has been devoted to understanding them, as well as developing efficient training techniques. One popular and widely cited technique of training ordinal ranking models is to exploit the linear precedence order and systematically reduce it to a binary classification problem. This facilitates the usage of readily available, powerful binary classifiers, but necessitates an expansion of the original training data, where the training data increases by $K-1$ times of its original size, with K being the number of ordinal classes. Due to prevalent nature of problems with large number of ordered classes, the reduction leads to datasets which are too large to train on single machines. While approximation methods like stochastic gradient descent are typically applied here, we investigate exact optimization solutions that can scale. In this paper, we present a divide-and-conquer (DC) algorithm, which divides large scale binary classification data into a cluster of machines and trains logistic models in parallel, and combines them at the end of the training phase to create a single binary classifier, which can then be used as an ordinal ranker. It requires no synchronization between the parallel learning algorithms during the training period, which makes training on large datasets feasible and efficient. We prove consistency and asymptotic normality property of the learned models using our proposed algorithm. We provide empirical evidence, on various ordinal datasets, of improved estimation and prediction performance of the model learnt using our algorithm, over several standard divide-and-conquer algorithms.","PeriodicalId":235572,"journal":{"name":"Companion Proceedings of the The Web Conference 2018","volume":"132 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Proceedings of the The Web Conference 2018","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3184558.3191658","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Structured prediction, where outcomes have a precedence order, lies at the heart of machine learning for information retrieval, movie recommendation, product review prediction, and digital advertising. Ordinal ranking, in particular, assumes that the structured response has a linear ranked order. Because these models are so widely applicable, substantial research has been devoted to understanding them and to developing efficient training techniques. One popular and widely cited technique for training ordinal ranking models is to exploit the linear precedence order and systematically reduce the problem to binary classification. This facilitates the use of readily available, powerful binary classifiers, but necessitates expanding the original training data to $K-1$ times its size, where $K$ is the number of ordinal classes. Because problems with a large number of ordered classes are prevalent, the reduction yields datasets that are too large to train on a single machine. While approximation methods such as stochastic gradient descent are typically applied in this setting, we investigate exact optimization solutions that can scale. In this paper, we present a divide-and-conquer (DC) algorithm that distributes the large-scale binary classification data across a cluster of machines, trains logistic models in parallel, and combines them at the end of the training phase into a single binary classifier, which can then be used as an ordinal ranker. The algorithm requires no synchronization between the parallel learners during training, which makes training on large datasets feasible and efficient. We prove consistency and asymptotic normality of the models learned with our proposed algorithm. On various ordinal datasets, we provide empirical evidence that the model learned with our algorithm improves estimation and prediction performance over several standard divide-and-conquer algorithms.
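The abstract describes two concrete steps: the reduction of a $K$-class ordinal problem to binary classification, which expands the data $K-1$ times, and a divide-and-conquer phase that trains logistic models on data shards in parallel, with no synchronization, and combines them into a single classifier. The Python sketch below is not the authors' code; it illustrates both steps under stated assumptions: scikit-learn's LogisticRegression stands in for the per-shard learner, and simple parameter averaging stands in for the combination rule (a standard DC combiner; the abstract does not specify the paper's exact combination step).

```python
# Sketch of the ordinal-to-binary reduction plus a divide-and-conquer
# (DC) training step. Assumptions (not from the paper): scikit-learn's
# LogisticRegression as the per-shard learner, and plain parameter
# averaging as the combination rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

def expand_ordinal(X, y, K):
    """Reduce (X, y) with labels y in {1, ..., K} to a binary problem
    of size n * (K - 1): copy k of each example gets a one-hot threshold
    indicator appended to its features and the label 1 iff y > k."""
    n = X.shape[0]
    X_bin, y_bin = [], []
    for k in range(1, K):
        thresh = np.zeros((n, K - 1))
        thresh[:, k - 1] = 1.0
        X_bin.append(np.hstack([X, thresh]))
        y_bin.append((y > k).astype(int))
    return np.vstack(X_bin), np.concatenate(y_bin)

def dc_fit(X_bin, y_bin, n_shards, seed=0):
    """Fit one logistic model per shard with no synchronization, then
    average coefficients and intercepts into a single binary classifier.
    (Assumes every shard contains both binary labels.)"""
    rng = np.random.default_rng(seed)
    shards = np.array_split(rng.permutation(len(y_bin)), n_shards)
    coefs, intercepts = [], []
    for idx in shards:
        clf = LogisticRegression(max_iter=1000).fit(X_bin[idx], y_bin[idx])
        coefs.append(clf.coef_.ravel())
        intercepts.append(clf.intercept_[0])
    return np.mean(coefs, axis=0), np.mean(intercepts)

def predict_rank(X, w, b, K):
    """Decode the binary classifier as an ordinal ranker: the predicted
    rank is 1 plus the number of thresholds k whose score is positive."""
    n = X.shape[0]
    ranks = np.ones(n, dtype=int)
    for k in range(1, K):
        thresh = np.zeros((n, K - 1))
        thresh[:, k - 1] = 1.0
        ranks += (np.hstack([X, thresh]) @ w + b > 0).astype(int)
    return ranks

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 5))
    y = rng.integers(1, 6, size=1000)            # K = 5 ordinal classes
    X_bin, y_bin = expand_ordinal(X, y, K=5)     # 4000 binary examples
    w, b = dc_fit(X_bin, y_bin, n_shards=4)
    print(predict_rank(X[:10], w, b, K=5))
```

Because each shard is fit independently and the only cross-machine operation is a single average of parameter vectors after training, the scheme needs no communication during optimization itself, which is what makes the abstract's claim of synchronization-free training on large datasets plausible.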