One-Pass Ranking Models for Low-Latency Product Recommendations
Antonino Freno, Martin Saveski, Rodolphe Jenatton, C. Archambeau
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 10, 2015
DOI: 10.1145/2783258.2788579
Citations: 8
Abstract
Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naive applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply impossible. Second, the drift of customers' preferences over time requires the ranking model to be retrained regularly with freshly collected data, which limits the time available for training to prohibitively short intervals. Third, ranking in real time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires minimizing prediction time (or, equivalently, maximizing data throughput), which in turn may prevent us from achieving the accuracy necessary in web-scale industrial applications. In this paper, we investigate how the practical challenges faced in this setting can be tackled via an online learning-to-rank approach. Sparse models are the key to reducing prediction latency, whereas one-pass stochastic optimization minimizes the training time and restricts the memory footprint. Interestingly, and perhaps surprisingly, extensive experiments show that one-pass learning preserves most of the predictive performance. Additionally, we study a variety of online learning algorithms that enforce sparsity and provide insights to help the practitioner make an informed decision about which approach to pick. We report results on a massive purchase log dataset from the Amazon retail website, as well as on several benchmarks from the LETOR corpus.
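To make the one-pass, sparsity-inducing setup concrete, below is a minimal sketch of pairwise learning to rank trained in a single pass over the data, with a soft-thresholding L1 step that drives small weights to exactly zero. This is an illustration under our own assumptions, not the paper's implementation: the pairwise hinge loss, the fixed learning rate, and the rank_one_pass function are hypothetical stand-ins for the family of sparsity-enforcing online algorithms the paper compares.

    import numpy as np

    def rank_one_pass(pairs, dim, lr=0.1, l1=1e-4):
        """Single pass of pairwise ranking with an L1 shrinkage step.

        `pairs` yields (x_pos, x_neg) feature vectors where x_pos should
        rank above x_neg. Each pair is visited exactly once, so memory
        use is bounded by the weight vector, independent of the log size.
        """
        w = np.zeros(dim)
        for x_pos, x_neg in pairs:
            diff = x_pos - x_neg
            if w @ diff < 1.0:          # pairwise hinge loss is active
                w += lr * diff          # subgradient step on the violated pair
            # soft-thresholding pulls small weights to exactly zero,
            # which is what keeps the deployed model sparse and fast
            w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
        return w

    # Toy usage: 5-dimensional features where preferred items score
    # higher on the first two coordinates.
    rng = np.random.default_rng(0)
    pairs = [(rng.normal(size=5) + np.array([1.0, 1.0, 0.0, 0.0, 0.0]),
              rng.normal(size=5)) for _ in range(1000)]
    w = rank_one_pass(iter(pairs), dim=5)
    print("learned weights:", np.round(w, 2))

Scoring a product at query time is then a single sparse dot product, which is why pushing weights to exactly zero, rather than merely close to it, directly reduces prediction latency.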