Fast online learning through offline initialization for time-sensitive recommendation

Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2010-07-25. DOI: 10.1145/1835804.1835894

D. Agarwal, Bee-Chung Chen, P. Elango


Citations: 72

Abstract


Fast online learning through offline initialization for time-sensitive recommendation

Recommender problems with large and dynamic item pools are ubiquitous in web applications like content optimization, online advertising and web search. Despite the availability of rich item meta-data, excess heterogeneity at the item level often requires inclusion of item-specific "factors" (or weights) in the model. However, since estimating item factors is computationally intensive, it poses a challenge for time-sensitive recommender problems where it is important to rapidly learn factors for new items (e.g., news articles, event updates, tweets) in an online fashion. In this paper, we propose a novel method called FOBFM (Fast Online Bilinear Factor Model) to learn item-specific factors quickly through online regression. The online regression for each item can be performed independently, and hence the procedure is fast, scalable and easily parallelizable. However, the convergence of these independent regressions can be slow due to high dimensionality. The central idea of our approach is to use a large amount of historical data to initialize the online models based on offline features and to learn linear projections that effectively reduce the dimensionality. We estimate the rank of our linear projections through online model selection based on optimizing predictive likelihood. Through extensive experiments, we show that our method significantly and uniformly outperforms other competitive methods, obtaining relative lifts of 10-15% in predictive log-likelihood and 200-300% in a rank correlation metric on a proprietary My Yahoo! dataset; it also obtains a 9% reduction in root mean squared error over the previously best method on a benchmark MovieLens dataset using a time-based train/test data split.
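To make the central idea concrete, the following is a minimal sketch of a per-item online regression carried out in a projected, low-dimensional space. It is an illustration under simplifying assumptions, not the FOBFM algorithm as specified in the paper: it assumes a Gaussian response, a zero-mean Gaussian prior on the reduced coefficients, a projection matrix that has already been learned offline, and an offset supplied by an offline feature-based model. The class name `ReducedRankOnlineRegression` and all parameter names are hypothetical.

```python
import numpy as np


class ReducedRankOnlineRegression:
    """Bayesian linear regression for ONE item in a k-dimensional projected space.

    Illustrative assumption: the item's d-dimensional factor is B @ theta, where
    B (d x k) comes from offline training and only theta (k values) is learned online.
    """

    def __init__(self, projection, prior_var=1.0, noise_var=1.0):
        self.B = np.asarray(projection, dtype=float)  # (d, k), assumed learned offline
        k = self.B.shape[1]
        self.mean = np.zeros(k)                # posterior mean of theta (prior mean 0)
        self.precision = np.eye(k) / prior_var  # posterior precision of theta
        self._rhs = np.zeros(k)                # accumulates z * residual / noise_var
        self.noise_var = noise_var

    def predict(self, user_features, offset=0.0):
        # offset: prediction of the offline feature-based model (the initialization)
        z = self.B.T @ np.asarray(user_features, dtype=float)
        return offset + z @ self.mean

    def update(self, user_features, response, offset=0.0):
        # Project the d-dimensional user vector down to k dimensions.
        z = self.B.T @ np.asarray(user_features, dtype=float)
        # Conjugate Gaussian update on the residual after removing the offset.
        self.precision += np.outer(z, z) / self.noise_var
        self._rhs += z * (response - offset) / self.noise_var
        self.mean = np.linalg.solve(self.precision, self._rhs)
```

Each item would own one such object, so updates for different items are independent and can run in parallel. Each update solves a k-by-k system rather than a d-by-d one, which is where the dimensionality reduction pays off; choosing k (the rank) is the model-selection step the abstract describes and is not shown here.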
