Real-Time Bidding by Reinforcement Learning in Display Advertising

Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo
{"title":"Real-Time Bidding by Reinforcement Learning in Display Advertising","authors":"Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo","doi":"10.1145/3018661.3018702","DOIUrl":null,"url":null,"abstract":"The majority of online display ads are served through real-time bidding (RTB) --- each ad display impression is auctioned off in real-time when it is just being generated from a user visit. To place an ad automatically and optimally, it is critical for advertisers to devise a learning algorithm to cleverly bid an ad impression in real-time. Most previous works consider the bid decision as a static optimization problem of either treating the value of each impression independently or setting a bid price to each segment of ad volume. However, the bidding for a given ad campaign would repeatedly happen during its life span before the budget runs out. As such, each bid is strategically correlated by the constrained budget and the overall effectiveness of the campaign (e.g., the rewards from generated clicks), which is only observed after the campaign has completed. Thus, it is of great interest to devise an optimal bidding strategy sequentially so that the campaign budget can be dynamically allocated across all the available impressions on the basis of both the immediate and future rewards. In this paper, we formulate the bid decision process as a reinforcement learning problem, where the state space is represented by the auction information and the campaign's real-time parameters, while an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize the advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem from the large real-world auction volume and campaign budget is well handled by state value approximation using neural networks. The empirical study on two large-scale real-world datasets and the live A/B testing on a commercial platform have demonstrated the superior performance and high efficiency compared to state-of-the-art methods.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"206","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3018661.3018702","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 206

Abstract

The majority of online display ads are served through real-time bidding (RTB): each ad impression is auctioned off in real time, at the moment it is generated by a user visit. To place ads automatically and optimally, it is critical for advertisers to devise a learning algorithm that bids for impressions intelligently in real time. Most previous works treat the bid decision as a static optimization problem, either valuing each impression independently or setting a single bid price for each segment of ad volume. However, bidding for a given ad campaign happens repeatedly over its life span until the budget runs out. As such, each bid is strategically coupled with the others through the constrained budget and the overall effectiveness of the campaign (e.g., the rewards from generated clicks), which is only observed after the campaign has completed. It is therefore of great interest to devise the bidding strategy sequentially, so that the campaign budget can be dynamically allocated across all available impressions on the basis of both immediate and future rewards. In this paper, we formulate the bid decision process as a reinforcement learning problem, where the state space is represented by the auction information and the campaign's real-time parameters, and an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy that maximizes advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem arising from the large real-world auction volume and campaign budget is handled by approximating the state value function with neural networks. An empirical study on two large-scale real-world datasets and live A/B testing on a commercial platform demonstrate superior performance and high efficiency compared to state-of-the-art methods.
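To make the MDP formulation concrete, below is a minimal, illustrative sketch in Python of the tabular dynamic program such a formulation implies; it is not the paper's implementation. It assumes integer budget units, second-price auctions with market prices of at least 1, and a single averaged impression value `theta_avg` standing in for per-impression click-through estimates. All function names and parameters are illustrative, not the paper's notation.

```python
import numpy as np

def value_iteration(T, B, theta_avg, market_pdf):
    """Tabular value function V[t, b] for a budget-constrained bidder.

    t          : number of auctions remaining
    b          : budget remaining (integer units)
    theta_avg  : expected immediate reward of winning one impression
                 (e.g. an averaged click-through rate)
    market_pdf : market_pdf[d] = probability that the highest competing
                 bid (the second-price cost of winning) equals d, d >= 1
    """
    max_price = len(market_pdf) - 1
    V = np.zeros((T + 1, B + 1))                  # V[0, :] = 0: no auctions left
    policy = np.zeros((T + 1, B + 1), dtype=int)  # greedy bid per state

    for t in range(1, T + 1):
        for b in range(B + 1):
            # Bidding 0 loses for sure and carries the budget forward.
            best_value, best_bid = V[t - 1, b], 0
            win_mass, win_value = 0.0, 0.0
            for a in range(1, min(b, max_price) + 1):
                # Raising the bid from a-1 to a additionally wins the
                # auctions whose market price is exactly a, paying a.
                win_mass += market_pdf[a]
                win_value += market_pdf[a] * (theta_avg + V[t - 1, b - a])
                value = win_value + (1.0 - win_mass) * V[t - 1, b]
                if value > best_value:
                    best_value, best_bid = value, a
            V[t, b] = best_value
            policy[t, b] = best_bid
    return V, policy

# Toy usage: 100 auctions left, budget of 50 units, market price
# uniform on {1, ..., 10}, averaged impression value 0.03.
pdf = np.zeros(11)
pdf[1:] = 0.1
V, policy = value_iteration(T=100, B=50, theta_avg=0.03, market_pdf=pdf)
print(policy[100, 50])  # bid the policy suggests in the initial state
```

The table above has T x B entries, which is exactly the scalability problem the abstract points to for real campaign volumes and budgets: the paper's remedy is to replace the table with a neural-network approximation of the state value, so that V(t, b) is predicted rather than enumerated.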