Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models

Proceedings of the 31st ACM International Conference on Information & Knowledge Management | Pub Date: 2022-10-17 | DOI: 10.1145/3511808.3557683

Jinbo Song, Ruoran Huang, Xinyang Wang, Wei Huang, Qian Yu, Mingming Chen, Yafei Yao, Chaosheng Fan, Changping Peng, Zhangang Lin, Jinghe Hu, Jingping Shao

Abstract

Industrial systems such as recommender systems and online advertising widely adopt multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking, and re-ranking. Pre-ranking is a critical bridge between matching and ranking, yet existing pre-ranking approaches suffer from the sample selection bias (SSB) problem because they ignore the entire-chain data dependence, resulting in sub-optimal performance. In this paper, we rethink the pre-ranking system from the perspective of the entire sample space and propose Entire-chain Cross-domain Models (ECM), which leverage samples from all cascaded stages to effectively alleviate the SSB problem. In addition, we design a fine-grained neural structure named ECMM to further improve pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network that predicts the result of each stage, and introduce a sub-networking routing strategy with L0 regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while keeping time consumption at an acceptable level, achieving a better trade-off between efficiency and effectiveness.
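
The abstract mentions L0 regularization for routing between sub-networks but does not say how it is implemented. As a rough illustration only, the sketch below uses the hard-concrete gate, a common differentiable relaxation of the L0 norm, to show how a learnable gate per sub-network can be sampled during training, made deterministic at serving time, and penalized by its expected number of open gates. The function names, the constants `GAMMA`, `ZETA`, `BETA`, and the idea of one gate per sub-network are assumptions for illustration, not details taken from the paper.

```python
import math
import random

# Assumed defaults commonly used with hard-concrete gates (not from the paper):
# the sigmoid output is stretched to (GAMMA, ZETA) and then clipped to [0, 1].
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_gate(log_alpha, rng):
    """Training-time stochastic gate in [0, 1] for one hypothetical sub-network."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep the logs finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    return min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA))


def deterministic_gate(log_alpha):
    """Serving-time gate: exactly 0.0 means the sub-network can be skipped."""
    return min(1.0, max(0.0, sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA))


def expected_l0(log_alphas):
    """Differentiable penalty: expected number of gates that are non-zero."""
    shift = BETA * math.log(-GAMMA / ZETA)
    return sum(sigmoid(a - shift) for a in log_alphas)


if __name__ == "__main__":
    rng = random.Random(0)
    log_alphas = [3.0, -4.0, 0.5]  # hypothetical learned gate parameters
    print([round(sample_gate(a, rng), 3) for a in log_alphas])
    print([deterministic_gate(a) for a in log_alphas])
    print(expected_l0(log_alphas))
```

In a setup like this, the penalty from `expected_l0` would be added to the training loss with a weight that trades accuracy against serving cost, and sub-networks whose deterministic gate is exactly zero would not be evaluated online.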
