Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving

Impact factor 3.6; Zone 2, Computer Science; JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Xin You, Hailong Yang, Siqi Wang, Tao Peng, Chen Ding, Xinyuan Li, Bangduo Chen, Zhongzhi Luan, Tongxuan Liu, Yong Li, Depei Qian
IEEE Transactions on Computers, vol. 73, no. 11, pp. 2474-2487, published 2024-08-28 (journal article)
DOI: 10.1109/TC.2024.3449749
URL: https://ieeexplore.ieee.org/document/10654386/
Citations: 0

Abstract

Recommendation serving with deep learning models is one of the most valuable services of modern e-commerce companies. In production, high-performance recommendation serving systems are essential for handling billions of recommendation queries under stringent service level agreements (SLAs). Unfortunately, existing model serving frameworks fail to serve efficiently because of challenges unique to recommendation: 1) a mismatch between the input format the service produces and the format the model accepts, and 2) heavy software contention when the constrained operations are executed concurrently. To address these challenges, we propose RecServe, a high-performance serving system for recommendation built on two optimized designs: structured features and SessionGroups. With structured features, RecServe packs single-user-multiple-candidates inputs by semi-automatically transforming computation graphs with annotated input tensors, which significantly reduces redundant network transmission, data movement, and useless computation. With SessionGroups, RecServe further isolates resources across multiple compute streams and uses a cost-aware operator scheduler with a critical-path-based scheduling policy to enable concurrent kernel execution, further improving serving throughput. Experimental results show that RecServe achieves maximum speedups of 12.3× and 22.0× over the state-of-the-art serving system on CPU and GPU platforms, respectively.
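The abstract says structured features avoid sending and computing the same user features once per candidate. The following is a minimal sketch of that idea only. It assumes a model whose score splits into a user-only term and a candidate-dependent term, with a linear scorer standing in for the real network. The class and function names (`StructuredRequest`, `to_flat_rows`, `score_structured`) are hypothetical. This is not RecServe's method, which transforms computation graphs semi-automatically from annotated input tensors.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class StructuredRequest:
    """Hypothetical structured-feature payload: one user, many candidates.

    The user features are stored once and each candidate carries only its own
    features. Field names are illustrative, not RecServe's actual format.
    """
    user: Vector
    candidates: List[Vector]


def to_flat_rows(req: StructuredRequest) -> List[Vector]:
    """Conventional per-sample layout: the user features are copied into every row."""
    return [req.user + cand for cand in req.candidates]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def score_flat(rows: List[Vector], user_dim: int, w_user: Vector, w_cand: Vector) -> List[float]:
    # The user-side term is recomputed for every row even though it never changes.
    return [dot(r[:user_dim], w_user) + dot(r[user_dim:], w_cand) for r in rows]


def score_structured(req: StructuredRequest, w_user: Vector, w_cand: Vector) -> List[float]:
    # The user-only part runs once and its result is broadcast to all candidates.
    user_term = dot(req.user, w_user)
    return [user_term + dot(c, w_cand) for c in req.candidates]


req = StructuredRequest(user=[1.0, 2.0, 3.0, 4.0],
                        candidates=[[0.5, 1.0], [1.5, 0.0], [2.0, 2.0]])
w_user, w_cand = [0.1, 0.1, 0.1, 0.1], [1.0, -1.0]

flat = to_flat_rows(req)
flat_floats = sum(len(r) for r in flat)                                   # 3 x (4 + 2)
structured_floats = len(req.user) + sum(len(c) for c in req.candidates)  # 4 + 3 x 2
print(flat_floats, structured_floats)  # -> 18 10
# Both layouts produce identical scores; only the transfer and compute volume differ.
assert score_flat(flat, len(req.user), w_user, w_cand) == score_structured(req, w_user, w_cand)
```

Suppose a query has C candidates, U user features, and F features per candidate. The flat layout then carries C(U + F) values, while the structured layout carries U + CF. The saving grows with the number of candidates per query, and the user-only computation shrinks from C evaluations to one.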
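SessionGroups pair resource isolation across compute streams with a cost-aware operator scheduler that follows a critical-path-based policy. The abstract does not spell that policy out. The following is therefore only a generic sketch of critical-path list scheduling (a HEFT-style "upward rank"), not RecServe's scheduler:

- Each op's priority is its own estimated cost plus the longest cost path from it to a sink.
- Ready ops are dispatched in priority order.
- Each op goes to whichever isolated stream lets it start soonest.

The op names, their costs, and the modeling of a "stream" as an abstract execution lane are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Costs = Dict[str, float]
Succ = Dict[str, List[str]]


def upward_rank(costs: Costs, succ: Succ) -> Dict[str, float]:
    """Critical-path priority: cost of the op plus the longest path to any sink."""
    memo: Dict[str, float] = {}

    def rank(op: str) -> float:
        if op not in memo:
            memo[op] = costs[op] + max((rank(s) for s in succ.get(op, [])), default=0.0)
        return memo[op]

    for op in costs:
        rank(op)
    return memo


def schedule(costs: Costs, succ: Succ, num_streams: int) -> Tuple[Dict[str, int], float]:
    """List-schedule a DAG of ops onto independent streams, highest rank first."""
    ranks = upward_rank(costs, succ)
    indeg = {op: 0 for op in costs}
    for op in succ:
        for nxt in succ[op]:
            indeg[nxt] += 1
    earliest = {op: 0.0 for op in costs}   # earliest start allowed by dependencies
    stream_free = [0.0] * num_streams      # time at which each stream becomes idle
    ready = [(-ranks[op], op) for op in costs if indeg[op] == 0]
    heapq.heapify(ready)
    assignment: Dict[str, int] = {}
    makespan = 0.0
    while ready:
        _, op = heapq.heappop(ready)        # most critical ready op
        # Place it on the stream where it can start soonest (ties go to the lowest index).
        s = min(range(num_streams), key=lambda i: (max(stream_free[i], earliest[op]), i))
        end = max(stream_free[s], earliest[op]) + costs[op]
        stream_free[s] = end
        assignment[op] = s
        makespan = max(makespan, end)
        for nxt in succ.get(op, []):
            earliest[nxt] = max(earliest[nxt], end)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                heapq.heappush(ready, (-ranks[nxt], nxt))
    return assignment, makespan


# Hypothetical op graph with estimated costs in arbitrary time units.
COSTS = {"emb_user": 4, "emb_item": 1, "mlp_a": 3, "mlp_b": 1, "concat": 1}
SUCC = {"emb_user": ["mlp_a"], "emb_item": ["mlp_b"],
        "mlp_a": ["concat"], "mlp_b": ["concat"]}

assignment, makespan = schedule(COSTS, SUCC, num_streams=2)
_, serial = schedule(COSTS, SUCC, num_streams=1)
print(makespan, serial)  # -> 8.0 10.0
```

On this toy graph, the critical path (emb_user, mlp_a, concat) stays on stream 0 while the short branch overlaps on stream 1. The makespan therefore drops from 10 units on a single stream to 8, which equals the critical-path length.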
Source journal: IEEE Transactions on Computers (Engineering & Technology: Electrical & Electronic Engineering)
CiteScore: 6.60
Self-citation rate: 5.40%
Articles published: 199
Review time: 6.0 months
Journal description: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes research papers in areas of current interest to its readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.