Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving

IF 3.6 | Zone 2, Computer Science | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computers | Pub Date: 2024-08-28 | DOI: 10.1109/TC.2024.3449749

Xin You;Hailong Yang;Siqi Wang;Tao Peng;Chen Ding;Xinyuan Li;Bangduo Chen;Zhongzhi Luan;Tongxuan Liu;Yong Li;Depei Qian

IEEE Transactions on Computers, vol. 73, no. 11, pp. 2474-2487. Journal Article. https://ieeexplore.ieee.org/document/10654386/

Citations: 0

Abstract

Recommendation serving with deep learning models is one of the most valuable services of modern e-commerce companies. In production, to accommodate billions of recommendation queries with stringent service level agreements, high-performant recommendation serving systems play an essential role in meeting such daunting demand. Unfortunately, existing model serving frameworks fail to achieve efficient serving due to unique challenges such as 1) the input format mismatch between service needs and the model's ability and 2) heavy software contention when concurrently executing the constrained operations. To address these challenges, we propose RecServe, a high-performant serving system for recommendation built on the optimized design of structured features and SessionGroups. With structured features, RecServe packs single-user-multiple-candidates inputs by semi-automatically transforming computation graphs with annotated input tensors, which can significantly reduce redundant network transmission, data movement, and useless computation. With SessionGroups, RecServe further adopts resource isolation for multiple compute streams and a cost-aware operator scheduler with a critical-path-based scheduling policy to enable concurrent kernel execution, further improving serving throughput. The experimental results demonstrate that RecServe can achieve maximum performance speedups of $12.3\times$ and $22.0\times$ compared to the state-of-the-art serving system on CPU and GPU platforms, respectively.
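
The single-user-multiple-candidates packing described in the abstract can be illustrated with a small sketch. This is not RecServe's implementation: the `PackedRequest` type, the toy `user_tower` and `interact` functions, and the feature sizes are all hypothetical, chosen only to show why storing the user features once, instead of repeating them in every candidate row, shrinks the payload and avoids recomputing the user-side part of the model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PackedRequest:
    """Hypothetical packed input: one user, many candidates."""
    user_features: List[float]             # stored once per request
    candidate_features: List[List[float]]  # one row per candidate


def user_tower(user: List[float]) -> List[float]:
    """Toy stand-in for the user-side subgraph (an assumption)."""
    return [2.0 * x + 1.0 for x in user]


def interact(user_emb: List[float], cand: List[float]) -> float:
    """Toy user-candidate interaction: a dot product."""
    return sum(u * c for u, c in zip(user_emb, cand))


def to_flat_rows(req: PackedRequest) -> List[List[float]]:
    """Conventional layout: user features repeated in every row."""
    return [req.user_features + cand for cand in req.candidate_features]


def score_flat(rows: List[List[float]], user_dim: int) -> Tuple[List[float], int]:
    """Score flat rows; the user tower runs once per candidate."""
    scores, tower_calls = [], 0
    for row in rows:
        user_emb = user_tower(row[:user_dim])
        tower_calls += 1
        scores.append(interact(user_emb, row[user_dim:]))
    return scores, tower_calls


def score_packed(req: PackedRequest) -> Tuple[List[float], int]:
    """Score a packed request; the user tower runs once in total."""
    user_emb = user_tower(req.user_features)
    scores = [interact(user_emb, cand) for cand in req.candidate_features]
    return scores, 1


def flat_payload(req: PackedRequest) -> int:
    """Number of feature values sent in the flat layout."""
    return sum(len(row) for row in to_flat_rows(req))


def packed_payload(req: PackedRequest) -> int:
    """Number of feature values sent in the packed layout."""
    return len(req.user_features) + sum(len(c) for c in req.candidate_features)


if __name__ == "__main__":
    req = PackedRequest(
        user_features=[1.0, 2.0, 3.0],
        candidate_features=[[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0]],
    )
    print("flat:  ", score_flat(to_flat_rows(req), 3), flat_payload(req))
    print("packed:", score_packed(req), packed_payload(req))
```

Both layouts produce the same scores. In this toy setting the packed form transmits fewer values and evaluates the user-side function once rather than once per candidate, and the gap grows with the number of candidates per request.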




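Source journal

IEEE Transactions on Computers (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.60
Self-citation rate: 5.40%
Articles published: 199
Review time: 6.0 months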

Journal description: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.