A Case for Two-stage Inference with Knowledge Caching

Geonha Park, Changho Hwang, KyoungSoo Park
{"title":"A Case for Two-stage Inference with Knowledge Caching","authors":"Geonha Park, Changho Hwang, KyoungSoo Park","doi":"10.1145/3325413.3329789","DOIUrl":null,"url":null,"abstract":"Real-world intelligent services employing deep learning technology typically take a two-tier system architecture -- a dumb front-end device and smart back-end cloud servers. The front-end device simply forwards a human query while the back-end servers run a complex deep model to resolve the query and respond to the front-end device. While simple and effective, the current architecture not only increases the load at servers but also runs the risk of harming user privacy. In this paper, we present knowledge caching, which exploits the front-end device as a smart cache of a generalized deep model. The cache locally resolves a subset of popular or privacy-sensitive queries while it forwards the rest of them to back-end cloud servers. We discuss the feasibility of knowledge caching as well as technical challenges around deep model specialization and compression. We show our prototype two-stage inference system that populates a front-end cache with 10 voice commands out of 35 commands. We demonstrate that our specialization and compression techniques reduce the cached model size by 17.4x from the original model with 1.8x improvement on the inference accuracy.","PeriodicalId":164793,"journal":{"name":"The 3rd International Workshop on Deep Learning for Mobile Systems and Applications - EMDL '19","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 3rd International Workshop on Deep Learning for Mobile Systems and Applications - EMDL '19","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3325413.3329789","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Real-world intelligent services employing deep learning technology typically take a two-tier system architecture -- a dumb front-end device and smart back-end cloud servers. The front-end device simply forwards a human query while the back-end servers run a complex deep model to resolve the query and respond to the front-end device. While simple and effective, the current architecture not only increases the load at servers but also runs the risk of harming user privacy. In this paper, we present knowledge caching, which exploits the front-end device as a smart cache of a generalized deep model. The cache locally resolves a subset of popular or privacy-sensitive queries while it forwards the rest of them to back-end cloud servers. We discuss the feasibility of knowledge caching as well as technical challenges around deep model specialization and compression. We show our prototype two-stage inference system that populates a front-end cache with 10 voice commands out of 35 commands. We demonstrate that our specialization and compression techniques reduce the cached model size by 17.4x from the original model with 1.8x improvement on the inference accuracy.
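The paper's abstract describes the two-stage flow but not its implementation details. Below is a minimal sketch, assuming the cached front-end model is a small classifier over the 10 popular commands and that a softmax-confidence threshold decides whether a query is resolved locally or forwarded to the back-end cloud model; the threshold value and all names are hypothetical, not taken from the paper.

```python
# Sketch of two-stage inference with a front-end knowledge cache.
# Assumption: a confidence threshold on the cached model's softmax output
# decides between answering locally and forwarding to the cloud model.
import numpy as np

CONFIDENCE_THRESHOLD = 0.8  # assumed tunable knob, not a value from the paper

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def cached_infer(features, cached_model):
    """Run the small, specialized front-end model (e.g. 10 popular commands)."""
    probs = softmax(cached_model(features))
    return int(np.argmax(probs)), float(np.max(probs))

def cloud_infer(features, cloud_model):
    """Fall back to the full generalized back-end model (e.g. 35 commands)."""
    probs = softmax(cloud_model(features))
    return int(np.argmax(probs))

def two_stage_infer(features, cached_model, cloud_model):
    label, confidence = cached_infer(features, cached_model)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label, "cache"   # resolved on-device; the query never leaves it
    return cloud_infer(features, cloud_model), "cloud"
```

In this sketch, queries the specialized model answers confidently stay on the device (helping both server load and privacy), while low-confidence queries fall through to the generalized cloud model, mirroring the cache-hit/cache-miss behavior the abstract describes.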