Sectum: Accurate Latency Prediction for TEE-hosted Deep Learning Inference

Yan Li, Junming Ma, Donggang Cao, Hong Mei
{"title":"分组:tee承载深度学习推理的准确延迟预测","authors":"Yan Li, Junming Ma, Donggang Cao, Hong Mei","doi":"10.1109/ICDCS54860.2022.00092","DOIUrl":null,"url":null,"abstract":"As the security issue of cloud-offloaded Deep Learning (DL) inference is drawing increasing attention, running DL inference in Trusted Execution Environments (TEEs) has become a common practice. Latency prediction of TEE-hosted DL model inference is essential for many scenarios, such as DNN model architecture searching with a latency constraint or layer scheduling in model-parallelism inference. However, existing solutions fail to address the memory over-commitment issue in resource-constrained environments inside TEEs.This paper presents Sectum, an accurate latency predictor for DL inference inside TEE enclaves. We first perform a synthetic empirical study to analyze the relationship between inference latency and memory occupation. Sectum predicts inference latency following a two-stage design based on some critical observations. First, Sectum uses a Graph Neural Network (GNN)-based model to detect whether a given model would trigger memory over-commitment in TEEs. Then, combining operator-level latency modeling with linear regression, Sectum could predict the latency of a model. To evaluate Sectum, we design a large dataset that contains the latency information of over 6k CNN models. Our experiments demonstrate that Sectum could achieve over 85% ±10% accuracy of latency prediction. To our knowledge, Sectum is the first method to predict TEE-hosted DL inference latency accurately.","PeriodicalId":225883,"journal":{"name":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Sectum: Accurate Latency Prediction for TEE-hosted Deep Learning Inference\",\"authors\":\"Yan Li, Junming Ma, Donggang Cao, Hong Mei\",\"doi\":\"10.1109/ICDCS54860.2022.00092\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the security issue of cloud-offloaded Deep Learning (DL) inference is drawing increasing attention, running DL inference in Trusted Execution Environments (TEEs) has become a common practice. Latency prediction of TEE-hosted DL model inference is essential for many scenarios, such as DNN model architecture searching with a latency constraint or layer scheduling in model-parallelism inference. However, existing solutions fail to address the memory over-commitment issue in resource-constrained environments inside TEEs.This paper presents Sectum, an accurate latency predictor for DL inference inside TEE enclaves. We first perform a synthetic empirical study to analyze the relationship between inference latency and memory occupation. Sectum predicts inference latency following a two-stage design based on some critical observations. First, Sectum uses a Graph Neural Network (GNN)-based model to detect whether a given model would trigger memory over-commitment in TEEs. Then, combining operator-level latency modeling with linear regression, Sectum could predict the latency of a model. To evaluate Sectum, we design a large dataset that contains the latency information of over 6k CNN models. Our experiments demonstrate that Sectum could achieve over 85% ±10% accuracy of latency prediction. 
To our knowledge, Sectum is the first method to predict TEE-hosted DL inference latency accurately.\",\"PeriodicalId\":225883,\"journal\":{\"name\":\"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)\",\"volume\":\"114 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDCS54860.2022.00092\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS54860.2022.00092","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

As the security issue of cloud-offloaded Deep Learning (DL) inference is drawing increasing attention, running DL inference in Trusted Execution Environments (TEEs) has become a common practice. Latency prediction of TEE-hosted DL model inference is essential for many scenarios, such as DNN architecture search under a latency constraint or layer scheduling in model-parallel inference. However, existing solutions fail to address the memory over-commitment issue in the resource-constrained environments inside TEEs. This paper presents Sectum, an accurate latency predictor for DL inference inside TEE enclaves. We first perform a synthetic empirical study to analyze the relationship between inference latency and memory occupation. Based on several critical observations, Sectum predicts inference latency with a two-stage design. First, Sectum uses a Graph Neural Network (GNN)-based model to detect whether a given model would trigger memory over-commitment in TEEs. Then, combining operator-level latency modeling with linear regression, Sectum predicts the latency of the model. To evaluate Sectum, we build a large dataset containing the latency information of over 6k CNN models. Our experiments demonstrate that Sectum achieves over 85% ±10% accuracy in latency prediction. To our knowledge, Sectum is the first method to accurately predict TEE-hosted DL inference latency.
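The abstract only outlines the two-stage design, so the following is a minimal, hypothetical sketch of stage one: a small graph neural network that reads a model's operator graph and outputs the probability of memory over-commitment inside the enclave. The class name `OverCommitDetector`, the per-operator features, and the plain two-step message passing are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a stage-one over-commitment classifier (not the paper's code).
import torch
import torch.nn as nn


class OverCommitDetector(nn.Module):
    def __init__(self, in_feats: int = 8, hidden: int = 64):
        super().__init__()
        self.lin1 = nn.Linear(in_feats, hidden)
        self.lin2 = nn.Linear(hidden, hidden)
        self.readout = nn.Linear(hidden, 1)  # binary output: over-commit or not

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: [N, in_feats] per-operator features (e.g. output bytes,
        #             weight bytes, estimated workspace memory) -- assumed features
        # adj: [N, N] normalized adjacency matrix of the operator graph
        h = torch.relu(self.lin1(adj @ node_feats))  # first message-passing step
        h = torch.relu(self.lin2(adj @ h))           # second message-passing step
        g = h.mean(dim=0)                            # graph-level mean readout
        return torch.sigmoid(self.readout(g))        # P(memory over-commitment)
```

In practice such a classifier would be trained on over-commitment labels like those implied by the paper's 6k-model dataset; the training loop is omitted here.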
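Stage two combines operator-level latency modeling with linear regression. Below is a rough sketch of that idea under assumed per-operator features (FLOPs and tensor sizes); the function `predict_model_latency` and the `paging_penalty` factor are invented stand-ins for however Sectum actually folds the over-commitment signal into its prediction.

```python
# Hypothetical sketch of stage-two operator-level latency regression (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data: one row per operator with assumed features
# [FLOPs, input bytes, output bytes, weight bytes]; targets are measured latencies (ms).
X_train = np.array([
    [1.2e8, 6.0e5, 6.0e5, 3.0e4],
    [4.5e8, 1.2e6, 1.2e6, 1.5e5],
    [9.0e7, 3.0e5, 3.0e5, 1.0e4],
])
y_train = np.array([0.8, 2.7, 0.5])

reg = LinearRegression().fit(X_train, y_train)


def predict_model_latency(op_features: np.ndarray, over_commit: bool,
                          paging_penalty: float = 1.6) -> float:
    """Sum per-operator predictions; inflate the total when the stage-one GNN
    flags memory over-commitment (the penalty value is purely illustrative)."""
    per_op = reg.predict(op_features)          # one latency estimate per operator
    total = float(per_op.sum())                # end-to-end estimate = sum of operators
    return total * paging_penalty if over_commit else total
```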