BackboneAnalysis: Structured Insights into Compute Platforms from CNN Inference Latency

2022 IEEE Intelligent Vehicles Symposium (IV). Pub Date: 2022-06-05. DOI: 10.1109/iv51971.2022.9827260

Frank M. Hafner, Matthias Zeller, Mark Schutera, Jochen Abhau, Julian F. P. Kooij


Citations: 1

Abstract


BackboneAnalysis: Structured Insights into Compute Platforms from CNN Inference Latency

Customization of a convolutional neural network (CNN) to a specific compute platform involves finding an optimal Pareto state between the computational complexity of the CNN and the resulting throughput in operations per second on the compute platform. However, existing inference performance benchmarks compare complete backbones that entail many differences between their CNN configurations, and so do not provide insights into how fine-grained layer design choices affect this balance. We present BackboneAnalysis, a methodology for extracting structured insights into the trade-off for a chosen target compute platform. Within a one-factor-at-a-time analysis setup, CNN architectures are systematically varied and evaluated based on throughput and latency measurements irrespective of model accuracy. Thereby, we investigate the configuration factors input shape, batch size, kernel size and convolutional layer type. In our experiments, we deploy BackboneAnalysis on a Xavier iGPU and a Coral Edge TPU accelerator. The analysis reveals that the general assumption from optimal Roofline performance that higher operation density in CNNs leads to higher throughput does not always hold. These results highlight the importance for a neural network architect to be aware of platform-specific latency and throughput behavior in order to derive sensible configuration decisions for a custom CNN.
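The one-factor-at-a-time setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ConvConfig` fields, the baseline values, the operation count for a single stride-1 standard convolution, and every function name are assumptions made for illustration, and the convolutional layer type factor is left out for brevity. Latency is expected to come from a measurement on the target platform, supplied by the caller as a hypothetical `measure_latency` callable.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class ConvConfig:
    """Hypothetical configuration of one standard convolution layer."""
    height: int = 224        # input shape (spatial)
    width: int = 224
    in_channels: int = 32
    out_channels: int = 32
    kernel_size: int = 3
    batch_size: int = 1


def conv_ops(cfg: ConvConfig) -> int:
    """Operation count, assuming stride 1, 'same' padding, 1 MAC = 2 ops."""
    macs = (cfg.batch_size * cfg.height * cfg.width
            * cfg.in_channels * cfg.out_channels * cfg.kernel_size ** 2)
    return 2 * macs


def ofat_configs(
    baseline: ConvConfig, factors: Dict[str, Sequence[int]]
) -> Iterator[Tuple[str, int, ConvConfig]]:
    """Vary one factor at a time; all other fields stay at the baseline."""
    for name, values in factors.items():
        for value in values:
            yield name, value, replace(baseline, **{name: value})


def throughput(cfg: ConvConfig, latency_s: float) -> float:
    """Throughput in operations per second for a measured latency."""
    return conv_ops(cfg) / latency_s


def roofline_bound(intensity: float, peak_ops: float, bandwidth: float) -> float:
    """Roofline model: attainable ops/s for a given ops-per-byte intensity."""
    return min(peak_ops, intensity * bandwidth)


def sweep(
    baseline: ConvConfig,
    factors: Dict[str, Sequence[int]],
    measure_latency: Callable[[ConvConfig], float],
) -> List[Tuple[str, int, float, float]]:
    """Return (factor, value, latency_s, ops_per_s) for each varied config."""
    rows = []
    for name, value, cfg in ofat_configs(baseline, factors):
        latency = measure_latency(cfg)  # caller-supplied platform measurement
        rows.append((name, value, latency, throughput(cfg, latency)))
    return rows
```

Comparing the measured throughput from `sweep` against `roofline_bound` for the same configuration is one way to look for the cases the abstract mentions, where higher operation density does not translate into higher throughput on a given platform.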

