Flydeling: Streamlined Performance Models for Hardware Acceleration of CNNs through System Identification

Impact factor: 0.7 | JCR quartile: Q4 | Category: Computer Science, Information Systems
Walther Carballo-Hernández, M. Pelcat, S. Bhattacharyya, R. C. Galán, F. Berry
Journal: ACM Transactions on Modeling and Performance Evaluation of Computing Systems, volume 8 1, pages 1-33
Publication date: 2023-05-12
Publication type: Journal Article
DOI: 10.1145/3594870 (https://doi.org/10.1145/3594870)
Open access: no
Citation count: 1

Abstract

The introduction of deep learning algorithms such as Convolutional Neural Networks (CNNs) into many near-sensor embedded systems opens new challenges in terms of energy efficiency and hardware performance. An emerging solution to these challenges is to use tailored heterogeneous hardware accelerators that combine processing elements of different architectural natures, such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), or Application Specific Integrated Circuits (ASICs). To progress towards heterogeneity, a great asset would be an automated design space exploration tool that chooses, for each accelerated partition of a CNN, the most appropriate architecture given the available resources. Feeding such a design space exploration process requires models that provide very fast yet precise evaluations of alternative architectures or alternative forms of CNNs. Quick configuration estimation could be achieved with few parameters from representative input sequences. This article studies a solution called flydeling (a contraction of flyweight modeling) for obtaining these models by drawing inspiration from the black-box System Identification (SI) domain. We refer to models derived using the proposed approach as flyweight models (flydels). A methodology is proposed to generate these flydels, using CNN properties as predictor features together with SI techniques, with a stochastic excitation input at the level of feature map dimensions. For an embedded CPU-FPGA-GPU heterogeneous platform, it is demonstrated that flydels of Key Performance Indicators (KPIs) can be learned at an early design stage and from high-level application features. For latency, energy, and resource utilization, flydels obtain estimation errors between 5% and 10% with fewer model parameters than state-of-the-art solutions, and they are built automatically from platform measurements.
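The abstract does not give the structure of a flydel, so the following is only a minimal sketch of the general idea under stated assumptions: layer shapes are drawn at random (the stochastic excitation), a few high-level features are computed from them, and a three-parameter linear latency model is fitted to measurements. Every name here (`features`, `measure_latency`, `random_layer`, `fit`, `mape`), the linear model form, the choice of a relative-error least-squares fit, and the coefficients in `TRUE_COEFFS` are hypothetical; `measure_latency` is a synthetic stand-in for a real platform measurement, not data from the article.

```python
import random

# Hypothetical coefficients standing in for a real platform (synthetic):
# ms per mega-MAC, ms per thousand output values, fixed overhead in ms.
TRUE_COEFFS = (0.05, 0.002, 0.2)

def features(h, w, c_in, c_out, k):
    """High-level predictors for one convolution layer (assumed feature set)."""
    mmacs = h * w * c_in * c_out * k * k / 1e6   # multiply-accumulates, in millions
    kouts = h * w * c_out / 1e3                  # output values, in thousands
    return (mmacs, kouts, 1.0)                   # 1.0 carries the fixed overhead

def measure_latency(x, rng, noise=0.02):
    """Synthetic stand-in for a platform measurement, with multiplicative noise."""
    ideal = sum(c * v for c, v in zip(TRUE_COEFFS, x))
    return ideal * (1.0 + rng.uniform(-noise, noise))

def random_layer(rng):
    """Stochastic excitation: draw feature map and kernel dimensions at random."""
    size = rng.randint(8, 64)
    return (size, size, rng.randint(1, 64), rng.randint(1, 64), rng.choice((1, 3, 5)))

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(xs, ys):
    """Least squares on relative error: minimise sum(((x . theta - y) / y) ** 2)."""
    rows = [[v / y for v in x] for x, y in zip(xs, ys)]   # scale each row by 1/y
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] for r in rows) for i in range(n)]     # scaled target is 1
    return solve(gram, rhs)

def mape(theta, xs, ys):
    """Mean absolute percentage error of the fitted model, as a fraction."""
    preds = [sum(t * v for t, v in zip(theta, x)) for x in xs]
    return sum(abs(p - y) / y for p, y in zip(preds, ys)) / len(ys)

if __name__ == "__main__":
    rng = random.Random(0)
    train = [features(*random_layer(rng)) for _ in range(200)]
    train_y = [measure_latency(x, rng) for x in train]
    test = [features(*random_layer(rng)) for _ in range(50)]
    test_y = [measure_latency(x, rng) for x in test]
    theta = fit(train, train_y)
    print("fitted coefficients:", theta)
    print("held-out MAPE:", mape(theta, test, test_y))
```

The fit minimises relative rather than absolute error because the abstract reports errors as percentages; that choice, like the rest of the sketch, is an assumption and not a description of the authors' procedure.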
Source journal metrics: CiteScore 2.10; self-citation rate 0.00%; publication count 9