It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives

Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Jonathan Gietz, Konstantin Lübeck, Oliver Bringmann
arXiv - CS - Performance, published 2024-06-12
DOI: arxiv-2406.08330 (https://doi.org/arxiv-2406.08330)

Abstract

Statistical models are widely used to estimate the performance of commercial off-the-shelf (COTS) AI hardware accelerators. However, training statistical performance models often requires vast amounts of data, which demands a significant time investment and can be difficult when hardware availability is limited. To alleviate this problem, we propose a novel performance modeling methodology that significantly reduces the number of training samples while maintaining good accuracy. Our approach leverages knowledge of the target hardware architecture and initial parameter sweeps to identify a set of Performance Representatives (PRs) for deep neural network (DNN) layers. These PRs are then used for benchmarking, building a statistical performance model, and making estimations. Compared with random sampling, this targeted approach drastically reduces the number of training samples needed to achieve better estimation accuracy. We achieve a Mean Absolute Percentage Error (MAPE) as low as 0.02% for single-layer estimations and 0.68% for whole-DNN estimations with fewer than 10,000 training samples. The results demonstrate the superiority of our method for single-layer estimations compared to models trained with randomly sampled datasets of the same size.
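
The accuracy figures above are reported as MAPE. As a minimal sketch of how that metric is commonly computed, the snippet below uses the standard definition (mean of |measured - estimated| / |measured|, expressed in percent). The function name `mape` and the latency values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def mape(measured, estimated):
    """Mean Absolute Percentage Error in percent (standard definition)."""
    if len(measured) != len(estimated) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Relative error of each estimate with respect to its measured value.
    relative_errors = [abs((m - e) / m) for m, e in zip(measured, estimated)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical layer latencies in milliseconds (illustrative values only).
measured = [10.0, 20.0, 40.0]
estimated = [10.1, 19.8, 40.0]

print(f"MAPE = {mape(measured, estimated):.2f}%")  # prints "MAPE = 0.67%"
```

Because each error is normalized by the measured value, the metric is undefined when a measured value is zero. For strictly positive latencies this should not occur.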