It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives

Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Jonathan Gietz, Konstantin Lübeck, Oliver Bringmann
{"title":"It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives","authors":"Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Jonathan Gietz, Konstantin Lübeck, Oliver Bringmann","doi":"arxiv-2406.08330","DOIUrl":null,"url":null,"abstract":"Statistical models are widely used to estimate the performance of commercial\noff-the-shelf (COTS) AI hardware accelerators. However, training of statistical\nperformance models often requires vast amounts of data, leading to a\nsignificant time investment and can be difficult in case of limited hardware\navailability. To alleviate this problem, we propose a novel performance\nmodeling methodology that significantly reduces the number of training samples\nwhile maintaining good accuracy. Our approach leverages knowledge of the target\nhardware architecture and initial parameter sweeps to identify a set of\nPerformance Representatives (PR) for deep neural network (DNN) layers. These\nPRs are then used for benchmarking, building a statistical performance model,\nand making estimations. This targeted approach drastically reduces the number\nof training samples needed, opposed to random sampling, to achieve a better\nestimation accuracy. We achieve a Mean Absolute Percentage Error (MAPE) of as\nlow as 0.02% for single-layer estimations and 0.68% for whole DNN estimations\nwith less than 10000 training samples. The results demonstrate the superiority\nof our method for single-layer estimations compared to models trained with\nrandomly sampled datasets of the same size.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Performance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.08330","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Statistical models are widely used to estimate the performance of commercial off-the-shelf (COTS) AI hardware accelerators. However, training statistical performance models often requires vast amounts of data, which demands a significant time investment and can be difficult when hardware availability is limited. To alleviate this problem, we propose a novel performance modeling methodology that significantly reduces the number of training samples while maintaining good accuracy. Our approach leverages knowledge of the target hardware architecture and initial parameter sweeps to identify a set of Performance Representatives (PR) for deep neural network (DNN) layers. These PRs are then used for benchmarking, building a statistical performance model, and making estimations. Compared to random sampling, this targeted approach drastically reduces the number of training samples needed while achieving better estimation accuracy. We achieve a Mean Absolute Percentage Error (MAPE) as low as 0.02% for single-layer estimations and 0.68% for whole-DNN estimations with fewer than 10000 training samples. The results demonstrate the superiority of our method for single-layer estimations compared to models trained on randomly sampled datasets of the same size.
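The abstract outlines a three-step flow: sweep layer parameters, select Performance Representatives, and fit a statistical model on measurements of only those PRs. The Python sketch below illustrates that flow under stated assumptions; all names (`sweep_conv_configs`, `select_representatives`, `measure_latency`) and the simple subsampling heuristic are illustrative, not the authors' actual implementation, which chooses PRs using knowledge of the target accelerator's architecture and benchmarks them on real hardware.

```python
# Hypothetical sketch of the PR-based performance modeling flow described
# in the abstract. Function names and parameter ranges are assumptions.
from itertools import product
from sklearn.ensemble import RandomForestRegressor

def sweep_conv_configs():
    """Initial parameter sweep over convolution layer shapes (illustrative ranges)."""
    for c_in, c_out, hw in product([16, 32, 64], [16, 32, 64], [14, 28, 56]):
        yield {"c_in": c_in, "c_out": c_out, "hw": hw}

def select_representatives(configs, stride=4):
    """Pick Performance Representatives (PRs): configurations whose measured
    latency stands in for nearby configurations. Here we merely subsample the
    sweep; the paper instead uses architectural knowledge (e.g. where tiling
    boundaries change the latency) to decide which points are representative."""
    return configs[::stride]

def measure_latency(cfg):
    """Placeholder for an on-device benchmark of one layer configuration.
    A real implementation would run the layer on the COTS accelerator."""
    return cfg["c_in"] * cfg["c_out"] * cfg["hw"] ** 2 * 1e-6  # fake cost model

# Benchmark only the PRs and fit a statistical performance model on them.
configs = list(sweep_conv_configs())
prs = select_representatives(configs)
X = [[c["c_in"], c["c_out"], c["hw"]] for c in prs]
y = [measure_latency(c) for c in prs]
model = RandomForestRegressor(n_estimators=100).fit(X, y)

# Estimate an unseen layer's latency; a whole-DNN estimate would sum the
# per-layer predictions across the network.
print(model.predict([[48, 48, 28]]))
```

The point of the sketch is the sampling budget: only the PRs are ever benchmarked, so the training set stays small while the model still covers the swept parameter space.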