Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Jonathan Gietz, Konstantin Lübeck, Oliver Bringmann
arXiv - CS - Performance · arXiv:2406.08330 · published 2024-06-12
It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives
Statistical models are widely used to estimate the performance of commercial off-the-shelf (COTS) AI hardware accelerators. However, training statistical performance models often requires vast amounts of data, resulting in a significant time investment, and can be difficult when hardware availability is limited. To alleviate this problem, we propose a novel performance modeling methodology that significantly reduces the number of training samples while maintaining good accuracy. Our approach leverages knowledge of the target hardware architecture and initial parameter sweeps to identify a set of Performance Representatives (PR) for deep neural network (DNN) layers. These PRs are then used for benchmarking, building a statistical performance model, and making estimations. Compared to random sampling, this targeted approach drastically reduces the number of training samples needed to achieve better estimation accuracy. We achieve a Mean Absolute Percentage Error (MAPE) as low as 0.02% for single-layer estimations and 0.68% for whole-DNN estimations with fewer than 10,000 training samples. The results demonstrate the superiority of our method for single-layer estimations compared to models trained on randomly sampled datasets of the same size.
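The pipeline the abstract describes — sweep a layer parameter, use architectural knowledge to pick representative configurations, benchmark only those, then estimate all other configurations from the resulting model — can be sketched as follows. This is a minimal illustration, not the paper's method: the step-shaped latency function, the 16-channel tile size, and the bucket-based estimator are all invented assumptions standing in for real hardware measurements and the paper's actual PR selection.

```python
# Hypothetical sketch of "Performance Representative" (PR) benchmarking.
# The latency model, tile size, and estimator are invented for illustration.

def synthetic_latency(channels: int) -> float:
    """Stand-in for a real hardware measurement of a DNN layer.
    The step pattern mimics an accelerator that processes channels
    in tiles of 16 (an assumed architectural parameter)."""
    tiles = -(-channels // 16)  # ceiling division: number of 16-channel tiles
    return 10.0 * tiles

def select_representatives(configs, tile=16):
    """Pick one representative per tile bucket, using the (assumed)
    architectural knowledge that latency only changes at tile boundaries."""
    seen, reps = set(), []
    for c in configs:
        bucket = -(-c // tile)
        if bucket not in seen:
            seen.add(bucket)
            reps.append(c)
    return reps

def estimate(channels, model, tile=16):
    """Look up the benchmarked latency of this configuration's representative."""
    return model[-(-channels // tile)]

# "Benchmark" only the representatives instead of every configuration.
all_configs = list(range(1, 129))
reps = select_representatives(all_configs)
model = {-(-c // 16): synthetic_latency(c) for c in reps}

# Evaluate estimation quality with Mean Absolute Percentage Error (MAPE).
errors = [abs(estimate(c, model) - synthetic_latency(c)) / synthetic_latency(c)
          for c in all_configs]
mape = 100.0 * sum(errors) / len(errors)
print(f"representatives benchmarked: {len(reps)} of {len(all_configs)}")
print(f"MAPE: {mape:.2f}%")
```

In this toy setting, 8 representatives cover all 128 configurations exactly because the assumed latency function is constant within each tile bucket; on real hardware the mapping is noisier, which is why the paper fits a statistical model over the PR measurements rather than a plain lookup.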