2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): Latest Papers

Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
Axel Stjerngren, Perry Gibson, José Cano
Abstract: Reconfigurable accelerators for deep neural networks (DNNs) promise to improve performance such as inference latency. STONNE is the first cycle-accurate simulator for reconfigurable DNN inference accelerators which allows for the exploration of accelerator designs and configuration space. However, preparing models for evaluation and exploring configuration space in STONNE is a manual developer-time-consuming process, which is a barrier for research. This paper introduces Bifrost, an end-to-end framework for the evaluation and optimization of reconfigurable DNN inference accelerators. Bifrost operates as a frontend for STONNE and leverages the TVM deep learning compiler stack to parse models and automate offloading of accelerated computations. We discuss Bifrost’s advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost. Additionally, Bifrost introduces a module leveraging AutoTVM to efficiently explore accelerator designs and dataflow mapping space to optimize performance. This is demonstrated by tuning the MAERI architecture and generating efficient dataflow mappings for AlexNet, obtaining an average speedup of $50\times$ for the convolutional layers and $11\times$ for the fully connected layers. Our code is available at www.github.com/gicLAB/bifrost.
DOI: 10.48550/arXiv.2204.12418 (https://doi.org/10.48550/arXiv.2204.12418)
Published: 2022-04-26
Citations: 1
Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices
K. Bhardwaj, James Diffenderfer, B. Kailkhura, M. Gokhale
Abstract: The prediction accuracy of deep neural networks (DNNs) after deployment at the edge can suffer with time due to shifts in the distribution of the new data. To improve robustness of DNNs, they must be able to update themselves. However, DNN adaptation at the edge is challenging due to lack of resources. Recently, lightweight prediction-time unsupervised DNN adaptation techniques have been introduced that improve prediction accuracy of the models for noisy data by re-tuning the batch normalization parameters. This paper performs a comprehensive measurement study of such techniques to quantify their performance and energy on various edge devices as well as find bottlenecks and propose optimization opportunities.
DOI: 10.48550/arXiv.2203.11295 (https://doi.org/10.48550/arXiv.2203.11295)
Published: 2022-03-21
Citations: 1
Distilling the Real Cost of Production Garbage Collectors
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2021-12-15 DOI: 10.1109/ISPASS55109.2022.00005
Zixian Cai
Abstract: Despite the long history of garbage collection (GC) and its prevalence in modern programming languages, there is surprisingly little clarity about its true cost. Without understanding their cost, crucial tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to misguided design constraints and evaluation criteria used by GC researchers and users, hindering the development of high-performance, low-cost GCs. In this paper, we develop a methodology that allows us to empirically estimate the cost of GC for any given set of metrics. This fundamental quantification has eluded the research community, even when using modern, well-established methodologies. By distilling out the explicitly identifiable GC cost, we estimate the intrinsic application execution cost using different GCs. The minimum distilled cost forms a baseline. Subtracting this baseline from the total execution costs, we can then place an empirical lower bound on the absolute costs of different GCs. Using this methodology, we study five production GCs in OpenJDK 17, a high-performance Java runtime. We measure the cost of these collectors, and expose their respective key performance tradeoffs. We find that with a modestly sized heap, production GCs incur substantial overheads across a diverse suite of modern benchmarks, spending at least 7-82% more wall-clock time and 6-92% more CPU cycles relative to the baseline cost. We show that these costs can be masked by concurrency and generous provisioning of memory/compute. In addition, we find that newer low-pause GCs are significantly more expensive than older GCs, and, surprisingly, sometimes deliver worse application latency than stop-the-world GCs. Our findings reaffirm that GC is by no means a solved problem and that a low-cost, low-latency GC remains elusive. We recommend adopting the distillation methodology together with a wider range of cost metrics for future GC evaluations. This will not only help the community more comprehensively understand the performance characteristics of different GCs, but also reveal opportunities for future GC optimizations.
Citations: 9
Meterstick: Benchmarking Performance Variability in Cloud and Self-hosted Minecraft-like Games
Jerrit Eickhoff, Jesse Donkervliet, A. Iosup
Abstract: One of the most popular types of online games is the Minecraft-like Game (MLG), in which players can terraform the environment. MLGs currently support their many players by replicating isolated instances with limited scalability. We posit that performance variability is a key cause for the lack of scalability in MLGs and design the first benchmark that focuses on MLG performance variability, identifying specialized workloads, metrics, and processes. We conduct real-world benchmarking of MLGs, both cloud-based and self-hosted. We find environment-based workloads and cloud deployment are significant sources of performance variability: peak-latency degrades sharply to 20.7 times the arithmetic mean, and exceeds by a factor of 7.4 the performance requirements.
DOI: 10.1145/3578244.3583724 (https://doi.org/10.1145/3578244.3583724)
Published: 2021-12-13
Citations: 1