Shubham Nema, Justin Kirschner, Debpratim Adak, S. Agarwal, Ben Feinberg, Arun Rodrigues, M. Marinella, Amro Awad
{"title":"Eris: Fault Injection and Tracking Framework for Reliability Analysis of Open-Source Hardware","authors":"Shubham Nema, Justin Kirschner, Debpratim Adak, S. Agarwal, Ben Feinberg, Arun Rodrigues, M. Marinella, Amro Awad","doi":"10.1109/ISPASS55109.2022.00027","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00027","url":null,"abstract":"As transistors have been scaled over the past decade, modern systems have become increasingly susceptible to faults. Increased transistor densities and lower capacitances make a particle strike more likely to cause an upset. At the same time, complex computer systems are increasingly integrated into safety-critical systems such as autonomous vehicles. These two trends make the study of system reliability and fault tolerance essential for modern systems. To analyze and improve system reliability early in the design process, new tools are needed for RTL fault analysis.This paper proposes Eris, a novel framework to identify vulnerable components in hardware designs through fault-injection and fault propagation tracking. Eris builds on ESSENT—a fast C/C++ RTL simulation framework—to provide fault injection, fault tracking, and control-flow deviation detection capabilities for RTL designs. To demonstrate Eris’ capabilities, we analyze the reliability of the open source Rocket Chip SoC by randomly injecting faults during thousands of runs on four microbenchmarks. As part of this analysis we measure the sensitivity of different hardware structures to faults based on the likelihood of a random fault causing silent data corruption, unrecoverable data errors, program crashes, and program hangs. We detect control flow deviations and determine whether or not they are benign. Additionally, using Eris’ novel fault-tracking capabilities we are able to find 78% more vulnerable components in the same number of simulations compared to RTL-based fault injection techniques without these capabilities. 
We will release Eris as an open-source tool to aid future research into processor reliability and hardening.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121011913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
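As a rough illustration of the fault-injection idea the abstract describes (a minimal sketch, not Eris' actual implementation; all names here are hypothetical), a single-event upset can be modeled as flipping one random bit of a simulated register, and each run classified against a fault-free "golden" run:

```python
import random

def inject_bit_flip(value, width, rng=None):
    """Model a single-event upset by flipping one random bit
    of a `width`-bit register value."""
    rng = rng or random.Random()
    bit = rng.randrange(width)
    return value ^ (1 << bit)

def classify(golden_output, faulty_output, crashed, hung):
    """Bucket one injection run into outcome categories like
    those the abstract lists."""
    if hung:
        return "hang"
    if crashed:
        return "crash"
    if faulty_output != golden_output:
        return "silent data corruption"
    return "masked"  # the fault had no visible effect
```

Repeating this over thousands of randomized runs, as the abstract describes, yields a per-structure outcome distribution.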
Hadjer Benmeziane, S. Niar, Hamza Ouarnoughi, Kaoutar El Maghraoui
{"title":"Pareto Rank Surrogate Model for Hardware-aware Neural Architecture Search","authors":"Hadjer Benmeziane, S. Niar, Hamza Ouarnoughi, Kaoutar El Maghraoui","doi":"10.1109/ISPASS55109.2022.00040","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00040","url":null,"abstract":"Hardware-aware Neural Architecture Search (HWNAS) has recently gained much attention by automating the design of efficient deep learning models with tiny resources and reduced inference time requirements. However, HW-NAS inherits and exacerbates the expensive computational complexity of general NAS due to its significantly increased search spaces and more complex NAS evaluation component. To speed up HWNAS, existing efforts use surrogate models to predict a neural architecture’s accuracy and hardware performance on a specific platform. Thereby reducing the expensive training process and significantly reducing search time. We show that using multiple surrogate models to estimate the different objectives does not achieve the true Pareto front. Therefore, we propose HW-PRNAS, a novel Pareto Rank-preserving surrogate model. HWPR-NAS training is based on a new loss function that ranks the architectures according to their Pareto front. We evaluate our approach on seven different hardware platforms, including ASIC, FPGA, GPU and multi-cores. Our results show that we can achieve up to 2. 
5x speedup while achieving better Pareto-front results than state of the art surrogate models.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117055171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
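To make the "true Pareto front" the abstract refers to concrete, here is a minimal sketch of Pareto dominance over (accuracy, latency) pairs, where higher accuracy and lower latency are better. This only illustrates the concept; it is not the paper's surrogate model or loss function:

```python
def dominates(a, b):
    """a, b are (accuracy, latency) pairs. a dominates b if it is no worse
    in both objectives and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(points):
    """Return the non-dominated subset of (accuracy, latency) points."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A rank-preserving surrogate, as the abstract describes it, would be trained so that its predicted scores keep architectures on this front ranked ahead of dominated ones.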
Mohammad Bakhshalipour, M. Likhachev, Phillip B. Gibbons
{"title":"RTRBench: A Benchmark Suite for Real-Time Robotics","authors":"Mohammad Bakhshalipour, M. Likhachev, Phillip B. Gibbons","doi":"10.1109/ispass55109.2022.00024","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00024","url":null,"abstract":"The emergence of “robotics in the wild” has triggered a wave of recent research in hardware and software to boost robots’ compute capabilities. Nevertheless, research in this area is hindered by the lack of a comprehensive benchmark suite.In this paper, we present RTRBench, a benchmark suite for robotic kernels. RTRBench includes 16 kernels, spanning the entire software pipeline of a wide swath of robots, all implemented in C++ for fast execution.Together with the suite, we conduct an evaluation of the workloads at the architecture level. We pinpoint the sources of inefficiencies in a modern robotic processor when executing the robotic kernels, along with the opportunities for improvements.The source code of the benchmark suite is available in https://cmu-roboarch.github.io/rtrbench/.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123558100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dimitris Sartzetakis, G. Papadimitriou, D. Gizopoulos
{"title":"gpuFI-4: A Microarchitecture-Level Framework for Assessing the Cross-Layer Resilience of Nvidia GPUs","authors":"Dimitris Sartzetakis, G. Papadimitriou, D. Gizopoulos","doi":"10.1109/ISPASS55109.2022.00004","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00004","url":null,"abstract":"Pre-silicon reliability evaluation of processors is usually performed at the microarchitecture or at the software level. Recent studies on CPUs have, however, shown that software level approaches can mislead the soft error vulnerability assessment process and drive designers towards wrong error protection decisions. To avoid such pitfalls in the GPUs domain, the availability of microarchitecture level reliability assessment tools is of paramount importance. Although there are several publicly available frameworks for the reliability assessment of GPUs, they only operate at the software level, and do not consider the microarchitecture. This paper aims at accurate microarchitecture level GPU soft error vulnerability assessment. We introduce gpuFI-4: a detailed microarchitecture-level fault injection framework to assess the cross-layer vulnerability of hardware structures and entire GPU chips for single and multiple bit faults, built on top of the state-of-the-art simulator GPGPU-Sim 4.0. We employ gpuFI-4 for fault injection of soft errors on CUDA-enabled Nvidia GPU architectures. The target hardware structures that our framework analyzes are the register file, the shared memory, the LI data and texture caches and the L2 cache, altogether accounting for tens of MBs of on-chip GPU storage. We showcase the features of the tool reporting the vulnerability of three Nvidia GPU chip models: two different modem GPU architectures – RTX 2060 (Turing) and Quadro GV100 (Volta) – and an older generation – GTX Titan (Kepler), for both single-bit and triple-bit fault injections and for twelve different CUDA benchmarks that are simulated on the actual physical instruction set (SASS). 
Our experiments report the Architectural Vulnerability Factor (AVF) of the GPU chips (which can be only measured at the microarchitecture level) as well as their predicted Failures in Time (FIT) rate when technology information is incorporated in the assessment.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129923399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
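A common way to estimate the AVF mentioned in the abstract is as the fraction of fault-injection runs on a structure that lead to any visible failure. The sketch below illustrates that statistic only; it is not gpuFI-4's code, and the outcome labels are hypothetical:

```python
def estimate_avf(outcomes):
    """Estimate the Architectural Vulnerability Factor of a hardware
    structure as the fraction of fault-injection runs whose outcome
    was not masked (i.e., the fault mattered)."""
    failures = sum(1 for o in outcomes if o != "masked")
    return failures / len(outcomes)
```

With enough randomized injections per structure (register file, shared memory, caches), this fraction converges toward the structure's vulnerability.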
{"title":"High-Performance Deployment of Text Detection Model: Compression and Hardware Platform considerations","authors":"Nupur Sumeet, Karan Rawat, M. Nambiar","doi":"10.1109/ISPASS55109.2022.00022","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00022","url":null,"abstract":"Network compression is often adopted for high throughput implementation on commercial accelerators. We propose a heuristic based approach to obtain compressed networks with a hardware-friendly architecture as an alternative to conventional NAS algorithms that are computationally expensive. The proposed compressed network introduces 142 $times$ memory-footprint reduction and provide throughput improvement of 5-8 $times$ on target hardware platforms, while retaining accuracy within 5% of the baseline trained model. We report performance acceleration on CPU, GPU, and FPGAs for a text detection task.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134356004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Geonhwa Jeong, Bikash Sharma, Nick Terrell, A. Dhanotia, Zhiwei Zhao, Niket Agarwal, A. Kejariwal, T. Krishna
{"title":"Understanding Data Compression in Warehouse-Scale Datacenter Services","authors":"Geonhwa Jeong, Bikash Sharma, Nick Terrell, A. Dhanotia, Zhiwei Zhao, Niket Agarwal, A. Kejariwal, T. Krishna","doi":"10.1109/ISPASS55109.2022.00028","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00028","url":null,"abstract":"Data compression has emerged as a promising technique to alleviate the memory, storage, and network cost with some associated compute overheads in warehouse-scale datacenter services. Despite being one of the most important components of the overall datacenter taxes, there has not been a comprehensive characterization of compression usage in data center workloads. In this work, we first provide a holistic characterization of compression as used by various warehouse-scale datacenter services at a global social media provider (Meta). Next, we deep dive into a few representative use cases of compression in the production environment and characterize compression usage of services while running live traffic.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"37 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134635017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shalini Jain, Yashas Andaluri, S. VenkataKeerthy, Ramakrishna Upadrasta
{"title":"POSET-RL: Phase ordering for Optimizing Size and Execution Time using Reinforcement Learning","authors":"Shalini Jain, Yashas Andaluri, S. VenkataKeerthy, Ramakrishna Upadrasta","doi":"10.1109/ISPASS55109.2022.00012","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00012","url":null,"abstract":"The ever increasing memory requirements of several applications has led to increased demands which might not be met by embedded devices. Constraining the usage of memory in such cases is of paramount importance. It is important that such code size improvements should not have a negative impact on the runtime. Improving the execution time while optimizing for code size is a non-trivial but a significant task.The ordering of standard optimization sequences in modern compilers is fixed, and are heuristically created by the compiler domain experts based on their expertise. However, this ordering is sub-optimal, and does not generalize well across all the cases.We present a reinforcement learning based solution to the phase ordering problem, where the ordering improves both the execution time and code size. We propose two different approaches to model the sequences: one by manual ordering, and other based on a graph called Oz Dependence Graph (ODG). Our approach uses minimal data as training set, and is integrated with LLVM.We show results on x86 and AArch64 architectures on the benchmarks from SPEC-CPU 2006, SPEC-CPU 2017 and MiBench. 
We observe that the proposed model based on ODG outperforms the current Oz sequence both in terms of size and execution time by 6.19% and 11.99% in SPEC 2017 benchmarks, on an average.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130053107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
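To see why phase ordering is a search problem at all, consider that the same passes in different orders can yield different code quality because passes enable one another. The toy sketch below searches orderings exhaustively against an invented pairwise-interaction cost model; the pass names and bonuses are hypothetical, and this is brute force, not the paper's RL approach:

```python
import itertools

# Hypothetical pass interactions, NOT real LLVM data: a bonus applies
# when the first pass of the pair runs before the second.
PAIR_BONUS = {("inline", "dce"): 2, ("gvn", "dce"): 1, ("inline", "gvn"): 1}

def cost(order):
    """Lower is better: subtract a bonus for each beneficial pair (a, b)
    with a ordered before b."""
    c = 10.0
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            c -= PAIR_BONUS.get((a, b), 0)
    return c

passes = ["dce", "gvn", "inline"]
best = min(itertools.permutations(passes), key=cost)
```

Exhaustive search explodes factorially with the number of passes, which is why the paper turns to reinforcement learning to navigate the ordering space.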
S. Santos, T. R. Kepe, Francis B. Moreira, P. C. Santos, M. Alves
{"title":"Advancing Near-Data Processing with Precise Exceptions and Efficient Data Fetching","authors":"S. Santos, T. R. Kepe, Francis B. Moreira, P. C. Santos, M. Alves","doi":"10.1109/ISPASS55109.2022.00031","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00031","url":null,"abstract":"Near-Data Processing (NDP) modifies the traditional computer system design by placing logic near the memory, bringing computation to the data. One NDP approach places such elements on the logic layer of 3D-stacked memories to quickly access data while avoiding reliance on narrow buses and better accessing the parallelism these devices offer. However, NDP architectures often fail to fully leverage available memory resources. In this work, we propose adding an instruction buffer to a common NDP design with large vector instructions. This modification allows the NDP to fetch instruction operands out of program order and delegates some responsibility regarding precise exceptions to the near-data device. Our results show our modifications cause a reduction in execution time of up to 28% while consuming up to 25% less energy.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121446420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Uday Kumar Reddy Vengalam, Anshujit Sharma, Michael C. Huang
{"title":"LoopIn: A Loop-Based Simulation Sampling Mechanism","authors":"Uday Kumar Reddy Vengalam, Anshujit Sharma, Michael C. Huang","doi":"10.1109/ispass55109.2022.00029","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00029","url":null,"abstract":"Understanding program behavior is at the heart of general-purpose architecture design. Whether we are testing a new design offline or making a design adapt to changing behavior online, a central assumption is that the test cases represent real workload in steady state. Typical computer programs have been known to exhibit patterns of runtime behavior that repeat during the course of their execution. Simulation and adaptation strategies all exploit this repetition to some extent. In this paper, we introduce a simple mechanism that is more explicit in identifying and exploiting behavior repetition at the granularity of (broadly defined) loops. The result is that a typical benchmark will be categorized into tens of loops. In terms of architectural simulations, this strategy will create a moderate number (on the orders of 100) of relatively short (tens of thousands of instructions) segments. There are two major benefits in our view. The first and more quantifiable benefit is that, the strategy requires less simulation and obtains increased accuracy compared to the commonly used SimPoint approach. Second, instead of depicting average statistics of an entire program, we can accurately describe intra-program behavior variation, which simple sampling strategies cannot. LoopIn produces many small simulation segments. In certain usage scenarios, microarchitectural state warm-up may be costly. 
In these cases, an existing tool BLRL can help create efficient warm-up arrangements.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"317 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123233085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Konstantin Levit-Gurevich, Alex Skaletsky, Michael Berezalsky, Yulia Kuznetcova, Hila Yakov
{"title":"Profiling Intel Graphics Architecture with Long Instruction Traces","authors":"Konstantin Levit-Gurevich, Alex Skaletsky, Michael Berezalsky, Yulia Kuznetcova, Hila Yakov","doi":"10.1109/ISPASS55109.2022.00001","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00001","url":null,"abstract":"In the process of developing software and hardware, profiling workloads is critical. Binary Instrumentation Technology plays a key role in this task for both x86 architecture and Intel Graphics Processing Units. The GTPin framework is the first tool that allows the profiling of graphics and compute kernels running on Intel GPUs. However, GTPin capabilities are less flexible than x86 profiling tools. In this paper, we introduce the concept of “gLIT” – Long Instruction Trace for Intel GPUs. Generated on real hardware, gLIT can be replayed on a simulator or an emulator running on the CPU device, and thus, can be easily profiled and analyzed “on the fly” with analysis tools of any complexity. Since the graphics devices are extremely parallel, the gLIT trace is, by definition, a multi-threaded trace, reflecting a kernel concurrently running hundreds of hardware threads. 
The ability to thoroughly profile and analyze workloads is critical for improving hardware and software readiness and creates new possibilities for academic research on Intel graphics devices.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"434 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126100647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}