2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)最新文献_第4页

ARBench: Augmented Reality Benchmark For Mobile Devices ARBench:移动设备的增强现实基准

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00035

Sofiane Chetoui, Rahul Shahi, Seif Abdelaziz, Abhinav Golas, Farrukh Hijaz, S. Reda

引用次数: 4

SGXGauge: A Comprehensive Benchmark Suite for Intel SGX SGXGauge:针对Intel SGX的综合基准测试套件

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00014

Sandeep Kumar, Abhisek Panda, S. Sarangi

引用次数: 4

Characterization of MPC-based Private Inference for Transformer-based Models 基于mpc的变压器模型私有推理的表征

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00025

Yongqin Wang, G. Suh, Wenjie Xiong, Benjamin Lefaudeux, Brian Knott, M. Annavaram, Hsien-Hsin S. Lee

{"title":"Characterization of MPC-based Private Inference for Transformer-based Models","authors":"Yongqin Wang, G. Suh, Wenjie Xiong, Benjamin Lefaudeux, Brian Knott, M. Annavaram, Hsien-Hsin S. Lee","doi":"10.1109/ISPASS55109.2022.00025","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00025","url":null,"abstract":"In this work, we provide an in-depth characterization study of the performance overhead for running Transformer models with secure multi-party computation (MPC). MPC is a cryptographic framework for protecting both the model and input data privacy in the presence of untrusted compute nodes. Our characterization study shows that Transformers introduce several performance challenges for MPC-based private machine learning inference. First, Transformers rely extensively on “softmax” functions. While softmax functions are relatively cheap in a non-private execution, softmax dominates the MPC inference runtime, consuming up to 50% of the total inference runtime. Further investigation shows that computing the maximum, needed for providing numerical stability to softmax, is a key culprit for the increase in latency. Second, MPC relies on approximating non-linear functions that are part of the softmax computations, and the narrow dynamic ranges make optimizing softmax while maintaining accuracy quite difficult. Finally, unlike CNNs, Transformer-based NLP models use large embedding tables to convert input words into embedding vectors. Accesses to these embedding tables can disclose inputs; hence, additional obfuscation for embedding access patterns is required for guaranteeing the input privacy. One approach to hide address accesses is to convert an embedding table lookup into a matrix multiplication. However, this naive approach increases MPC inference runtime significantly. We then apply tensor-train (TT) decomposition, a lossy compression technique for representing embedding tables, and evaluate its performance on embedding lookups. We show the trade-off between performance improvements and the corresponding impact on model accuracy using detailed experiments.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126676864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 14

Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design 学习硬件加速器设计的连续可重构潜在空间

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00041

Qijing Huang, Charles Hong, J. Wawrzynek, Mahesh Subedar, Y. Shao

{"title":"Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design","authors":"Qijing Huang, Charles Hong, J. Wawrzynek, Mahesh Subedar, Y. Shao","doi":"10.1109/ispass55109.2022.00041","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00041","url":null,"abstract":"The hardware design space is high-dimensional and discrete. Systematic and efficient exploration of this space has been a significant challenge. Central to this problem is the intractable search complexity that grows exponentially with the design choices and the discrete nature of the search space. This work investigates the feasibility of learning a meaningful low-dimensional continuous representation for hardware designs to reduce such complexity and facilitate the search process. We devise a variational autoencoder (VAE)-based design space exploration framework called VAESA, to encode the hardware design space in a compact and continuous representation. We show that black-box and gradient-based design space exploration algorithms can be applied to the latent space, and design points optimized in the latent space can be reconstructed to high-performance realistic hardware designs. Our experiments show that performing the design space search on the latent space consistently leads to the optimal design point under a fixed number of samples. In addition, the latent space can improve the sample efficiency of the original algorithm by 6.8$times$ and can discover hardware designs that are up to 5% more efficient than the optimal design searched directly in the high-dimensional input space.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126203320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

VIPP: Validation-Included Precision-Parametric N-Body Benchmark Suite VIPP:包含验证的精度参数n体基准套件

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00021

Sh. Sato, Kota Iizuka, N. Yoshifuji, Masaki Natsume

引用次数: 0

Scale-Model Architectural Simulation 比例模型建筑模拟

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00006

Wenjie Liu, W. Heirman, Stijn Eyerman, Shoaib Akram, L. Eeckhout

引用次数: 0

PCMCsim: An Accurate Phase-Change Memory Controller Simulator and its Performance Analysis PCMCsim:一个精确的相变存储器控制器模拟器及其性能分析

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00043

Hyokeun Lee, Hyungsuk Kim, Seokbo Shim, Seungyong Lee, Dosun Hong, Hyuk-Jae Lee, Hyun Kim

{"title":"PCMCsim: An Accurate Phase-Change Memory Controller Simulator and its Performance Analysis","authors":"Hyokeun Lee, Hyungsuk Kim, Seokbo Shim, Seungyong Lee, Dosun Hong, Hyuk-Jae Lee, Hyun Kim","doi":"10.1109/ISPASS55109.2022.00043","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00043","url":null,"abstract":"With the growing demand for technology scaling and storage capacity in data centers, phase-change memory (PCM) has garnered attention as a next-generation nonvolatile memory (NVM). However, an accurate simulator that includes the necessary hardware features for PCM is not available, lagging behind current PCM technology. In this study, a functional and cycle-accurate PCM controller simulator, called PCMCsim, is presented to revitalize the related research. The proposed simulator incorporates necessary features for current PCM products and the latest DDR5 specifications. Based on rigorous performance analysis, this study characterizes bottlenecks of the PCM subsystem by sweeping hardware parameters, providing important takeaway messages to designers. Furthermore, the latency is significantly reduced by introducing a dedicated prefetcher into the address translation module. The proposed simulator is validated against a command trace made by a PCM product developer. We release our simulator as open-source software, except for industry-confidential features.11https://github.com/harrylee365/pcmcsim-pub1ic","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133046503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

MARTA: Multi-configuration Assembly pRofiler and Toolkit for performance Analysis 用于性能分析的多配置汇编分析器和工具包

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00008

Marcos Horro, L. Pouchet, Gabriel Rodríguez, J. Touriño

{"title":"MARTA: Multi-configuration Assembly pRofiler and Toolkit for performance Analysis","authors":"Marcos Horro, L. Pouchet, Gabriel Rodríguez, J. Touriño","doi":"10.1109/ISPASS55109.2022.00008","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00008","url":null,"abstract":"Benchmarking to characterize specific software or hardware features is an error-prone, arduous and repetitive task. Designing a specialized experimental setup frequently requires writing new scripts or ad-hoc programs in order to properly exhibit interesting performance effects, using code changes and hardware events measurements. These artifacts may have limited reusability for subsequent experiments, since they are dependent on specific problems and, in some cases, platforms. To improve productivity and reproducibility of such experiments, which are often investigative in nature, we introduce MARTA: a fully customizable toolkit that aims to increase productivity by generating benchmark templates, compiling them, and profiling the regions of interest (RoI) specified using hardware events, and performing static code analysis. MARTA can also be applied on existing code regions of interest, it only requires to write a simple configuration file. In an orthogonal dimension, the system is able to run various statistical analyses on the measurements collected. MARTA uses data mining and machine learning or AI-based techniques for classification and regression, automatically extracting the features of the experimental setup which have the most impact on performance or whichever other metric of interest, given a large set of experiments and dimensions to consider. These post-processing tasks are valuable for deriving knowledge from experiments and are not included in most profiling tools. We also provide a set of cases of study to illustrate the ability of MARTA to conveniently create a reliable and reproducible setup for high-performance computing experiments, investigating three vastly different performance effects on modern processors.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123028715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

MEGsim: A Novel Methodology for Efficient Simulation of Graphics Workloads in GPUs MEGsim: gpu中图形工作负载高效仿真的新方法

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00007

Jorge L. Ortiz, David Corbalán-Navarro, Juan L. Aragón, Antonio González

{"title":"MEGsim: A Novel Methodology for Efficient Simulation of Graphics Workloads in GPUs","authors":"Jorge L. Ortiz, David Corbalán-Navarro, Juan L. Aragón, Antonio González","doi":"10.1109/ispass55109.2022.00007","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00007","url":null,"abstract":"An important drawback of cycle-accurate microarchitectural simulators is that they are several orders of magnitude slower than the system they model. This becomes an important issue when simulations have to be repeated multiple times sweeping over the desired design space. In the specific context of graphics workloads, performing cycle-accurate simulations are even more demanding due to the high number of triangles that have to be shaded, lighted and textured to compose a single frame. As a result, simulating a few minutes of a video game sequence is extremely time-consuming.In this paper, we make the observation that collecting information about the vertices and primitives that are processed, along with the times that shader programs are invoked, allows us to characterize the activity performed on a given frame. Based on that, we propose a novel methodology for the efficient simulation of graphics workloads called MEGsim, an approach that is capable of accurately characterizing entire video sequences by using a small subset of selected frames which substantially drops the simulation time. For a set of popular Android games, we show that MEGsim achieves an average simulation speedup of 126×, achieving remarkably accurate results for the estimated final statistics, e.g., with average relative errors of just 0.84% for the total number of cycles, 0.99% for the number of DRAM accesses, 1.2% for the number of L2 cache accesses, and 0.86% for the number of L1 (tile cache) accesses.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131978566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

XFeatur: Hardware Feature Extraction for DNN Auto-tuning xfeature:用于DNN自动调优的硬件特征提取

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00013

J. Acosta, Andreas Diavastos, Antonio González

引用次数: 0