2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): Latest Papers

ARBench: Augmented Reality Benchmark For Mobile Devices
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00035
Sofiane Chetoui, Rahul Shahi, Seif Abdelaziz, Abhinav Golas, Farrukh Hijaz, S. Reda
Abstract: This paper takes an important step towards improving the mobile AR experience by designing and developing ARBench, the first Augmented Reality (AR) benchmark for mobile devices. ARBench incorporates different AR workloads that stress multiple hardware units of the SoC (CPU, GPU, DSP, etc.) and measures an individual score for each AR workload. The proposed benchmark suite is then used to evaluate the AR performance of various commercial mobile devices and their ability to support various functions of AR workloads.
Citations: 4
SGXGauge: A Comprehensive Benchmark Suite for Intel SGX
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00014
Sandeep Kumar, Abhisek Panda, S. Sarangi
Abstract: Trusted execution environments (TEEs) such as Intel SGX facilitate the secure execution of an application on untrusted machines. A plethora of work focuses on improving the performance of such environments, which calls for a standard, widely accepted benchmark suite. We present SGXGauge, a benchmark suite for SGX containing a diverse set of workloads from different domains. We also thoroughly characterize the behavior of the benchmark suite on a native platform and on a platform that uses a library OS-based shim layer (GrapheneSGX).
Citations: 4
Characterization of MPC-based Private Inference for Transformer-based Models
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00025
Yongqin Wang, G. Suh, Wenjie Xiong, Benjamin Lefaudeux, Brian Knott, M. Annavaram, Hsien-Hsin S. Lee
Abstract: In this work, we provide an in-depth characterization study of the performance overhead of running Transformer models with secure multi-party computation (MPC). MPC is a cryptographic framework for protecting both the model and input data privacy in the presence of untrusted compute nodes. Our characterization study shows that Transformers introduce several performance challenges for MPC-based private machine learning inference. First, Transformers rely extensively on softmax functions. While softmax functions are relatively cheap in a non-private execution, softmax dominates the MPC inference runtime, consuming up to 50% of the total inference runtime. Further investigation shows that computing the maximum, needed to provide numerical stability to softmax, is a key culprit for the increase in latency. Second, MPC relies on approximating the non-linear functions that are part of the softmax computation, and their narrow dynamic ranges make optimizing softmax while maintaining accuracy quite difficult. Finally, unlike CNNs, Transformer-based NLP models use large embedding tables to convert input words into embedding vectors. Accesses to these embedding tables can disclose inputs; hence, additional obfuscation of embedding access patterns is required to guarantee input privacy. One approach to hiding address accesses is to convert an embedding table lookup into a matrix multiplication. However, this naive approach increases MPC inference runtime significantly. We then apply tensor-train (TT) decomposition, a lossy compression technique for representing embedding tables, and evaluate its performance on embedding lookups. We show the trade-off between performance improvements and the corresponding impact on model accuracy using detailed experiments.
Citations: 14
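The abstract above names two concrete mechanisms: softmax subtracts the maximum for numerical stability, and an embedding lookup can be rewritten as a matrix multiplication so that the accessed row is not revealed. The plaintext Python sketch below illustrates both, assuming nothing about the paper's code; it performs no secret sharing or MPC protocol, so it only shows why the max and the dense product appear, not how an MPC framework evaluates them. All function names are hypothetical.

```python
import math


def softmax_stable(xs):
    """Softmax with the maximum subtracted first.

    Subtracting max(xs) leaves the result mathematically unchanged but keeps
    every exponent <= 0, so math.exp cannot overflow. Under MPC, this max is
    a chain of secure comparisons, which is what makes it expensive there.
    """
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def lookup_direct(table, idx):
    """Plain embedding lookup: which row is read reveals idx."""
    return table[idx]


def lookup_onehot(table, idx):
    """The same lookup written as (one-hot row vector) x (table).

    Every row is touched regardless of idx, so the access pattern no longer
    depends on the input, at the cost of rows * dim multiply-adds instead of
    simply copying one row of length dim.
    """
    n_rows, dim = len(table), len(table[0])
    onehot = [1.0 if r == idx else 0.0 for r in range(n_rows)]
    return [sum(onehot[r] * table[r][c] for r in range(n_rows)) for c in range(dim)]
```

For example, `softmax_stable([1000.0, 1000.0])` returns `[0.5, 0.5]`, whereas exponentiating 1000.0 directly would raise `OverflowError`. The one-hot version returns the same row as `lookup_direct` but does work proportional to the whole table, which mirrors the abstract's point that the naive approach significantly increases runtime.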
Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00041
Qijing Huang, Charles Hong, J. Wawrzynek, Mahesh Subedar, Y. Shao
Abstract: The hardware design space is high-dimensional and discrete. Systematic and efficient exploration of this space has been a significant challenge. Central to this problem is the intractable search complexity, which grows exponentially with the design choices, and the discrete nature of the search space. This work investigates the feasibility of learning a meaningful low-dimensional continuous representation of hardware designs to reduce this complexity and facilitate the search process. We devise a variational autoencoder (VAE)-based design space exploration framework, called VAESA, to encode the hardware design space in a compact and continuous representation. We show that black-box and gradient-based design space exploration algorithms can be applied to the latent space, and that design points optimized in the latent space can be reconstructed into high-performance, realistic hardware designs. Our experiments show that performing the design space search in the latent space consistently leads to the optimal design point under a fixed number of samples. In addition, the latent space can improve the sample efficiency of the original algorithm by 6.8× and can discover hardware designs that are up to 5% more efficient than the optimal design found by searching directly in the high-dimensional input space.
Citations: 8
VIPP: Validation-Included Precision-Parametric N-Body Benchmark Suite
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00021
Sh. Sato, Kota Iizuka, N. Yoshifuji, Masaki Natsume
Abstract: Many efforts have recently been made to analyze and validate floating-point errors, particularly in mixed-precision arithmetic. However, real-world applications in approximate computing typically combine model-level approximation with arithmetic-level precision. It is crucial to analyze the combined effects of both precision parameters within the range in which the approximate algorithms remain valid. In this work, we develop a benchmark suite of practical approximate solvers for various N-body problems that parameterizes both the N-body approximation and the arithmetic precision. It includes precision criteria that prevent precision from being reduced without restriction, and it serves as a testbed for analyzing the combined effects of model-level approximation and arithmetic-level reduced precision. It is intended to help the design of precision control in approximate computing.
Citations: 0
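To illustrate what parameterizing both knobs under a validation criterion can look like, the hypothetical Python sketch below computes a direct-sum N-body potential energy with two parameters: a mantissa width that emulates reduced arithmetic precision, and a distance cutoff that crudely stands in for a model-level approximation (real approximate N-body solvers, such as Barnes-Hut tree codes, are far more refined). A configuration is accepted only if its relative error against a float64, no-cutoff reference stays within a tolerance. None of these names or criteria come from VIPP itself.

```python
import math


def round_bits(x, bits):
    """Round x to `bits` significant binary digits (emulated reduced precision)."""
    if x == 0.0 or bits >= 53:  # float64 already has 53 significant bits
        return x
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)


def potential_energy(pos, bits=53, cutoff=math.inf):
    """Pairwise -1/r potential energy for unit masses with G = 1.

    bits   : arithmetic-precision knob (result of every operation is rounded)
    cutoff : model-level approximation knob (pairs farther apart are skipped)
    """
    def r(v):
        return round_bits(v, bits)

    total = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d2 = 0.0
            for a, b in zip(pos[i], pos[j]):
                diff = r(a - b)
                d2 = r(d2 + r(diff * diff))
            dist = r(math.sqrt(d2))
            if dist > cutoff:
                continue
            total = r(total - r(1.0 / dist))
    return total


def passes(pos, bits, cutoff, tol):
    """Validation criterion: accept the (bits, cutoff) configuration only if its
    relative error against the full-precision, no-cutoff reference is <= tol."""
    ref = potential_energy(pos)
    approx = potential_energy(pos, bits=bits, cutoff=cutoff)
    return abs(approx - ref) / abs(ref) <= tol
```

With very few mantissa bits, the running total can stop changing once it grows large, because each new term falls below half the spacing between representable values; an explicit criterion like `passes` is one way to reject such unrestricted precision reduction instead of silently accepting the result.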
Scale-Model Architectural Simulation
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00006
Wenjie Liu, W. Heirman, Stijn Eyerman, Shoaib Akram, L. Eeckhout
Abstract: Computer architects extensively use simulation to steer future processor research and development. Simulating large-scale multicore processors is extremely time-consuming and is sometimes impossible because of simulation infrastructure constraints and/or simulation host compute and memory limitations. This paper proposes scale-model simulation, a novel methodology to predict large-scale multicore system performance. Scale-model simulation first constructs and simulates a scale model of the target system with reduced core count and shared resources. Target system performance is then predicted through machine-learning (ML) based extrapolation. Scale-model simulation predicts 32-core target system performance based on a single-core scale model with an average error of 8.0% and 15.8% for homogeneous and heterogeneous multiprogram workloads, respectively, while yielding a 28× simulation speedup.
Citations: 0
PCMCsim: An Accurate Phase-Change Memory Controller Simulator and its Performance Analysis
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00043
Hyokeun Lee, Hyungsuk Kim, Seokbo Shim, Seungyong Lee, Dosun Hong, Hyuk-Jae Lee, Hyun Kim
Abstract: With the growing demand for technology scaling and storage capacity in data centers, phase-change memory (PCM) has garnered attention as a next-generation nonvolatile memory (NVM). However, no accurate simulator that includes the hardware features necessary for PCM is available, so simulation support lags behind current PCM technology. In this study, a functional and cycle-accurate PCM controller simulator, called PCMCsim, is presented to revitalize related research. The proposed simulator incorporates the features necessary for current PCM products and the latest DDR5 specifications. Based on rigorous performance analysis, this study characterizes the bottlenecks of the PCM subsystem by sweeping hardware parameters, providing important takeaway messages for designers. Furthermore, latency is significantly reduced by introducing a dedicated prefetcher into the address translation module. The proposed simulator is validated against a command trace produced by a PCM product developer. We release our simulator as open-source software, except for industry-confidential features (https://github.com/harrylee365/pcmcsim-pub1ic).
Citations: 1
MARTA: Multi-configuration Assembly pRofiler and Toolkit for performance Analysis
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00008
Marcos Horro, L. Pouchet, Gabriel Rodríguez, J. Touriño
Abstract: Benchmarking to characterize specific software or hardware features is an error-prone, arduous, and repetitive task. Designing a specialized experimental setup frequently requires writing new scripts or ad-hoc programs to properly expose interesting performance effects, using code changes and hardware event measurements. These artifacts may have limited reusability for subsequent experiments, since they depend on specific problems and, in some cases, platforms. To improve the productivity and reproducibility of such experiments, which are often investigative in nature, we introduce MARTA: a fully customizable toolkit that aims to increase productivity by generating benchmark templates, compiling them, profiling the specified regions of interest (RoI) using hardware events, and performing static code analysis. MARTA can also be applied to existing regions of interest in code, requiring only a simple configuration file. In an orthogonal dimension, the system can run various statistical analyses on the collected measurements. MARTA uses data mining and machine learning techniques for classification and regression, automatically extracting the features of the experimental setup that have the most impact on performance (or on any other metric of interest), given a large set of experiments and dimensions to consider. These post-processing tasks are valuable for deriving knowledge from experiments and are not included in most profiling tools. We also provide a set of case studies that illustrate MARTA's ability to conveniently create a reliable and reproducible setup for high-performance computing experiments, investigating three vastly different performance effects on modern processors.
Citations: 1
MEGsim: A Novel Methodology for Efficient Simulation of Graphics Workloads in GPUs
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00007
Jorge L. Ortiz, David Corbalán-Navarro, Juan L. Aragón, Antonio González
Abstract: An important drawback of cycle-accurate microarchitectural simulators is that they are several orders of magnitude slower than the system they model. This becomes an important issue when simulations have to be repeated many times while sweeping over the desired design space. In the specific context of graphics workloads, performing cycle-accurate simulations is even more demanding due to the large number of triangles that have to be shaded, lit, and textured to compose a single frame. As a result, simulating a few minutes of a video game sequence is extremely time-consuming. In this paper, we observe that collecting information about the vertices and primitives that are processed, along with the number of times shader programs are invoked, allows us to characterize the activity performed on a given frame. Based on that, we propose MEGsim, a novel methodology for the efficient simulation of graphics workloads that can accurately characterize entire video sequences using a small subset of selected frames, which substantially reduces simulation time. For a set of popular Android games, we show that MEGsim achieves an average simulation speedup of 126× while producing remarkably accurate estimates of the final statistics, e.g., average relative errors of just 0.84% for the total number of cycles, 0.99% for the number of DRAM accesses, 1.2% for the number of L2 cache accesses, and 0.86% for the number of L1 (tile cache) accesses.
Citations: 0
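The abstract does not spell out how MEGsim picks its subset of frames. As a generic illustration of the underlying idea (characterize each frame with cheap counters, simulate only a few representative frames, and scale their results), the hypothetical Python sketch below applies simple leader clustering to per-frame feature tuples such as vertex count, primitive count, and shader invocations, then weights each representative's simulated cycle count by the number of frames it stands for. The clustering rule, the tolerance, and the `simulate` callback are all assumptions, not MEGsim's actual method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-frame counters, e.g. (vertices, primitives, shader invocations).
Features = Sequence[float]


def _close(a: Features, b: Features, tol: float) -> bool:
    """True if every counter in a is within relative tolerance tol of the one in b."""
    return all(abs(x - y) <= tol * max(abs(y), 1.0) for x, y in zip(a, b))


def select_representatives(frames: List[Features], tol: float) -> Dict[int, List[int]]:
    """Leader clustering: the first frame of each group becomes its representative.

    Returns {representative_index: [indices of the frames it stands for]}.
    """
    groups: Dict[int, List[int]] = {}
    for i, feats in enumerate(frames):
        for leader, members in groups.items():
            if _close(feats, frames[leader], tol):
                members.append(i)
                break
        else:  # no existing group matched, so this frame starts a new one
            groups[i] = [i]
    return groups


def estimate_total(frames: List[Features],
                   simulate: Callable[[int], float],
                   tol: float) -> Tuple[float, int]:
    """Simulate only the representatives and weight each by its group size.

    simulate(i) stands in for an expensive cycle-accurate run of frame i.
    Returns (estimated total, number of frames actually simulated).
    """
    groups = select_representatives(frames, tol)
    total = sum(simulate(leader) * len(members) for leader, members in groups.items())
    return total, len(groups)
```

A tighter tolerance yields more representatives (more simulation time) and a smaller extrapolation error; any sampling-based simulation methodology has to tune this same trade-off.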
XFeatur: Hardware Feature Extraction for DNN Auto-tuning
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00013
J. Acosta, Andreas Diavastos, Antonio González
Abstract: In this work, we extend the auto-tuning process of the state-of-the-art TVM framework with XFeatur, a tool that extracts new, meaningful hardware-related features that improve the quality of the search-space representation and, consequently, the accuracy of its prediction algorithm. These new features provide information about the amount of thread-level parallelism, shared memory usage, register usage, dynamic instruction count, and memory access dependencies. Optimizing ResNet-18 with the proposed features improves the quality of the search-space representation by 63% on average (up to 2× for certain tasks), reduces tuning time by 9% (approximately 1.1 hours), and produces configurations with equal or better performance (up to 92.7%) than the baseline.
Citations: 0