2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): Latest Publications

PARADISE - Post-Moore Architecture and Accelerator Design Space Exploration Using Device Level Simulation and Experiments
Dilip P. Vasudevan, George Michelogiannakis, D. Donofrio, J. Shalf
DOI: 10.1109/ISPASS.2019.00022 · Published: 2019-03-24 · Citations: 1

Abstract: An increasing number of technologies are being proposed to preserve digital computing performance scaling as lithographic scaling slows. These technologies include new devices, specialized architectures, memories, and 3D integration. Currently, no end-to-end tool flow is available to rapidly perform architectural-level evaluation using device-level models for a variety of emerging technologies at once. We propose PARADISE: an open-source comprehensive methodology to evaluate emerging technologies with a vertical simulation flow from the individual device level all the way up to the architectural level. To demonstrate its effectiveness, we use PARADISE to perform end-to-end simulation and analysis of heterogeneous architectures using CNFETs, TFETs, and NCFETs, along with multiple hardware designs. To demonstrate its accuracy, we show that PARADISE has only a 6% mean deviation for delay and 9% for power compared to previous studies using commercial synthesis tools.
Characterization of Unnecessary Computations in Web Applications
Hossein Golestani, S. Mahlke, S. Narayanasamy
DOI: 10.1109/ISPASS.2019.00010 · Published: 2019-03-24 · Citations: 5

Abstract: Web applications are widely used in many daily activities, such as online shopping, navigation through maps, and social networking, in both desktop and mobile environments. Advances in technology, such as network connections, hardware platforms, and software design techniques, have empowered Web developers to design Web pages that are highly rich in content and engage users through an interactive experience. However, the performance of Web applications is not ideal today, and many users experience poor quality of service, including long page load times and irregular animations. One of the contributing factors to low performance is the very design of Web applications, particularly Web browsers. In this work, we argue that there are unnecessary computations in today's Web applications, which are completely or most likely wasted. We first describe the potential unnecessary computations at a high level, and then design a profiler based on dynamic backward program slicing that detects such computations. Our profiler reveals that for four different websites, only 45% of dynamically executed instructions are useful in rendering the main page, on average. We then analyze and categorize unnecessary computations. Our analysis shows that processing JavaScript code is the most notable category of unnecessary computations, specifically during page loading. Such computations are either completely wasted or could be deferred to a later time, i.e., when they are actually needed, thereby providing higher performance and better energy efficiency.
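To make the idea of dynamic backward slicing concrete, here is a toy sketch over a straight-line execution trace. The trace format and example are invented for illustration; the paper's profiler operates on real browser executions, not this simplified model.

```python
# Toy dynamic backward slice: starting from the value we care about, walk the
# trace in reverse and keep only instructions whose results are transitively
# used. Everything else is an "unnecessary computation" for this criterion.

def backward_slice(trace, criterion):
    """trace: list of (dest_var, [source_vars]) in execution order.
    criterion: the variable whose final value we observed.
    Returns the indices of trace entries that contribute to it."""
    needed = {criterion}
    keep = []
    for i in range(len(trace) - 1, -1, -1):
        dest, srcs = trace[i]
        if dest in needed:
            needed.discard(dest)   # this write satisfies the demand for dest
            needed.update(srcs)    # ...but now its inputs are demanded
            keep.append(i)
    return sorted(keep)

# "c" is computed but never feeds "d", so instruction 2 is sliced away.
trace = [("a", []), ("b", ["a"]), ("c", []), ("d", ["b"])]
useful = backward_slice(trace, "d")
```

On this trace the slice keeps instructions 0, 1, and 3; the fraction of kept instructions is the analogue of the paper's 45% usefulness figure.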
RPPM: Rapid Performance Prediction of Multithreaded Workloads on Multicore Processors
S. D. Pestel, S. V. D. Steen, Shoaib Akram, L. Eeckhout
DOI: 10.1109/ISPASS.2019.00038 · Published: 2019-03-01 · Citations: 4

Abstract: Analytical performance modeling is a useful complement to detailed cycle-level simulation for quickly exploring the design space in an early design stage. Mechanistic analytical modeling is particularly interesting as it provides deep insight and does not require the expensive offline profiling that empirical modeling does. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors. This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance.
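For readers unfamiliar with mechanistic modeling, a minimal single-core sketch conveys the flavor: execution time decomposes into an ideal dispatch term plus explicit penalties for miss events. This is not RPPM itself (which targets multi-threaded workloads), and all parameters below are illustrative.

```python
# Minimal mechanistic performance model in the spirit of interval analysis.
# NOT the RPPM model; a single-core sketch with made-up inputs.

def predict_cycles(insn_count, dispatch_width, miss_events):
    """Estimate execution cycles as ideal dispatch time plus miss penalties.

    miss_events: list of (count, penalty_cycles) pairs, e.g. branch
    mispredictions and last-level cache misses.
    """
    base = insn_count / dispatch_width              # ideal steady-state cycles
    stalls = sum(n * penalty for n, penalty in miss_events)
    return base + stalls

# Example: 1e9 instructions on a 4-wide core, 5e6 branch mispredictions
# (15-cycle penalty) and 2e6 LLC misses (200-cycle penalty).
cycles = predict_cycles(1_000_000_000, 4,
                        [(5_000_000, 15), (2_000_000, 200)])
```

The appeal, as the abstract notes, is that the inputs (instruction mix, miss counts) are microarchitecture-independent, so one profile can be replayed against many candidate machine configurations.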
2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) ISPASS 2019
Matthew Halpern
DOI: 10.1109/ispass.2019.00004 · Published: 2019-03-01 · Citations: 0 · (No abstract available)
Timeloop: A Systematic Approach to DNN Accelerator Evaluation
A. Parashar, Priyanka Raina, Y. Shao, Yu-hsin Chen, Victor A. Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, S. Keckler, J. Emer
DOI: 10.1109/ISPASS.2019.00042 · Published: 2019-03-01 · Citations: 316

Abstract: This paper presents Timeloop, an infrastructure for evaluating and exploring the architecture design space of deep neural network (DNN) accelerators. Timeloop uses a concise and unified representation of the key architecture and implementation attributes of DNN accelerators to describe a broad space of hardware topologies. It can then emulate those topologies to generate an accurate projection of performance and energy efficiency for a DNN workload through a mapper that finds the best way to schedule operations and stage data on the specified architecture. This enables fair comparisons across different architectures and makes DNN accelerator design more systematic. This paper describes Timeloop's underlying models and algorithms in detail and shows results from case studies enabled by Timeloop, which provide interesting insights into the current state of DNN architecture design. In particular, they reveal that dataflow and memory hierarchy co-design plays a critical role in optimizing energy efficiency. Also, there is currently still not a single architecture that achieves the best performance and energy efficiency across a diverse set of workloads due to flexibility and efficiency trade-offs. These results provide inspiration into possible directions for DNN accelerator research.
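The mapper idea described above, searching a space of schedules and picking the one with the best projected cost, can be sketched in miniature. The traffic model and tile space below are deliberately crude stand-ins; Timeloop's real models, mapspace, and workload description are far richer.

```python
# Toy "mapper" in the spirit of Timeloop's search: enumerate tilings of a
# matrix multiply and estimate DRAM traffic for each, keeping the cheapest.

def dram_traffic(M, N, K, Tm, Tn):
    """Words moved for C[M,N] += A[M,K] @ B[K,N] with an output tile Tm x Tn
    held on-chip. A is re-streamed once per tile-column of C, B once per
    tile-row, and each output element is written once."""
    a_reads = M * K * (N // Tn)
    b_reads = K * N * (M // Tm)
    c_writes = M * N
    return a_reads + b_reads + c_writes

def best_mapping(M, N, K, tiles):
    """Return the (tile, traffic) pair with minimum estimated DRAM traffic."""
    return min(((t, dram_traffic(M, N, K, *t)) for t in tiles),
               key=lambda pair: pair[1])

tile, traffic = best_mapping(256, 256, 256, [(16, 16), (32, 32), (64, 64)])
```

Even this toy version shows the core trade-off the paper studies: larger on-chip tiles buy more reuse and cut off-chip traffic, which is why dataflow and memory hierarchy must be co-designed.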
A Model Driven Approach Towards Improving the Performance of Apache Spark Applications
Kewen Wang, Mohammad Maifi Hasan Khan, Nhan Nguyen, S. Gokhale
DOI: 10.1109/ISPASS.2019.00036 · Published: 2019-03-01 · Citations: 6

Abstract: Apache Spark applications often execute in multiple stages, where each stage consists of multiple tasks running in parallel. However, prior efforts noted that the execution time of different tasks within a stage can vary significantly for various reasons (e.g., inefficient partitioning of input data), and tasks can be distributed unevenly across worker nodes for different reasons (e.g., data co-locality). While these problems are well known, it is nontrivial to predict and address them effectively. In this paper we present an analytical model-driven approach that can predict the possibility of such problems by executing an application with a limited amount of input data, and recommend ways to address the identified problems by repartitioning input data (in the case of the task straggler problem) and/or changing the locality configuration setting (in the case of the skewed task distribution problem). The novelty of our approach lies in automatically predicting the potential problems a priori based on limited execution data and recommending the locality setting and partition number. Our experimental results using 9 Apache Spark applications on two different clusters show that our model-driven approach can predict these problems with high accuracy and improve performance by up to 71%.
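A hedged sketch of the two kinds of recommendation the abstract describes: flagging a likely straggler stage from a small trial run, and suggesting a partition count. The threshold, metric, and 128 MB target are invented for illustration and are not the authors' actual model.

```python
# Sketch of straggler prediction and partition-count recommendation from a
# trial run, in the spirit of the paper's approach. The 1.5x-median rule and
# the 128 MB partition target are hypothetical choices, not the paper's model.
import statistics

def has_stragglers(task_times, threshold=1.5):
    """Flag a stage if its slowest task exceeds `threshold` x the median."""
    return max(task_times) > threshold * statistics.median(task_times)

def recommend_partitions(input_bytes, target_partition_bytes=128 * 1024**2):
    """Suggest a partition count keeping partitions near a target size
    (128 MB here, mirroring a common HDFS block-size default)."""
    return max(1, -(-input_bytes // target_partition_bytes))  # ceil division
```

The paper's contribution is doing this prediction analytically from limited execution data, before the full run, rather than diagnosing stragglers after the fact.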
Full-System Simulation of Mobile CPU/GPU Platforms
Kuba Kaszyk, Harry Wagstaff, T. Spink, Björn Franke, M. O’Boyle, Bruno Bodin, Henrik Uhrenholt
DOI: 10.1109/ISPASS.2019.00015 · Published: 2019-03-01 · Citations: 4

Abstract: Graphics Processing Units (GPUs) critically rely on a complex system software stack comprising kernel- and user-space drivers and just-in-time (JIT) compilers. Yet, existing GPU simulators typically abstract away details of the software stack and GPU instruction set. Partly, this is because GPU vendors rarely release sufficient information about their latest GPU products. However, it is also due to the lack of an integrated CPU/GPU simulation framework, complete and powerful enough to drive the complex GPU software environment. This has led to a situation where research on GPU architectures and compilers is largely based on outdated or greatly simplified architectures and software stacks, undermining the validity of the generated results. In this paper we develop a full-system simulation environment for a mobile platform, which enables users to run a complete and unmodified software stack for a state-of-the-art mobile Arm CPU and Mali-G71 GPU powered device. We validate our simulator against a hardware implementation and Arm's stand-alone GPU simulator, achieving 100% architectural accuracy across all available toolchains. We demonstrate the capability of our GPU simulation framework by optimizing an advanced computer vision application using simulated statistics unavailable with other simulation approaches or physical GPU implementations. We demonstrate that performance optimizations for desktop GPUs trigger bottlenecks on mobile GPUs, and show the importance of efficient memory use.
One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-Off in Machine Learning Cloud Service APIs via Tolerance Tiers
Matthew Halpern, Behzad Boroujerdian, Todd W. Mummert, E. Duesterwald, V. Reddi
DOI: 10.1109/ISPASS.2019.00012 · Published: 2019-03-01 · Citations: 17

Abstract: Today's cloud service architectures follow a "one size fits all" deployment strategy, where the same service version instantiation is provided to all end users. However, the consumer base is broad and different applications have different accuracy and responsiveness requirements, which, as we demonstrate, renders the "one size fits all" approach inefficient in practice. We use a production-grade speech recognition engine, which serves several thousand users, and an open-source computer vision based system to illustrate this point. To overcome the limitations of the "one size fits all" approach, we recommend Tolerance Tiers, where each MLaaS tier exposes an accuracy/responsiveness characteristic and consumers can programmatically select a tier. We evaluate our proposal on the CPU-based automatic speech recognition (ASR) engine and cutting-edge neural networks for image classification deployed on both CPUs and GPUs. The results show that our proposed approach provides a MLaaS cloud service architecture that can be tuned by the end API user or consumer to outperform the conventional "one size fits all" approach.
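The programmatic tier selection the abstract proposes might look roughly like the sketch below. The tier table, its accuracy and latency numbers, and the selection rule are all invented for illustration; the paper's actual tiers are derived from measured service behavior.

```python
# Sketch of client-side Tolerance Tier selection: the client states its
# accuracy/latency tolerance and the cheapest satisfying tier is chosen.
# Tier names and numbers below are hypothetical.

TIERS = [  # ordered fastest-first
    {"name": "fast",     "accuracy": 0.88, "p99_latency_ms": 40},
    {"name": "balanced", "accuracy": 0.93, "p99_latency_ms": 120},
    {"name": "accurate", "accuracy": 0.97, "p99_latency_ms": 450},
]

def select_tier(min_accuracy, max_latency_ms):
    """Return the fastest tier meeting both constraints, or None."""
    for tier in TIERS:
        if (tier["accuracy"] >= min_accuracy
                and tier["p99_latency_ms"] <= max_latency_ms):
            return tier["name"]
    return None
```

A latency-sensitive caller might accept 0.9 accuracy within 200 ms and land on the "balanced" tier, whereas under a strict one-size-fits-all deployment every caller would receive the same variant.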
DeLTA: GPU Performance Model for Deep Learning Applications with In-Depth Memory System Traffic Analysis
Sangkug Lym, Donghyuk Lee, Mike O'Connor, Niladrish Chatterjee, M. Erez
DOI: 10.1109/ISPASS.2019.00041 · Published: 2019-03-01 · Citations: 27

Abstract: Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. In particular, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these layer workloads. GPU design optimization for efficient CNN training acceleration requires accurate modeling of how performance improves when computing and memory resources are increased. We present DeLTA, the first analytical model that accurately estimates the traffic at each GPU memory hierarchy level, while accounting for the complex reuse patterns of a parallel convolution algorithm. We demonstrate that our model is both accurate and robust for different CNNs and GPU architectures. We then show how this model can be used to carefully balance the scaling of different GPU resources for efficient CNN performance improvement.
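A back-of-the-envelope version of per-layer traffic analysis can be written in a few lines. This grossly simplifies what DeLTA models: it assumes perfect on-chip reuse (each tensor crosses DRAM exactly once), ignores the memory hierarchy levels and parallel reuse patterns that are the paper's actual contribution, and fixes stride 1 and fp32.

```python
# Simplistic arithmetic-intensity estimate for one convolution layer.
# Assumes each tensor moves across DRAM once; NOT DeLTA's traffic model.

def conv_layer_intensity(N, C, K, H, W, R, S):
    """FLOPs per DRAM byte for a conv layer: batch N, C input channels,
    K filters, H x W output, R x S kernels, stride 1, fp32."""
    flops = 2 * N * K * C * H * W * R * S        # each MAC counts as 2 ops
    tensor_words = (N * C * (H + R - 1) * (W + S - 1)   # input activations
                    + K * C * R * S                     # weights
                    + N * K * H * W)                    # output activations
    return flops / (4 * tensor_words)            # 4 bytes per fp32 word
```

Comparing this intensity against a GPU's FLOPs-per-byte ratio gives a crude roofline-style prediction of whether a layer is compute- or bandwidth-bound; DeLTA refines exactly this kind of estimate with per-level reuse modeling.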
ISPASS 2019 Program Committee
DOI: 10.1109/ispass.2019.00008 · Published: 2019-03-01 · Citations: 0

Tosiron Adegbija, University of Arizona
Junwhan Ahn, Google
J. Nelson Amaral, University of Alberta
Sarah Bird, Facebook
Simone Campanoni, Northwestern University
Trevor E. Carlson, National University of Singapore
Lizhong Chen, Oregon State University
Jason Clemons, NVIDIA
Jeanine Cook, Sandia National Laboratories
Stephan Diestelhorst, ARM
Stijn Eyerman, Intel
Dimitris Gizopoulos, University of Athens
Rajiv Gupta, University of California Riverside
Rui Hou, Institute of Information Engineering
Lizy John, University of Texas at Austin
David Kaeli, Northeastern University
Ulya Karpuzcu, University of Minnesota/Brown University
Omer Khan, University of Connecticut
Jangwoo Kim, Seoul National University
John Kim, Korea Advanced Institute of Science and Technology
Qiuyun Llull, VMware
Xiaosong Ma, Qatar Computing Research Institute
Andreas Moshovos, University of Toronto
Moriyoshi Ohara, IBM Research Tokyo
Michael Papamichael, Microsoft Research
Antonio J. Peña, Barcelona Supercomputing Center (BSC)
Ravi Soundararajan, VMware
Hyojin Sung, IBM Research
Radu Teodorescu, Ohio State University
Yash Ukidave, AMD
Yuhao Zhu, University of Rochester
Adrián Castelló, Universitat Jaume I (UJI) (external)
Yang Hu, University of Texas at Dallas (external)
Xiongchao Tang, Tsinghua University (external)