2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): Latest Articles

Message from the general chair
Erik Hagersten
{"title":"Message from the general chair","authors":"Erik Hagersten","doi":"10.1109/ISPASS.2016.7482065","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482065","url":null,"abstract":"I am delighted to welcome you to the 2016 International Symposium on Performance Analysis of Systems and Software (ISPASS). On its 16th birthday, ISPASS has grown old enough to travel “abroad” for the first time and this 17th ISPASS edition is being held in Uppsala, Sweden.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"40 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120925521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NoMali: Simulating a realistic graphics driver stack using a stub GPU
R. D. Jong, Andreas Sandberg
{"title":"NoMali: Simulating a realistic graphics driver stack using a stub GPU","authors":"R. D. Jong, Andreas Sandberg","doi":"10.1109/ISPASS.2016.7482100","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482100","url":null,"abstract":"Since the advent of the smartphone, all high-end mobile devices have required graphics acceleration in the form of a GPU. Today, even low-power devices such as smartwatches use GPUs for rendering and composition. However, the computer architecture community has largely ignored these developments when evaluating new architecture proposals. A common approach when evaluating CPU designs for the mobile space has been to use software rendering instead of a GPU model. However, due to the ubiquity of GPUs in mobile devices, they are used in both 3D applications and 2D applications. For example, when running a 2D application such as the web browser in Android with a software renderer instead of a GPU, the CPU ends up executing twice as many instructions. Both the CPU characteristics and the memory system characteristics differ significantly between the browser and the software renderer. The software renderer typically executes tight loops of vector instructions, while the browser predominantly consists of integer instructions and complex control flow with hard-to-predict branches. Including software rendering results in unrepresentative benchmark performance. In this paper, we use gem5 to quantify the effects of software rendering on a set of common mobile workloads. We also introduce the NoMali stub GPU model that can be used as a drop-in replacement for a real Mali GPU model. This model behaves like a normal GPU, but does not render anything. Using this stub GPU, we demonstrate how most of the problems associated with software rendering can be avoided, while at the same time simulating a representative graphics stack.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130718383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Characterizing Hadoop applications on microservers for performance and energy efficiency optimizations
Maria Malik, Avesta Sasan, R. Joshi, Setareh Rafatirah, H. Homayoun
{"title":"Characterizing Hadoop applications on microservers for performance and energy efficiency optimizations","authors":"Maria Malik, Avesta Sasan, R. Joshi, Setareh Rafatirah, H. Homayoun","doi":"10.1109/ISPASS.2016.7482087","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482087","url":null,"abstract":"The traditional low-power embedded processors such as Atom and ARM are entering the high-performance server market. At the same time, as the size of data grows, emerging Big Data applications require more and more server computational power, which makes it challenging to process data energy-efficiently using current high-performance server architectures. Furthermore, physical design constraints, such as power and density, have become the dominant limiting factor for scaling out servers. Numerous big data applications rely on the Hadoop MapReduce framework to perform their analysis on large-scale datasets. Since Hadoop configuration parameters as well as architecture parameters directly affect MapReduce job performance and energy efficiency, system and architecture level parameter tuning is vital to maximize energy efficiency. In this work, through methodical investigation of performance and power measurements, we demonstrate how the interplay among various Hadoop configurations and system and architecture level parameters affects the performance and energy efficiency across various Hadoop applications.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116926827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
CoolSim: Eliminating traditional cache warming with fast, virtualized profiling
Nikos Nikoleris, Andreas Sandberg, Erik Hagersten, Trevor E. Carlson
{"title":"CoolSim: Eliminating traditional cache warming with fast, virtualized profiling","authors":"Nikos Nikoleris, Andreas Sandberg, Erik Hagersten, Trevor E. Carlson","doi":"10.1109/ISPASS.2016.7482085","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482085","url":null,"abstract":"Sampling (e.g., SMARTS and SimPoint) improves simulation performance by an order of magnitude or more through the reduction of large workloads into a small but representative sample. Virtualized fast-forwarding (e.g., FSA) speeds up simulation further by advancing execution at near-native speed between simulation points, making cache warming the critical limiting factor for simulation performance. CoolSim is an efficient simulation framework that eliminates cache warming. It collects sparse memory reuse information (MRI) while advancing between simulation points using virtualized fast-forwarding. During detailed simulation, a statistical cache model uses the previously acquired MRI to estimate the performance of the caches. CoolSim builds upon KVM and gem5 and runs 19x faster than the state-of-the-art sampled simulation. It estimates the CPI of the SPEC CPU2006 benchmarks with 3.62% error on average, across a wide range of cache sizes.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"445 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125770635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
MofySim: A mobile full-system simulation framework for energy consumption and performance analysis
Minho Ju, Hyeonggyu Kim, Soontae Kim
{"title":"MofySim: A mobile full-system simulation framework for energy consumption and performance analysis","authors":"Minho Ju, Hyeonggyu Kim, Soontae Kim","doi":"10.1109/ISPASS.2016.7482099","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482099","url":null,"abstract":"The analysis of energy consumption and performance is essential to design and optimize mobile systems because of their limited battery capacity. Full-system simulation provides detailed performance metrics for an entire system. Thus it has been widely used for designing and optimizing microarchitectures and mobile systems. The gem5 simulator provides full-system simulation based on the ARM architecture and Android for mobile systems. However, gem5 for mobile systems does not support wireless network interfaces and cannot configure various networking environments such as network errors and network types. Furthermore, gem5 provides only performance statistics without power consumption data. This paper presents a mobile full-system simulation framework based on an enhanced gem5 that includes a simulated mobile system, a simulated server system, and a simulated Ethernet, which enables us to configure various networking environments, in addition to power models for the main components of mobile systems: CPU/caches, DRAM, network interfaces, and display. Using mobile applications and SPEC CPU2006 benchmarks, we show that the proposed mobile full-system simulator achieves performance accuracy within 26.8% error rate for various network packet loss rates, and power modeling accuracy within 12.8% error rate, compared with Nexus 5. By taking real networking environments into account, this mobile full-system simulator provides energy consumption and performance analysis of not only hardware components, but also application processes and threads at the same time. We also discovered energy-inefficient tasks and the inefficiency of the DVFS ondemand governor on network delays using the proposed mobile full-system simulation framework.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122483815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
JIT-assisted fast-forward embedding and instrumentation to enable fast, accurate, and agile simulation
Berkin Ilbeyi, C. Batten
{"title":"JIT-assisted fast-forward embedding and instrumentation to enable fast, accurate, and agile simulation","authors":"Berkin Ilbeyi, C. Batten","doi":"10.1109/ISPASS.2016.7482103","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482103","url":null,"abstract":"Computer architects need fast and accurate simulation to research new computing systems, but architects are also increasingly demanding agile simulation to give them flexibility to productively explore the interaction between software and hardware. In this paper, we propose JIT-assisted fast-forward embedding (JIT-FFE) and JIT-assisted fast-forward instrumentation (JIT-FFI) for fast, accurate, and agile simulation. JIT-FFE enables zero-copy architectural state transfer between a state-of-the-art dynamic-binary-translation-based instruction-set simulator and a detailed microarchitectural simulator. JIT-FFI enables productive implementation of fast functional profiling and warmup. We have implemented these two techniques in a new tool, called PydginFF, which can be integrated with any C/C++ detailed simulator. We evaluate PydginFF within the context of the gem5 detailed simulator for both periodic sampling (SMARTS) and targeted sampling (SimPoint) and demonstrate that PydginFF reduces simulation time of fast-forward-based sampling by over 10×.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122199216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Elastic traces for fast and accurate system performance exploration
R. Jagtap, S. Diestelhorst, Andreas Hansson
{"title":"Elastic traces for fast and accurate system performance exploration","authors":"R. Jagtap, S. Diestelhorst, Andreas Hansson","doi":"10.1109/ISPASS.2016.7482084","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482084","url":null,"abstract":"As computer systems become increasingly complex, the need for fast and accurate simulation tools increases. Accurate but slow processor core models are often substituted with simple trace players to achieve faster memory-system simulation. However, existing trace-driven simulation techniques are limited in their applicability and availability. In this work, we capture elastic traces containing out-of-order core dependencies and effects of speculative execution, which overcome limitations of existing work. Additionally, we make our capture and replay modelling available in the gem5 simulator. Our trace-driven CPU achieves a speed-up of 6-8x compared to the reference core and predicts the performance with less than 1% error on average when the memory-system is changed.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126389627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
MLC PCM main memory with accelerated read
M. Arjomand, A. Jadidi, M. Kandemir, A. Sivasubramaniam, C. Das
{"title":"MLC PCM main memory with accelerated read","authors":"M. Arjomand, A. Jadidi, M. Kandemir, A. Sivasubramaniam, C. Das","doi":"10.1109/ISPASS.2016.7482082","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482082","url":null,"abstract":"This paper alleviates the problem of slow reads in Multi-Level Cell Phase Change Memory (MLC PCM) by exploiting the fact that the Most-Significant Bit (MSB) of MLCs is read fast, while reading the Least-Significant Bits (LSBs) is slower. We propose Half-Line PCM (HL-PCM), a memory architecture that leverages this property to send half of a cache line to the processor ahead of the other half, so that the processor continues its execution if the missed data element is in the first half. Our evaluation shows that HL-PCM improves program execution time by 23%, on average, in a 16-core CMP model for workloads from the PARSEC-2 benchmark.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115738193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Agave: A benchmark suite for exploring the complexities of the Android software stack
Martin K. Brown, Zachary Yannes, Michael Lustig, M. Sanati, S. Mckee, G. Tyson, S. Reinhardt
{"title":"Agave: A benchmark suite for exploring the complexities of the Android software stack","authors":"Martin K. Brown, Zachary Yannes, Michael Lustig, M. Sanati, S. Mckee, G. Tyson, S. Reinhardt","doi":"10.1109/ISPASS.2016.7482089","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482089","url":null,"abstract":"Traditional suites used for benchmarking high-performance computing platforms or for architectural design space exploration use much simpler virtual memory layouts and multitasking/multithreading schemes, which means that they cannot be used to study the complex interactions among the layers of the Android software stack. To demonstrate this, we present memory reference and concurrency data showing how Android applications differ from traditional C benchmarks. We propose the Agave suite of open-source applications as the basis for a standard, multipurpose Android benchmark suite. We make all sources and tools available in hopes that the community will adopt and build on this initial version of Agave.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117008893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Workload characterization and optimization of TPC-H queries on Apache Spark
Tatsuhiro Chiba, Tamiya Onodera
{"title":"Workload characterization and optimization of TPC-H queries on Apache Spark","authors":"Tatsuhiro Chiba, Tamiya Onodera","doi":"10.1109/ISPASS.2016.7482079","DOIUrl":"https://doi.org/10.1109/ISPASS.2016.7482079","url":null,"abstract":"Besides being an in-memory-oriented computing framework, Spark runs on top of Java Virtual Machines (JVMs), so JVM parameters must be tuned to improve Spark application performance. Misconfigured parameters and settings degrade performance. For example, using Java heaps that are too large often causes a long garbage collection pause time, which accounts for over 10-20% of application execution time. Moreover, recent computing nodes have many cores with simultaneous multi-threading technology and the processors on the node are connected via NUMA, so it is difficult to achieve the best performance without taking these hardware features into account. Thus, optimization across the full stack is also important. Not only JVM parameters but also OS parameters, Spark configuration, and application code based on CPU characteristics need to be optimized to take full advantage of underlying computing resources. In this paper, we used the TPC-H benchmark as our optimization case study and gathered many perspective logs such as application, JVM (e.g. GC and JIT), system utilization, and hardware events from a performance monitoring unit. We discuss current problems and introduce several JVM and OS parameter optimization approaches for accelerating Spark performance. As a result, our optimization exhibits a 30-40% increase in speed on average and is up to 5x faster than the naive configuration.","PeriodicalId":416765,"journal":{"name":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116475796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 50