2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)最新文献_第2页

Characterization and analysis of a web search benchmark 网络搜索基准的特征和分析

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095818

Zacharias Hadjilambrou, Marios Kleanthous, Yiannakis Sazeides

{"title":"Characterization and analysis of a web search benchmark","authors":"Zacharias Hadjilambrou, Marios Kleanthous, Yiannakis Sazeides","doi":"10.1109/ISPASS.2015.7095818","DOIUrl":"https://doi.org/10.1109/ISPASS.2015.7095818","url":null,"abstract":"Web search as a service is very impressive. Web search runs on thousands of servers which perform search on an index of billions of web pages. The search results must be both relevant to the user queries and reach the user in a fraction of a second. A web search service must guarantee the same QoS at all times even at the peak incoming traffic load. Not unjustifiably the web search service has attracted a lot of research attention. Despite the high research interest web search has gained, there are still plenty unknown about the functionality and the architecture of web search benchmarks. Much research has been done using commercial web search engines, like Bing or Google, but many details of these search engines are, of course, not disclosed to the public. We take an academically accepted web search benchmark and we perform a thorough characterization and analysis of it. We shed light in to the architecture and the functionality of the benchmark. We also investigate some prominent web search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput and we also explore the potential use of low power servers for web search. Our results show that intra-server partitioning can reduce tail latencies and that low power servers given enough partitioning can provide same response times as conventional high performance servers.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126071641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Where does the time go? characterizing tail latency in memcached 时间都去哪儿了?表征memcached中的尾部延迟

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095781

G. Blake, A. Saidi

{"title":"Where does the time go? characterizing tail latency in memcached","authors":"G. Blake, A. Saidi","doi":"10.1109/ISPASS.2015.7095781","DOIUrl":"https://doi.org/10.1109/ISPASS.2015.7095781","url":null,"abstract":"To function correctly Online, Data-Intensive (OLDI) services require low and consistent service times. Maintaining predictable service times entails requiring 99th or higher percentile latency targets across hundreds to thousands of servers in the data-center. However, to maintain the 99th percentile targets servers are routinely run well below full utilization. The main difficulty in optimizing a server to run closer to peak utilization and maintain predictable 99th percentile response latencies is identifying and mitigating the causes of a request missing the target service time. In practice this analysis is challenging as requests and responses overlap their execution with respect to one another and traverse multiple layers of software, user/kernel protection boundaries, and the hardware/software divide. Traditional profiling methods that record the time being spent in each function usually yield few clues as to where a bottleneck may be present due to the many layers of software each consuming only a small fraction of time each. In this work we analyze the end-to-end sources of latency in a Memcached server from the wire through the kernel into the application and back again. To do so, we develop a tool that utilizes the Linux SystemTap infrastructure to measure latency throughout the many software layers that make up the complete request and response path for Memcached. While memory copies and the Linux networking stack are often suggested as major contributors to latency, we find that the main cause of missing response latency guarantees is the formation of standing queues and the application's inability to detect and remedy this situation.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127353194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

DELPHI: a framework for RTL-based architecture design evaluation using DSENT models DELPHI:一个使用DSENT模型进行基于rtl的架构设计评估的框架

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095780

Michael Papamichael, Cagla Cakir, Chen Sun, C. Chen, J. Hoe, K. Mai, L. Peh, V. Stojanović

{"title":"DELPHI: a framework for RTL-based architecture design evaluation using DSENT models","authors":"Michael Papamichael, Cagla Cakir, Chen Sun, C. Chen, J. Hoe, K. Mai, L. Peh, V. Stojanović","doi":"10.1109/ISPASS.2015.7095780","DOIUrl":"https://doi.org/10.1109/ISPASS.2015.7095780","url":null,"abstract":"Computer architects are increasingly interested in evaluating their ideas at the register-transfer level (RTL) to gain more precise insights on the key characteristics (frequency, area, power) of a micro/architectural design proposal. However, the RTL synthesis process is notoriously tedious, slow, and errorprone and is often outside the area of expertise of a typical computer architect, as it requires familiarity with complex CAD flows, hard-to-get tools and standard cell libraries. The effort is further multiplied when targeting multiple technology nodes and standard cell variants to study technology dependence. This paper presents DELPHI, a flexible, open framework that leverages the DSENT modeling engine for faster, easier, and more efficient characterization of RTL hardware designs. DELPHI first synthesizes a Verilog or VHDL RTL design (either using the industry-standard Synopsys Design Compiler tool or a combination of open-source tools) to an intermediate structural netlist. It then processes the resulting synthesized netlist to generate a technology-independent DSENT design model. This model can then be used within a modified version of the DSENT flow to perform very fast-one to two orders of magnitude faster than full RTL synthesis-estimation of hardware performance characteristics, such as frequency, area, and power across a variety of DSENT technology models (e.g., 65nm Bulk, 32nm SOI, 11nm Tri-Gate, etc.). In our evaluation using 26 RTL design examples, DELPHI and DSENT were consistently able to closely track and capture design trends of conventional RTL synthesis results without the associated delay and complexity. We are releasing the full DELPHI framework (including a fully open-source flow) at http://www.ece.cmu.edu/CALCM/delphi/.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128279220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Mosaic: cross-platform user-interaction record and replay for the fragmented android ecosystem Mosaic:面向碎片化android生态系统的跨平台用户交互记录和回放

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095807

Matthew Halpern, Yuhao Zhu, R. Peri, V. Reddi

{"title":"Mosaic: cross-platform user-interaction record and replay for the fragmented android ecosystem","authors":"Matthew Halpern, Yuhao Zhu, R. Peri, V. Reddi","doi":"10.1109/ISPASS.2015.7095807","DOIUrl":"https://doi.org/10.1109/ISPASS.2015.7095807","url":null,"abstract":"In contrast to traditional computing systems, such as desktops and servers, that are programmed to perform “compute-bound” and “run-to-completion” tasks, mobile applications are designed for user interactivity. Factoring user interactivity into computer system design and evaluation is important, yet possesses many challenges. In particular, systematically studying interactive mobile applications across the diverse set of mobile devices available today is difficult due to the mobile device fragmentation problem. At the time of writing, there are 18,796 distinct Android mobile devices on the market and will only continue to increase in the future. Differences in screen sizes, resolutions and operating systems impose different interactivity requirements, making it difficult to uniformly study these systems. We present Mosaic, a cross-platform, timing-accurate record and replay tool for Android-based mobile devices. Mosaic overcomes device fragmentation through a novel virtual screen abstraction. User interactions are translated from a physical device into a platform-agnostic intermediate representation before translation to a target system. The intermediate representation is human-readable, which allows Mosaic users to modify previously recorded traces or even synthesize their own user interactive sessions from scratch. We demonstrate that Mosaic allows user interaction traces to be recorded on emulators, smartphones, tablets, and development boards and replayed on other devices. Using Mosaic we were able to replay 45 different Google Play applications across multiple devices, and also show that we can perform cross-platform performance comparisons between two different processors under identical user interactions.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130436273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 78

Factors affecting scalability of multithreaded Java applications on manycore systems 影响多核系统上多线程Java应用程序可伸缩性的因素

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095800

Junjie Qian, Du Li, W. Srisa-an, Hong Jiang, S. Seth

引用次数: 7

A modeling framework for reuse distance-based estimation of cache performance 基于重用距离的缓存性能估计建模框架

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095785

Xiaoyue Pan, B. Jonsson

{"title":"A modeling framework for reuse distance-based estimation of cache performance","authors":"Xiaoyue Pan, B. Jonsson","doi":"10.1109/ISPASS.2015.7095785","DOIUrl":"https://doi.org/10.1109/ISPASS.2015.7095785","url":null,"abstract":"We develop an analytical modeling framework for efficient prediction of cache miss ratios based on reuse distance distributions. The only input needed for our predictions is the reuse distance distribution of a program execution: previous work has shown that they can be obtained with very small overhead by sampling from native executions. This should be contrasted with previous approaches that base predictions on stack distance distributions, whose collection need significantly larger overhead or additional hardware support. The predictions are based on a uniform modeling framework which can be specialized for a variety of cache replacement policies, including Random, LRU, PLRU, and MRU (aka. bit-PLRU), and for arbitrary values of cache size and cache associativity. We evaluate our modeling framework with the SPEC CPU 2006 benchmark suite over a set of cache configurations with varying cache size, associativity and replacement policy. The introduced inaccuracies were generally below 1% for the model of the policy, and additionally around 2% when set-local reuse distances must be estimated from global reuse distance distributions. The inaccuracy introduced by sampling is significantly smaller.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"454 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134497351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 24

Precise computer comparisons via statistical resampling methods 通过统计重采样方法进行精确的计算机比较

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095787

Bin Li, Shaoming Chen, Lu Peng

{"title":"Precise computer comparisons via statistical resampling methods","authors":"Bin Li, Shaoming Chen, Lu Peng","doi":"10.1109/ISPASS.2015.7095787","DOIUrl":"https://doi.org/10.1109/ISPASS.2015.7095787","url":null,"abstract":"Performance variability, stemming from non-deterministic hardware and software behaviors or deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems which increases the difficulty of comparing computer performance metrics. Conventional methods use various measures (such as geometric mean) to quantify the performance of different benchmarks to compare computers without considering variability. This may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution and five-number-summary for performance evaluation. The results show that 1) the randomization test substantially improves our chance to identify the difference between performance comparisons when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g. ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of the computer performance and a five-number-summary to summarize computer performance. We illustrate the results and conclusion through detailed Monte Carlo simulation studies and real examples. Results show that our methods are precise and robust even when two computers have very similar performance metrics.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116818624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Pydgin: generating fast instruction set simulators from simple architecture descriptions with meta-tracing JIT compilers Pydgin:使用元跟踪JIT编译器从简单的架构描述生成快速指令集模拟器

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095811

Derek Lockhart, Berkin Ilbeyi, C. Batten

{"title":"Pydgin: generating fast instruction set simulators from simple architecture descriptions with meta-tracing JIT compilers","authors":"Derek Lockhart, Berkin Ilbeyi, C. Batten","doi":"10.1109/ISPASS.2015.7095811","DOIUrl":"https://doi.org/10.1109/ISPASS.2015.7095811","url":null,"abstract":"Instruction set simulators (ISSs) remain an essential tool for the rapid exploration and evaluation of instruction set extensions in both academia and industry. Due to their importance in both hardware and software design, modern ISSs must balance a tension between developer productivity and high-performance simulation. Productivity requirements have led to “ADL-driven” toolflows that automatically generate ISSs from high-level architectural description languages (ADLs). Meanwhile, performance requirements have prompted ISSs to incorporate increasingly complicated dynamic binary translation (DBT) techniques. Construction of frameworks capable of providing both the productivity benefits of ADL-generated simulators and the performance benefits of DBT remains a significant challenge. We introduce Pydgin, a new approach to ISS construction that addresses the multiple challenges of designing, implementing, and maintaining ADL-generated DBT-ISSs. Pydgin uses a Python-based, embedded-ADL to succinctly describe instruction behavior as directly executable “pseudocode”. These Pydgin ADL descriptions are used to automatically generate high-performance DBT-ISSs by creatively adapting an existing meta-tracing JIT compilation framework designed for general-purpose dynamic programming languages. We demonstrate the capabilities of Pydgin by implementing ISSs for two instruction sets and show that Pydgin provides concise, flexible ISA descriptions while also generating simulators with performance comparable to hand-coded DBT-ISSs.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134258300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13

Non-volatile memory host controller interface performance analysis in high-performance I/O systems 高性能I/O系统中非易失性存储器主控制器接口性能分析

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095793

Amro Awad, B. Kettering, Yan Solihin

{"title":"Non-volatile memory host controller interface performance analysis in high-performance I/O systems","authors":"Amro Awad, B. Kettering, Yan Solihin","doi":"10.1109/ISPASS.2015.7095793","DOIUrl":"https://doi.org/10.1109/ISPASS.2015.7095793","url":null,"abstract":"Emerging non-volatile memories (NVMs), such as Phase-Change Memory (PCM), Spin-Transfer Torque RAM (STT-RAM) and Memristor, are very promising candidates for replacing NAND-Flash Solid-State Drives (SSDs) and Hard Disk Drives (HDDs) for many reasons. First, their read/write latencies are orders of magnitude faster. Second, some emerging NVMs, such as memristors, are expected to have very high densities, which allow deploying a much higher capacity without requiring increased physical space. While the percentage of the time taken for data movement over low-speed buses, such as Peripheral Component Interconnect (PCI), is negligible for the overall read/write latency in HDDs, it could be dominant for emerging fast NVMs. Therefore, the trend has moved toward using very fast interconnect technologies, such as PCI Express (PCIe) which is hundreds of times faster than the traditional PCI. Accordingly, new host controller interfaces are used to communicate with I/O devices to exploit the parallelism and low-latency features of emerging NVMs through high-speed interconnects. In this paper, we investigate the system performance bottlenecks and overhead of using the standard state-of-the-art Non-Volatile Memory Express (NVMe), or Non-Volatile Memory Host Controller Interface (NVMHCI) Specification [1] as representative for NVM host controller interfaces.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131152531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 18

An updated performance comparison of virtual machines and Linux containers 更新了虚拟机和Linux容器的性能比较

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI: 10.1109/ISPASS.2015.7095802

Wes Felter, Alexandre Ferreira, R. Rajamony, J. Rubio

引用次数: 1084