Jichuan Chang, Parthasarathy Ranganathan, T. Mudge, D. Roberts, Mehul A. Shah, Kevin T. Lim
{"title":"A limits study of benefits from nanostore-based future data-centric system architectures","authors":"Jichuan Chang, Parthasarathy Ranganathan, T. Mudge, D. Roberts, Mehul A. Shah, Kevin T. Lim","doi":"10.1145/2212908.2212915","DOIUrl":"https://doi.org/10.1145/2212908.2212915","url":null,"abstract":"The adoption of non-volatile memories (NVMs) in system architecture and the growth in data-centric workloads offer exciting opportunities for new designs. In this paper, we examine the potential and limit of designs that move compute in close proximity to NVM-based data stores. To address the challenges in evaluating such system architectures for distributed systems, we develop and validate a new methodology for large-scale data-centric workloads. We then study \"nanostores\" as an example design that constructs distributed systems from building blocks with 3D-stacked compute and NVM layers on the same chip, replacing both traditional storage and memory with NVM. Our limits study demonstrates significant potential of this approach (3-162X improvement in energy delay product) over 2015 baselines, particularly for IO-intensive workloads. We also discuss and quantify the impact of network bandwidth, software scalability, and power density, and design tradeoffs for future NVM-based data-centric architectures.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128957781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexander W. Min, Ren Wang, James Tsai, M. A. Ergin, T. Tai
{"title":"Improving energy efficiency for mobile platforms by exploiting low-power sleep states","authors":"Alexander W. Min, Ren Wang, James Tsai, M. A. Ergin, T. Tai","doi":"10.1145/2212908.2212928","DOIUrl":"https://doi.org/10.1145/2212908.2212928","url":null,"abstract":"Reducing energy consumption is one of the most important design aspects for small form-factor mobile platforms, such as smartphones and tablets. Despite its potential for power savings, optimally leveraging system low-power sleep states during active mobile workloads, such as video streaming and web browsing, has not been fully explored. One major challenge is to make intelligent power management decisions based on, among other things, accurate system idle duration prediction, which is difficult due to the non-deterministic system interrupt behavior. In this paper, we propose a novel framework, called E2S3 (Energy Efficient Sleep-State Selection), that dynamically enters the optimal low-power sleep state to minimize the system power consumption. In particular, E2S3 detects and exploits short idle durations during active mobile workloads by (i) finding optimal thresholds (i.e., energy break-even times) for multiple low-power sleep states, (ii) predicting the sleep-state selection error probabilities heuristically, and (iii) selecting the optimal sleep state based on the expected reward, e.g., power consumption, which incorporates the risks of making a wrong decision. We implemented and evaluated E2S3 on Android-based smartphones, demonstrating the effectiveness of the algorithm. The evaluation results show that E2S3 significantly reduces the platform energy consumption, by up to 50% (hence extending battery life), without compromising system performance.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134083206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural support of multiple hypervisors over single platform for enhancing cloud computing security","authors":"Wei-qi Shi, Jong-Hyuk Lee, Taeweon Suh, Dong Hyuk Woo, Xinwen Zhang","doi":"10.1145/2212908.2212920","DOIUrl":"https://doi.org/10.1145/2212908.2212920","url":null,"abstract":"This paper presents MultiHype, a novel architecture that supports multiple hypervisors (or virtual machine monitors) on a single physical platform by leveraging a many-core based cloud-on-chip architecture. A MultiHype platform consists of a control plane and multiple hypervisors created on demand, each of which can further create multiple guest virtual machines. Supported at the architectural level, a single platform using MultiHype can behave as a distributed system, with each hypervisor and its virtual machines running independently and concurrently. As a direct consequence, vulnerabilities of one hypervisor or its guest virtual machines can be confined within its own domain, which makes the platform more resilient to malicious attacks and failures in a cloud environment. To defend against resource exhaustion attacks, MultiHype further implements a new cache eviction policy and a memory management scheme that prevent resource monopolization of the shared cache and defend against denial-of-resource exploits on physical memory launched from malicious virtual machines on a shared platform. We use the Bochs emulator and cycle-based x86 simulation to evaluate the effectiveness and performance of MultiHype.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114336110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient vectorization of linked-cell particle simulations","authors":"W. Eckhardt, A. Heinecke","doi":"10.1145/2212908.2212943","DOIUrl":"https://doi.org/10.1145/2212908.2212943","url":null,"abstract":"Molecular dynamics simulations for short-range potentials represent an important class of applications in scientific computing. While much work has been devoted to the efficient implementation of such simulations on vector machines in general, not much effort has been invested into the efficient implementation for current x86 processor architectures' SIMD extensions such as SSE and AVX. We describe an implementation of the linked-cell algorithm for the SSE and AVX instruction sets, which achieves the theoretical limit for SSE. Moreover, the proposed scheme will allow the efficient usage of future architectures with wider vector units. We implemented the kernel using intrinsics within a small test program and conducted a number of runs for different setups of the Lennard-Jones fluid on an Intel- and AMD-based cluster, respectively.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114586439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning, evolution and adaptation in racing games","authors":"D. Loiacono","doi":"10.1145/2212908.2212953","DOIUrl":"https://doi.org/10.1145/2212908.2212953","url":null,"abstract":"Modern racing games offer a realistic driving experience and a vivid game environment. Accordingly, developing this type of game involves several challenges and requires a large amount of game content. Computational intelligence represents a promising technology to deal effectively with such challenges and, at the same time, to reduce the cost of the development process. In this paper, we provide an overview of the most relevant applications of computational intelligence methods in the domain of racing games. In particular, we show that computational intelligence can be successfully applied (i) to develop highly competitive non-player characters, (ii) to design advanced racing behaviors such as overtaking maneuvers, and (iii) to automatically generate tracks and racing scenarios.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"252 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115611356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instructions activating conditions for hardware-based auto-scheduling","authors":"S. Lovergine, Fabrizio Ferrandi","doi":"10.1145/2212908.2212946","DOIUrl":"https://doi.org/10.1145/2212908.2212946","url":null,"abstract":"Nowadays, implementing hardware accelerators by hand-writing the RTL still leads to higher-quality results than those obtained by automating the design process. Manually developing and maintaining hardware designs, however, is a complex, time-consuming and error-prone task, making improvements in the automatic design flow an active research topic. The most common approach is based on a statically computed scheduling order. Support for features such as dynamic scheduling or unbounded latency of operations and functional units has been proposed, with some limitations. Instruction auto-scheduling is an alternative that overcomes such restrictions while handling situations that need or benefit from run-time adaptive reordering of the instructions. This paper focuses on how to improve the synthesis of hardware cores by increasing automatic parallelism exploitation. The proposed approach computes the set of conditions to be satisfied for each instruction to be executed as soon as possible, allowing run-time auto-scheduling. Because such conditions are represented as logic functions, the corresponding hardware implementation can be easily automated. Experimental results have shown an encouraging enhancement in terms of performance, with a limited increase of area.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114749552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Karthik T. Sundararajan, Timothy M. Jones, N. Topham
{"title":"A reconfigurable cache architecture for energy efficiency","authors":"Karthik T. Sundararajan, Timothy M. Jones, N. Topham","doi":"10.1145/2016604.2016616","DOIUrl":"https://doi.org/10.1145/2016604.2016616","url":null,"abstract":"On-chip caches often consume a significant fraction of the total processor energy budget. Allowing adaptation to the running workload can significantly lower their energy consumption. In this paper, we present a novel Set and way Management cache Architecture for efficient Run-Time reconfiguration (Smart cache), a cache architecture that allows reconfiguration in both its size and associativity. Results show the energy-delay of the Smart cache is on average 18% better than state-of-the-art reconfiguration architectures.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124849515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid high-performance low-power and ultra-low energy reliable caches","authors":"Bojan Maric, J. Abella, F. Cazorla, M. Valero","doi":"10.1145/2016604.2016619","DOIUrl":"https://doi.org/10.1145/2016604.2016619","url":null,"abstract":"Ubiquitous computing has become a very popular paradigm. The most suitable technological solution for those systems consists of using hybrid processors able to operate at high voltage to provide high performance and at near-/sub-threshold voltage to provide ultra-low energy consumption. This paper studies different non-hybrid and hybrid SRAM L1 cache designs using several SRAM cell types and compares them in terms of delay, dynamic energy, leakage power and area.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"27 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123697060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging data-structure semantics for efficient algorithmic parallelism","authors":"Romain Cledat, K. Ravichandran, S. Pande","doi":"10.1145/2016604.2016638","DOIUrl":"https://doi.org/10.1145/2016604.2016638","url":null,"abstract":"Irregular or pointer-based structures such as graphs and trees are commonly used in algorithms dealing with sparse data. Given their reliance on pointers, these algorithms are difficult to analyze and the structure of their memory accesses is obfuscated, which makes the extraction of parallelism difficult. In this work, we present a framework that is capable of reasoning about the semantics of the dynamic data footprints of operations to determine their potential overlap. We leverage knowledge that the programmer has about the algorithm's access patterns but is currently unable to express. This knowledge allows our runtime either to make a parallelization decision or to throttle concurrency to improve performance in Software Transactional Memories (STMs) [6]. Our framework relies on programmer-supplied predicates that are appropriately evaluated at runtime and utilized to probabilistically assert certain properties about data footprints. We present simple abstractions and a low-overhead runtime to support our framework. We demonstrate our work by parallelizing a graph-coloring benchmark and by improving the transactional performance of benchmarks from the STAMP suite.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122397016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Daneshtalab, M. Ebrahimi, P. Liljeberg, J. Plosila, H. Tenhunen
{"title":"Cluster-based topologies for 3D stacked architectures","authors":"M. Daneshtalab, M. Ebrahimi, P. Liljeberg, J. Plosila, H. Tenhunen","doi":"10.1145/2016604.2016621","DOIUrl":"https://doi.org/10.1145/2016604.2016621","url":null,"abstract":"As Three Dimensional Integrated Circuits (3D ICs) emerge as a viable candidate for achieving better performance and packaging, combining the benefits of 3D IC and Network-on-Chip (NoC) schemes provides a significant performance gain for 3D architectures. Through-Silicon-Vias (TSVs), employed for inter-layer connectivity (vertical channels) in 3D ICs, reduce wafer utilization and yield, which impacts the design of 3D architectures using a large number of TSVs. In this paper, we propose two novel stacked topologies for 3D architectures to reduce the area overhead of TSVs and power dissipation on each layer with minimal performance penalty. The presented schemes benefit from clustering the mesh topology in order to mitigate the TSV footprint on each stacked layer.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128962502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}