Hasan Hassan, Nandita Vijaykumar, S. Khan, Saugata Ghose, K. Chang, Gennady Pekhimenko, Donghyuk Lee, O. Ergin, O. Mutlu
{"title":"SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies","authors":"Hasan Hassan, Nandita Vijaykumar, S. Khan, Saugata Ghose, K. Chang, Gennady Pekhimenko, Donghyuk Lee, O. Ergin, O. Mutlu","doi":"10.1109/HPCA.2017.62","DOIUrl":"https://doi.org/10.1109/HPCA.2017.62","url":null,"abstract":"DRAM is the primary technology used for main memory in modern systems. Unfortunately, as DRAM scales down to smaller technology nodes, it faces key challenges in both data integrity and latency, which strongly affects overall system reliability and performance. To develop reliable and high-performance DRAM-based main memory in future systems, it is critical to characterize, understand, and analyze various aspects (e.g., reliability, latency) of existing DRAM chips. To enable this, there is a strong need for a publicly-available DRAM testing infrastructure that can flexibly and efficiently test DRAM chips in a manner accessible to both software and hardware developers. This paper develops the first such infrastructure, SoftMC (Soft Memory Controller), an FPGA-based testing platform that can control and test memory modules designed for the commonly-used DDR (Double Data Rate) interface. SoftMC has two key properties: (i) it provides flexibility to thoroughly control memory behavior or to implement a wide range of mechanisms using DDR commands, and (ii) it is easy to use as it provides a simple and intuitive high-level programming interface for users, completely hiding the low-level details of the FPGA. We demonstrate the capability, flexibility, and programming ease of SoftMC with two example use cases. First, we implement a test that characterizes the retention time of DRAM cells. Experimental results we obtain using SoftMC are consistent with the findings of prior studies on retention time in modern DRAM, which serves as a validation of our infrastructure. Second, we validate two recently-proposed mechanisms, which rely on accessing recently-refreshed or recently-accessed DRAM cells faster than other DRAM cells. Using our infrastructure, we show that the expected latency reduction effect of these mechanisms is not observable in existing DRAM chips, which demonstrates the usefulness of SoftMC in testing new ideas on existing memory modules. We discuss several other use cases of SoftMC, including the ability to characterize emerging non-volatile memory modules that obey the DDR standard. We hope that our open-source release of SoftMC fills a gap in the space of publicly-available experimental memory testing infrastructures and inspires new studies, ideas, and methodologies in memory system design.","PeriodicalId":118950,"journal":{"name":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"85 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127982538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rajiv Nishtala, P. Carpenter, V. Petrucci, X. Martorell
{"title":"Hipster: Hybrid Task Manager for Latency-Critical Cloud Workloads","authors":"Rajiv Nishtala, P. Carpenter, V. Petrucci, X. Martorell","doi":"10.1109/HPCA.2017.13","DOIUrl":"https://doi.org/10.1109/HPCA.2017.13","url":null,"abstract":"In 2013, U. S. data centers accounted for 2.2% of the country's total electricity consumption, a figure that is projected to increase rapidly over the next decade. Many important workloads are interactive, and they demand strict levels of quality-of-service (QoS) to meet user expectations, making it challenging to reduce power consumption due to increasing performance demands. This paper introduces Hipster, a technique that combines heuristics and reinforcement learning to manage latency-critical workloads. Hipster's goal is to improve resource efficiency in data centers while respecting the QoS of the latency-critical workloads. Hipster achieves its goal by exploring heterogeneous multi-cores and dynamic voltage and frequency scaling (DVFS). To improve data center utilization and make best usage of the available resources, Hipster can dynamically assign remaining cores to batch workloads without violating the QoS constraints for the latency-critical workloads. We perform experiments using a 64-bit ARM big.LITTLE platform, and show that, compared to prior work, Hipster improves the QoS guarantee for Web-Search from 80% to 96%, and for Memcached from 92% to 99%, while reducing the energy consumption by up to 18%.","PeriodicalId":118950,"journal":{"name":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128789157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximizing Cache Performance Under Uncertainty","authors":"Nathan Beckmann, Daniel Sánchez","doi":"10.1109/HPCA.2017.43","DOIUrl":"https://doi.org/10.1109/HPCA.2017.43","url":null,"abstract":"Much prior work has studied cache replacement, but a large gap remains between theory and practice. The design of many practical policies is guided by the optimal policy, Belady's MIN. However, MIN assumes perfect knowledge of the future that is unavailable in practice, and the obvious generalizationsof MIN are suboptimal with imperfect information. What, then, is the right metric for practical cache replacement?We propose that practical policies should replace lines based on their economic value added (EVA), the difference of their expected hits from the average. Drawing on the theory of Markov decision processes, we discuss why this metric maximizes the cache's hit rate. We present an inexpensive implementation ofEVA and evaluate it exhaustively. EVA outperforms several prior policies and saves area at iso-performance. These results show that formalizing cache replacement yields practical benefits.","PeriodicalId":118950,"journal":{"name":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125836516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Cha, O. Seongil, Hyunsung Shin, Sang-Jun Hwang, Kwang-il Park, Seong-Jin Jang, Joo-Sun Choi, G. Jin, Y. Son, Hyunyoon Cho, Jung Ho Ahn, N. Kim
{"title":"Defect Analysis and Cost-Effective Resilience Architecture for Future DRAM Devices","authors":"S. Cha, O. Seongil, Hyunsung Shin, Sang-Jun Hwang, Kwang-il Park, Seong-Jin Jang, Joo-Sun Choi, G. Jin, Y. Son, Hyunyoon Cho, Jung Ho Ahn, N. Kim","doi":"10.1109/HPCA.2017.30","DOIUrl":"https://doi.org/10.1109/HPCA.2017.30","url":null,"abstract":"Technology scaling has continuously improved the density, performance, energy efficiency, and cost of DRAM-based main memory systems. Starting from sub-20nm processes, however, the industry began to pay considerably higher costs to screen and manage notably increasing defective cells. The traditional technique, which replaces the rows/columns containing faulty cells with spare rows/columns, has been able to cost-effectively repair the defective cells so far, but it will become unaffordable soon because an excessive number of spare rows/columns are required to manage the increasing number of defective cells. This necessitates a synergistic application of an alternative resilience technique such as In-DRAM ECC with the traditional one. Through extensive measurement and simulation, we first identify that aggressive miniaturization makes DRAM cells more sensitive to random telegraph noise or variable retention time, which is dominantly manifested as a surge in randomly scattered single-cell faults. Second, we advocate using In-DRAM ECC to overcome the DRAM scaling challenges and architect In-DRAM ECC to accomplish high area efficiency and minimal performance degradation. Moreover, we show that advancement in process technology reduces decoding/correction time to a small fraction of DRAM access time, and that the throughput penalty of a write operation due to an additional read for a parity update is mostly overcome by the multi-bank structure and long burst writes that span an entire In-DRAM ECC codeword. Lastly, we demonstrate that system reliability with modern rank-level ECC schemes such as single device data correction is further improved by hundred million times with the proposed In-DRAM ECC architecture.","PeriodicalId":118950,"journal":{"name":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128592918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jee Ho Ryoo, Mitesh R. Meswani, A. Prodromou, L. John
{"title":"SILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization","authors":"Jee Ho Ryoo, Mitesh R. Meswani, A. Prodromou, L. John","doi":"10.1145/2967938.2974060","DOIUrl":"https://doi.org/10.1145/2967938.2974060","url":null,"abstract":"With current DRAM technology reaching its limit, emerging heterogeneous memory systems have become attractive to keep the memory performance scaling. This paper argues for using a small, fast memory closer to the processor as part of a flat address space where the memory system is composed of two or more memory types. OS-transparent management of such memory has been proposed in prior works such as CAMEO and Part of Memory (PoM) work. Data migration is typically handled either at coarse granularity with high bandwidth overheads (as in PoM) or atne granularity with low hit rate (as in CAMEO). Prior work uses restricted address mapping from only congruence groups in order to simplify the mapping. At any time, only one page (block) from a congruence group is resident in the fast memory. In this paper, we present a at address space organization called SILC-FM that uses large granularity but allows subblocks from two pages to coexist in an interleaved fashion in fast memory. Data movement is done at subblocked granularity, avoiding fetching of useless subblocks and consuming less bandwidth compared to migrating the entire large block. SILC-FM can get more spatial locality hits than CAMEO and PoM due to page-level operation and interleaving blocks respectively. The interleaved subblock placement improves performance by 55% on average over a static placement scheme without data migration. We also selectively lock hot blocks to prevent them from being involved in the hardware swapping operations. Additional features such as locking, associativity and bandwidth balancing improve performance by 11%, 8%, and 8% respectively, resulting in a total of 82% performance improvement over no migration static placement scheme. Compared to the best state-of-the-art scheme, SILC-FM gets performance improvement of 36% with 13% energy savings.","PeriodicalId":118950,"journal":{"name":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123472540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}