Carlo C. del Mundo, Vincent T. Lee, L. Ceze, M. Oskin
{"title":"NCAM: Near-Data Processing for Nearest Neighbor Search","authors":"Carlo C. del Mundo, Vincent T. Lee, L. Ceze, M. Oskin","doi":"10.1145/2818950.2818984","DOIUrl":"https://doi.org/10.1145/2818950.2818984","url":null,"abstract":"Emerging classes of computer vision applications demand unprecedented computational resources and operate on large amounts of data. In particular, k-nearest neighbors (kNN), a cornerstone algorithm in these applications, incurs significant data movement. To address this challenge, the underlying architecture and memory subsystems must vertically evolve to address memory bandwidth and compute demands. To enable large-scale computer vision, we propose a new class of associative memories called NCAMs which encapsulate logic with memory to accelerate k-nearest neighbors. We estimate that NCAMs can improve the performance of kNN by orders of magnitude over the best off-the-shelf software libraries (e.g., FLANN) and commodity platforms (e.g., GPUs).","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126620784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Radulovic, D. Zivanovic, Daniel Ruiz, B. Supinski, S. Mckee, Petar Radojkovic, E. Ayguadé
{"title":"Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?","authors":"M. Radulovic, D. Zivanovic, Daniel Ruiz, B. Supinski, S. Mckee, Petar Radojkovic, E. Ayguadé","doi":"10.1145/2818950.2818955","DOIUrl":"https://doi.org/10.1145/2818950.2818955","url":null,"abstract":"First defined two decades ago, the memory wall remains a fundamental limitation to system performance. Recent innovations in 3D-stacking technology enable DRAM devices with much higher bandwidths than traditional DIMMs. The first such products will soon hit the market, and some of the publicity claims that they will break through the memory wall. Here we summarize our analysis and expectations of how such 3D-stacked DRAMs will affect the memory wall for a set of representative HPC applications. We conclude that although 3D-stacked DRAM is a major technological innovation, it cannot eliminate the memory wall.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114684347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Chen, A. Arunkumar, Carole-Jean Wu, T. Mudge, C. Chakrabarti
{"title":"E-ECC: Low Power Erasure and Error Correction Schemes for Increasing Reliability of Commodity DRAM Systems","authors":"H. Chen, A. Arunkumar, Carole-Jean Wu, T. Mudge, C. Chakrabarti","doi":"10.1145/2818950.2818961","DOIUrl":"https://doi.org/10.1145/2818950.2818961","url":null,"abstract":"Most server-grade memory systems provide Chipkill-Correct error protection at the expense of power and/or performance overhead. In this paper we present low overhead schemes for improving the reliability of commodity DRAM systems with better power and IPC performance compared to Chipkill-Correct solutions. Specifically, we propose two erasure and error correction (E-ECC) schemes for x8 memory systems that have 12.5% storage overhead and do not require any change in the existing memory architecture. Both schemes have superior error performance due to the use of a strong ECC code, namely, RS(36,32) over GF(2^8). Scheme 1 activates 18 chips per access and has stronger reliability compared to Chipkill-Correct solutions. If the location of the faulty chip is known, Scheme 1 can correct an additional random error in a second chip. Scheme 2 trades off reliability for higher energy efficiency by activating only 9 chips per access. It cannot correct random errors due to a chip failure but can detect them with 99.9986% probability, and once a chip is marked faulty due to persistent errors, it can correct all errors due to that chip. Synthesis results in 28nm node show that the RS(36,32) code results in a very low decoding latency that can be well-hidden in commodity memory systems and, therefore, it has minimal effect on the DRAM access latency. Evaluations based on SPEC CPU 2006 sequential and multi-programmed workloads show that compared to Chipkill-Correct, the proposed Schemes 1 and 2 improve IPC by an average of 3.2% (maximum of 13.8%) and 4.8% (maximum of 31.8%) and reduce the power consumption by an average of 16.2% (maximum of 25%) and 26.8% (maximum of 36%), respectively.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115098181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Loh, Natalie D. Enright Jerger, Ajaykumar Kannan, Yasuko Eckert
{"title":"Interconnect-Memory Challenges for Multi-chip, Silicon Interposer Systems","authors":"G. Loh, Natalie D. Enright Jerger, Ajaykumar Kannan, Yasuko Eckert","doi":"10.1145/2818950.2818951","DOIUrl":"https://doi.org/10.1145/2818950.2818951","url":null,"abstract":"Silicon interposer technology is promising for large-scale integration of memory within a processor package. While past work on vertical, 3D-stacked memory allows a stack of memory to be placed directly on top of a processor, the total amount of memory that could be integrated is limited by the size of the processor die. With silicon interposers, multiple memory stacks can be integrated inside the processor package, thereby increasing both the capacity and the bandwidth provided by the 3D memory. However, the full potential of all of this integrated memory may be squandered if the in-package interconnect architecture cannot keep up with the data rates provided by the multiple memory stacks. This position paper describes key issues in providing the interconnect support for aggressive interposer-based memory integration, and argues for additional research efforts to address these challenges to enable integrated memory to deliver its full value.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125644412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Jayaraj, Arun Rodrigues, S. Hammond, G. Voskuilen
{"title":"The Potential and Perils of Multi-Level Memory","authors":"J. Jayaraj, Arun Rodrigues, S. Hammond, G. Voskuilen","doi":"10.1145/2818950.2818976","DOIUrl":"https://doi.org/10.1145/2818950.2818976","url":null,"abstract":"The future of memory systems is Multi-Level Memory (MLM). In an MLM system the main memory is comprised of two or more types of memory instead of a conventional DDR-DRAM-only main memory. By combining different memory technologies, an MLM system can potentially offer more usable bandwidth and more capacity for a similar cost as a conventional memory system. However, substantial software and hardware design challenges must be overcome to make this potential real. It is our position that the diversity of application access patterns precludes any simple \"one size fits all\" approach and that better tools and design processes will be needed to fulfill the potential of MLM. Efficient implementations of MLM will require a high degree of co-design and coordination between hardware and software. The simulation framework we have built for this study can aid tool building to solve the programming challenges.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122423902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Semantic Gap Between Software and the Memory System","authors":"Jim Stevens, Paul Tschirhart, B. Jacob","doi":"10.1145/2818950.2818957","DOIUrl":"https://doi.org/10.1145/2818950.2818957","url":null,"abstract":"The virtual memory system defined in the 1960s remains the primary interface between software and the physical memory system. Over time, operating systems and memory controllers evolved to become more intelligent about goals such as memory allocation, prefetching, security, and fairness. However, the limited knowledge that each side has of the other creates a significant semantic gap that may be artificially limiting the performance of today's memory systems. In this paper, we discuss the kinds of optimizations that occur on each side of the memory system and what types of knowledge could be shared between hardware and software to improve system performance.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127578756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2015 International Symposium on Memory Systems","authors":"","doi":"10.1145/2818950","DOIUrl":"https://doi.org/10.1145/2818950","url":null,"abstract":"","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134186562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}