M. Koibuchi, H. Matsutani, H. Amano, D. Hsu, H. Casanova. "A case for random shortcut topologies for HPC interconnects." In 39th Annual International Symposium on Computer Architecture (ISCA), June 2012. DOI: 10.1145/2366231.2337179

Abstract: As the scales of parallel applications and platforms increase, the negative impact of communication latencies on performance grows. Fortunately, modern High Performance Computing (HPC) systems can exploit low-latency topologies of high-radix switches. In this context, we propose the use of random shortcut topologies, which are generated by augmenting classical topologies with random links. Using graph analysis, we find that these topologies, when compared to non-random topologies of the same degree, lead to drastically reduced diameter and average shortest path length. The best results are obtained when adding random links to a ring topology, meaning that good random shortcut topologies can easily be generated for arbitrary numbers of switches. Using flit-level discrete event simulation, we find that random shortcut topologies achieve throughput comparable to, and latency lower than, that of existing non-random topologies such as hypercubes and tori. Finally, we discuss and quantify practical challenges for random shortcut topologies, including routing scalability and longer physical cable lengths.
D. Wentzlaff, C. Jackson, P. Griffin, A. Agarwal. "Configurable fine-grain protection for multicore processor virtualization." In 39th Annual International Symposium on Computer Architecture (ISCA), June 2012. DOI: 10.1145/2366231.2337213

Abstract: Multicore architectures, with their abundant on-chip resources, are effectively collections of systems-on-a-chip. The protection system for these architectures must support multiple concurrently executing operating systems (OSes) with different needs, and must manage and protect the hardware's novel communication mechanisms and features. Traditional protection systems are insufficient: they protect supervisor from user code, but typically do not protect one system from another, and they support only fixed assignment of resources to protection levels. In this paper, we propose an alternative to traditional protection systems, which we call configurable fine-grain protection (CFP). CFP enables the dynamic assignment of in-core resources to protection levels. We investigate how CFP enables different system software stacks to utilize the same configurable protection hardware, and how differing OSes can execute at the same time on a multicore processor with CFP. As illustration, we describe an implementation of CFP in a commercial multicore, the TILE64 processor.
M. Kayaalp, M. Ozsoy, N. Abu-Ghazaleh, D. Ponomarev. "Branch regulation: Low-overhead protection from code reuse attacks." In 39th Annual International Symposium on Computer Architecture (ISCA), June 2012. DOI: 10.1145/2366231.2337171

Abstract: Code reuse attacks (CRAs) are recent security exploits that allow attackers to execute arbitrary code on a compromised machine. CRAs, exemplified by return-oriented and jump-oriented programming approaches, reuse fragments of library code, thus avoiding the need for explicit injection of attack code on the stack. Because the executed code consists of reused existing code, CRAs bypass current hardware and software security measures that prevent execution from data or stack regions of memory. While software-based full control flow integrity (CFI) checking can protect against CRAs, it incurs significant overhead, involves the non-trivial effort of constructing a control flow graph, relies on proprietary tools, and has potential vulnerabilities due to the presence of unintended branch instructions in architectures such as x86, which are not checked by software CFI. We propose branch regulation (BR), a lightweight hardware-supported protection mechanism against CRAs that addresses all of these limitations of software CFI. BR enforces simple control flow rules in hardware at the function granularity to disallow arbitrary control flow transfers from one function into the middle of another. This prevents common classes of CRAs without the complexity and run-time overhead of full CFI enforcement. BR incurs a slowdown of about 2% and increases the code footprint by less than 1% on average for the SPEC 2006 benchmarks.
K. V. Craeynest, A. Jaleel, L. Eeckhout, P. Narváez, J. Emer. "Scheduling heterogeneous multi-cores through performance impact estimation (PIE)." In 39th Annual International Symposium on Computer Architecture (ISCA), June 2012. DOI: 10.1145/2366231.2337184

Abstract: Single-ISA heterogeneous multi-core processors are typically composed of small (e.g., in-order) power-efficient cores and big (e.g., out-of-order) high-performance cores. The effectiveness of heterogeneous multi-cores depends on how well a scheduler can map workloads onto the most appropriate core type. In general, small cores can achieve good performance if the workload inherently has high levels of ILP. On the other hand, big cores provide good performance if the workload exhibits high levels of MLP or requires the ILP to be extracted dynamically. This paper proposes Performance Impact Estimation (PIE) as a mechanism to predict which workload-to-core mapping is likely to provide the best performance. PIE collects CPI stack, MLP, and ILP profile information, and estimates performance if the workload were to run on a different core type. Dynamic PIE adjusts the scheduling at runtime and thereby exploits fine-grained time-varying execution behavior. We show that PIE requires limited hardware support and can improve system performance by an average of 5.5% over recent state-of-the-art scheduling proposals and by 8.7% over a sampling-based scheduling policy.
{"title":"Tolerating process variations in nanophotonic on-chip networks","authors":"Yi Xu, Jun Yang, R. Melhem","doi":"10.1145/2366231.2337176","DOIUrl":"https://doi.org/10.1145/2366231.2337176","url":null,"abstract":"Nanophontonic networks, a potential candidate for future networks on-chip, have been challenged for their reliability due to several device-level limitations. One of the main issues is that fabrication errors (a.k.a. process variations) can cause devices to malfunction, rendering communication unreliable. For example, microring resonator, a preferred optical modulator device, may not resonate at the designated wavelength under process variations (PV), leading to communication errors and bandwidth loss. This paper proposes a series of solutions to the wavelength drifting problem of microrings and subsequent bandwidth loss problem of an optical network, due to PV. The objective is to maximize network bandwidth through proper arrangement among microrings and wavelengths with minimum power requirement. Our arrangement, called “MinTrim”, solves this problem using simple integer linear programming, adding supplementary microrings and allowing flexible assignment of wavelengths to network nodes as long as the resulting network presents maximal bandwidth. Each step is shown to improve bandwidth provisioning with lower power requirement. Evaluations on a sample network show that a baseline network could lose more than 40% bandwidth due to PV. Such loss can be recovered by MinTrim to produce a network with 98.4% working bandwidth. In addition, the power required in arranging microrings is 39% lower than the baseline. Therefore, MinTrim provides an efficient PV-tolerant solution to improving the reliability of on-chip phontonics.","PeriodicalId":193578,"journal":{"name":"2012 39th Annual International Symposium on Computer Architecture (ISCA)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127035737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Buffer-on-board memory systems","authors":"E. Cooper-Balis, P. Rosenfeld, B. Jacob","doi":"10.1145/2366231.2337204","DOIUrl":"https://doi.org/10.1145/2366231.2337204","url":null,"abstract":"The design and implementation of the commodity memory architecture has resulted in significant performance and capacity limitations. To circumvent these limitations, designers and vendors have begun to place intermediate logic between the CPU and DRAM. This additional logic has two functions: to control the DRAM and to communicate with the CPU over a fast and narrow bus. The benefit provided by this logic is a reduction in pin-out to the memory system and increased signal integrity to the DRAM, allowing faster clock rates while maintaining capacity. While the few vendors utilizing this design have used the same general approach, their implementations vary greatly in their non-trivial details. A hardware-verified simulation suite is developed to accurately model and evaluate the behavior of this buffer-on-board memory system. A study of this design space is used to determine optimal use of the resources involved. This includes DRAM and bus organization, queue storage, and mapping schemes. Various constraints based on implementation costs are placed on simulated configurations to confirm that these optimizations apply to viable systems. Finally, full system simulations are performed to better understand how this memory system interacts with an operating system executing an application with the goal of uncovering behaviors not present in simple limit case simulations. When applying insights gleaned from these simulations, optimal performance can be achieved while still considering outside constraints (i.e., pin-out, power, and fabrication costs).","PeriodicalId":193578,"journal":{"name":"2012 39th Annual International Symposium on Computer Architecture (ISCA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122355165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Kim, V. Seshadri, D. Lee, J. Liu, O. Mutlu. "A case for exploiting subarray-level parallelism (SALP) in DRAM." In 39th Annual International Symposium on Computer Architecture (ISCA), June 2012. DOI: 10.1145/2366231.2337202

Abstract: Modern DRAMs have multiple banks to serve multiple memory requests in parallel. However, when two requests go to the same bank, they have to be served serially, exacerbating the high latency of off-chip memory. Adding more banks to the system to mitigate this problem incurs high system cost. Our goal in this work is to achieve the benefits of increasing the number of banks with a low cost approach. To this end, we propose three new mechanisms that overlap the latencies of different requests that go to the same bank. The key observation exploited by our mechanisms is that a modern DRAM bank is implemented as a collection of subarrays that operate largely independently while sharing few global peripheral structures. Our proposed mechanisms (SALP-1, SALP-2, and MASA) mitigate the negative impact of bank serialization by overlapping different components of the bank access latencies of multiple requests that go to different subarrays within the same bank. SALP-1 requires no changes to the existing DRAM structure and only needs reinterpretation of some DRAM timing parameters. SALP-2 and MASA require only modest changes (<0.15% area overhead) to the DRAM peripheral structures, which are much less design constrained than the DRAM core. Evaluations show that all our schemes significantly improve performance for both single-core systems and multi-core systems. Our schemes also interact positively with application-aware memory request scheduling in multi-core systems.
K. T. Malladi, F. A. Nothaft, K. Periyathambi, B. C. Lee, C. Kozyrakis, M. Horowitz. "Towards energy-proportional datacenter memory with mobile DRAM." In 39th Annual International Symposium on Computer Architecture (ISCA), June 2012. DOI: 10.1145/2366231.2337164

Abstract: To increase datacenter energy efficiency, we need memory systems that keep pace with processor efficiency gains. Currently, servers use DDR3 memory, which is designed for high bandwidth but not for energy proportionality. A system using 20% of the peak DDR3 bandwidth consumes 2.3× the energy per bit of a system with fully utilized memory bandwidth. Nevertheless, many datacenter applications stress memory capacity and latency but not memory bandwidth. In response, we architect server memory systems using mobile DRAM devices, trading peak bandwidth for lower energy consumption per bit and more efficient idle modes. We demonstrate 3-5× lower memory power, better proportionality, and negligible performance penalties for datacenter workloads.
{"title":"Probabilistic Shared Cache Management (PriSM)","authors":"R. Manikantan, K. Rajan, Ramaswamy Govindarajan","doi":"10.1145/2366231.2337208","DOIUrl":"https://doi.org/10.1145/2366231.2337208","url":null,"abstract":"Effective sharing of the last level cache has a significant influence on the overall performance of a multicore system. We observe that existing solutions control cache occupancy at a coarser granularity, do not scale well to large core counts and in some cases lack the flexibility to support a variety of performance goals. In this paper, we propose Probabilistic Shared Cache Management (PriSM), a framework to manage the cache occupancy of different cores at cache block granularity by controlling their eviction probabilities. The proposed framework requires only simple hardware changes to implement, can scale to larger core count and is flexible enough to support a variety of performance goals. We demonstrate the flexibility of PriSM, by computing the eviction probabilities needed to achieve goals like hit-maximization, fairness and QOS. PriSM-HitMax improves performance by 18.7% over LRU and 11.8% over previously proposed schemes in a sixteen core machine. PriSM-Fairness improves fairness over existing solutions by 23.3% along with a performance improvement of 19.0%. PriSM-QOS successfully achieves the desired QOS targets.","PeriodicalId":193578,"journal":{"name":"2012 39th Annual International Symposium on Computer Architecture (ISCA)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114809999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving writeback efficiency with decoupled last-write prediction","authors":"Zhe Wang, S. Khan, Daniel A. Jiménez","doi":"10.1145/2366231.2337195","DOIUrl":"https://doi.org/10.1145/2366231.2337195","url":null,"abstract":"In modern DDRx memory systems, memory write requests compete with read requests for available memory resources, significantly increasing the average read request service time. Caches are used to mitigate long memory read latency that limits system performance. Dirty blocks in the last-level cache (LLC) that will not be written again before they are evicted will eventually be written back to memory. We refer to these blocks as last-write blocks. In this paper, we propose an LLC writeback technique that improves DRAM efficiency by scheduling predicted last-write blocks early. We propose a low overhead last-write predictor for the LLC. The predicted last-write blocks are made available to the memory controller for scheduling. This technique effectively re-distributes the memory requests and expands writes scheduling opportunities, allowing writes to be serviced efficiently by DRAM. The technique is flexible enough to be applied to any LLC replacement policy. Our evaluation with multi-programmed workloads shows that the technique significantly improves performance by 6.5%-11.4% on average over the traditional writeback technique in an eight-core processor with various DRAM configurations running memory intensive benchmarks.","PeriodicalId":193578,"journal":{"name":"2012 39th Annual International Symposium on Computer Architecture (ISCA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124497832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}