2010 22nd International Symposium on Computer Architecture and High Performance Computing: Latest Publications

Improving In-memory Column-Store Database Predicate Evaluation Performance on Multi-core Systems

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.17

Hong Min, H. Franke

Citations: 5

Using Support Vector Machines to Learn How to Compile a Method

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.35

Ricardo Nabinger Sanchez, J. N. Amaral, D. Szafron, Marius Pirvu, Mark G. Stoodley

Citations: 3

MOPSO Applied to Architecture Tuning with Unified Second-Level Cache for Energy and Performance Optimization

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.40

F. Cordeiro, A. Silva-Filho, G. R. Carvalho

Citations: 3

Parallel Linear Octree Meshing with Immersed Surfaces

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.26

J. Camata, A. Coutinho

Citations: 8

A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.18

E. Rodrigues, P. Navaux, J. Panetta, Á. Fazenda, C. Mendes, L. Kalé

Citations: 43

High Level Power and Energy Exploration Using ArchC

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.13

T. Gupta, C. Bertolini, O. Héron, N. Ventroux, T. Zimmer, F. Marc

Citations: 8

Performance Debugging of GPGPU Applications with the Divergence Map

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.38

Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintão Pereira, Wagner Meira Jr

Abstract: The increasing programmability and the high computational power of Graphics Processing Units (GPUs) make them attractive for general-purpose programming. However, taking full benefit of this execution environment is a challenging task. One of these challenges stems from divergences, a phenomenon that occurs when threads that execute in lock-step are forced to take different program paths due to branches in the code. In the face of divergences, some threads will have to wait, idly, while their diverging siblings execute. Optimizing the code to avoid divergences is difficult, because this task demands a deep understanding of programs that might be large and convoluted. In order to facilitate the detection of divergences, this paper introduces the divergence map, a data structure that indicates the location and the volume of divergences in a program. We build this map via dynamic profiling techniques, which we have implemented on top of an open source CUDA compiler. To illustrate the importance of the divergence map, we have used it to pinpoint the core regions that must be optimized in well-known public applications. By hand-optimizing some applications, we have added 9-11% speedups onto kernels that have already gone through the sieve of many programmers.

Citations: 10

Flexible Error Protection for Energy Efficient Reliable Architectures

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.37

Timothy N. Miller, Nagarjuna Surapaneni, R. Teodorescu

Abstract: Technology scaling is having an increasingly detrimental effect on microprocessor reliability, with increased variability and higher susceptibility to errors. At the same time, as integration of chip multiprocessors increases, power consumption is becoming a significant bottleneck that could threaten their growth. To reconcile these competing trends, energy-efficient solutions to reliability problems are needed. This paper presents a reliable multicore architecture that provides targeted error protection by adapting to the characteristics of individual cores and workloads, with the goal of providing reliability with minimum energy. The user can specify an acceptable reliability target for each chip, core, or application. The system then adjusts a range of parameters, including replication and supply voltage, to meet that reliability goal. In this multicore architecture, each core consists of a pair of pipelines that can run independently (running separate threads) or in concert (running the same thread and verifying results). Redundancy is enabled selectively, at functional unit granularity. The architecture also employs timing speculation to mitigate variation-induced timing errors and to reduce the power overhead of error protection. On-line control based on machine learning dynamically adjusts multiple parameters to minimize energy consumption. Evaluation shows that dynamic adaptation of voltage and redundancy can reduce the energy-delay product of a CMP by 30-60% compared to static dual modular redundancy.

Citations: 21

Feedback-Driven Restructuring of Multi-threaded Applications for NUCA Cache Performance in CMPs

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.20

S. Bartolini, P. Foglia, M. Solinas, C. Prete

Abstract: This paper addresses feedback-directed restructuring techniques tuned to Non-Uniform Cache Architectures (NUCA) in CMPs running multi-threaded applications. Access time to NUCA caches depends on the location of the referenced block, so the locality and cache mapping of the application influence the overall performance. We show techniques for altering the distribution of applications into the cache space so as to achieve improved average memory access time. In CMPs running multi-threaded applications, the aggregated accesses (and locality) of the processors form the actual cache load and pose specific issues. We consider a number of Splash-2 and Parsec benchmarks on an 8-processor system and we show that a relatively simple remapping algorithm is able to improve the average Static-NUCA (SNUCA) cache access time by 5.5% and allows an SNUCA cache to surpass the performance of a more complex dynamic-NUCA (DNUCA) for most benchmarks. Then, we present a more sophisticated remapping algorithm, relying on cache geometry information and on the access distribution statistics from individual processors, that reduces the average cache access time by 10.2% and is very stable across all benchmarks.

Citations: 13

Characterizing Energy Consumption in Hardware Transactional Memory Systems

2010 22nd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2010-10-27 DOI: 10.1109/SBAC-PAD.2010.11

Epifanio Gaona-Ramírez, J. Gil, Juan Fernández, M. Acacio

Abstract: Transactional Memory is currently being advocated as a promising alternative to lock-based synchronization because it simplifies multithreaded programming. Consequently, future many-core CMP architectures may need to provide hardware support for transactional memory. On the other hand, power dissipation constitutes a first-class consideration in multicore processor design. In this work, we characterize the performance and energy consumption of two well-known Hardware Transactional Memory systems that employ opposite policies for data versioning and conflict management. More specifically, we compare the LogTM-SE Eager-Eager system and a version of the Scalable TCC Lazy-Lazy system that enables parallel commits. To the best of our knowledge, this is the first characterization in terms of energy consumption of hardware transactional memory systems. To do that, we extended the GEMS simulator to estimate the energy consumed in the on-chip caches according to CACTI, and used the interconnection network energy model given by Orion 2. Results show that the energy consumption of the Eager-Eager system is 60% higher on average than in the Lazy-Lazy case, whereas performance differences between the two systems are 42% on average. Finally, we found that although on average Lazy-Lazy beats Eager-Eager, there are considerable deviations in performance depending on the particular characteristics of each application.

Citations: 19