Proceedings of the 7th ACM international conference on Computing frontiers最新文献_第5页

A predictable communication assist 可预测的沟通协助

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787301

A. Shabbir, S. Stuijk, Akash Kumar, B. Theelen, B. Mesman, H. Corporaal

引用次数: 6

Porting existing cache-oblivious linear algebra HPC modules to larrabee architecture 将现有的缓参无关线性代数HPC模块移植到larrabee架构

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787298

A. Heinecke, C. Trinitis, J. Weidendorfer

{"title":"Porting existing cache-oblivious linear algebra HPC modules to larrabee architecture","authors":"A. Heinecke, C. Trinitis, J. Weidendorfer","doi":"10.1145/1787275.1787298","DOIUrl":"https://doi.org/10.1145/1787275.1787298","url":null,"abstract":"Cache-obliviousness represents an important but relatively new concept for cache optimization. As cache-oblivious algorithms perform well on architectures with arbitrary cache configurations, the programming effort required for porting and optimizing for future architectures can be significantly reduced. In [8] and [9], fast parallel cache-oblivious linear algebra modules have been presented. The underlying matrix storing schemes are based on space filling curves. For matrix multiplication, all cache misses can be avoided, whereas for the LU decomposition algorithm the number of cache misses is minimized. It has been shown that the resulting codes work very well on several kinds of systems ranging from laptops to supercomputers. In this paper, we will show that the runtime characteristics of our existing cache-oblivious codes can be preserved on newer Intel processors. Special emphasis is put on the first many-core processor architecture with complete hardware-based cache coherency: The Larrabee Architecture. As the latter is expected to be available as a PCIe card connected to the host system, porting had to take into account transfer of data structures between different memory address spaces. Unfortunately, Larrabee was canceled as a graphics device for 2010, but Intel is expected to outline futher steps about Larrabee during 2010.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122251937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Energy efficient biomolecular simulations with FPGA-based reconfigurable computing 基于fpga的可重构计算的高能效生物分子模拟

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787294

Ananth Nallamuthu, M. C. Smith, Scott S. Hampton, P. Agarwal, S. Alam

{"title":"Energy efficient biomolecular simulations with FPGA-based reconfigurable computing","authors":"Ananth Nallamuthu, M. C. Smith, Scott S. Hampton, P. Agarwal, S. Alam","doi":"10.1145/1787275.1787294","DOIUrl":"https://doi.org/10.1145/1787275.1787294","url":null,"abstract":"Reconfigurable computing (RC) is being investigated as a hardware solution for improving time-to-solution for biomolecular simulations. A number of popular molecular dynamics (MD) codes are used to study various aspects of biomolecules. These codes are now capable of simulating nanosecond time-scale trajectories per day on conventional microprocessor-based hardware, but biomolecular processes often occur at the microsecond time-scale or longer. A wide gap exists between the desired and achievable simulation capability; therefore, there is considerable interest in alternative algorithms and hardware for improving the time-to-solution of MD codes. The fine-grain parallelism provided by Field Programmable Gate Arrays (FPGA) combined with their low power consumption make them an attractive solution for improving the performance of MD simulations. In this work, we use an FPGA-based coprocessor to accelerate the compute-intensive calculations of LAMMPS, a popular MD code, achieving up to 5.5 fold speed-up on the non-bonded force computations of the particle mesh Ewald method and up to 2.2 fold speed-up in overall time-to-solution, and potentially an increase by a factor of 9 in power-performance efficiencies for the pair-wise computations. The results presented here provide an example of the multi-faceted benefits to an application in a heterogeneous computing environment.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"256 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115333869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Applying statistical machine learning to multicore voltage & frequency scaling 将统计机器学习应用于多核电压和频率缩放

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787336

Michael Moeng, R. Melhem

{"title":"Applying statistical machine learning to multicore voltage & frequency scaling","authors":"Michael Moeng, R. Melhem","doi":"10.1145/1787275.1787336","DOIUrl":"https://doi.org/10.1145/1787275.1787336","url":null,"abstract":"Dynamic Voltage/Frequency Scaling (DVFS) is a useful tool for improving system energy efficiency, especially in multi-core chips where energy is more of a limiting factor. Per-core DVFS, where cores can independently scale their voltages and frequencies, is particularly effective. We present a DVFS policy using machine learning, which learns the best frequency choices for a machine as a decision tree. Machine learning is used to predict the frequency which will minimize the expected energy per user-instruction (epui) or energy per (user-instruction)2 (epui2). While each core independently sets its frequency and voltage, a core is sensitive to other cores' frequency settings. Also, we examine the viability of using only partial training to train our policy, rather than full profiling for each program. We evaluate our policy on a 16-core machine running multiprogrammed, multithreaded benchmarks from the PARSEC benchmark suite against a baseline fixed frequency as well as a recently-proposed greedy policy. For 1ms DVFS intervals, our technique improves system epui2 by 14.4% over the baseline no-DVFS policy and 11.3% on average over the greedy policy.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"356 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116239416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 38

A high-speed AES architecture implementation 高速AES架构实现

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787300

Flavius Opritoiu, M. Vladutiu, L. Prodan, M. Udrescu

引用次数: 6

Session details: Processor architecture 会话细节:处理器架构

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/3251919

S. Mckee

引用次数: 0

Efficient implementation of GPGPU synchronization primitives on CPUs GPGPU同步原语在cpu上的高效实现

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787295

J. Gummaraju, B. Sander, L. Morichetti, Benedict R. Gaster, Lee W. Howes

引用次数: 3

Session details: Power 2 会话细节:Power 2

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/3251916

R. Gioiosa

引用次数: 0

Efficient pattern matching on GPUs for intrusion detection systems 基于gpu的入侵检测系统高效模式匹配

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787296

Antonino Tumeo, Oreste Villa, D. Sciuto

引用次数: 50

A hyperscalar multi-core architecture 一个超标量多核架构

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787291

J. Chiu, Yu-Liang Chou, Ding-Siang Su

引用次数: 1