Proceedings of the 7th ACM international conference on Computing frontiers最新文献

筛选
英文 中文
A predictable communication assist 可预测的沟通协助
Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787301
A. Shabbir, S. Stuijk, Akash Kumar, B. Theelen, B. Mesman, H. Corporaal
{"title":"A predictable communication assist","authors":"A. Shabbir, S. Stuijk, Akash Kumar, B. Theelen, B. Mesman, H. Corporaal","doi":"10.1145/1787275.1787301","DOIUrl":"https://doi.org/10.1145/1787275.1787301","url":null,"abstract":"Modern multi-processor systems need to provide guaranteed services to their users. A communication assist (CA) helps in achieving tight timing guarantees. In this paper, we present a CA for a tile-based MP-SoC. Our CA has smaller memory requirements and a lower latency than existing CAs. The CA has been implemented in hardware. We compare it with two existing DMA controllers. When compared with these DMAs, our CA is up-to 44% smaller in terms of equivalent gate count.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131228835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Porting existing cache-oblivious linear algebra HPC modules to larrabee architecture 将现有的缓参无关线性代数HPC模块移植到larrabee架构
Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787298
A. Heinecke, C. Trinitis, J. Weidendorfer
{"title":"Porting existing cache-oblivious linear algebra HPC modules to larrabee architecture","authors":"A. Heinecke, C. Trinitis, J. Weidendorfer","doi":"10.1145/1787275.1787298","DOIUrl":"https://doi.org/10.1145/1787275.1787298","url":null,"abstract":"Cache-obliviousness represents an important but relatively new concept for cache optimization. As cache-oblivious algorithms perform well on architectures with arbitrary cache configurations, the programming effort required for porting and optimizing for future architectures can be significantly reduced. In [8] and [9], fast parallel cache-oblivious linear algebra modules have been presented. The underlying matrix storing schemes are based on space filling curves. For matrix multiplication, all cache misses can be avoided, whereas for the LU decomposition algorithm the number of cache misses is minimized. It has been shown that the resulting codes work very well on several kinds of systems ranging from laptops to supercomputers. In this paper, we will show that the runtime characteristics of our existing cache-oblivious codes can be preserved on newer Intel processors. Special emphasis is put on the first many-core processor architecture with complete hardware-based cache coherency: The Larrabee Architecture. As the latter is expected to be available as a PCIe card connected to the host system, porting had to take into account transfer of data structures between different memory address spaces. Unfortunately, Larrabee was canceled as a graphics device for 2010, but Intel is expected to outline futher steps about Larrabee during 2010.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122251937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Energy efficient biomolecular simulations with FPGA-based reconfigurable computing 基于fpga的可重构计算的高能效生物分子模拟
Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787294
Ananth Nallamuthu, M. C. Smith, Scott S. Hampton, P. Agarwal, S. Alam
{"title":"Energy efficient biomolecular simulations with FPGA-based reconfigurable computing","authors":"Ananth Nallamuthu, M. C. Smith, Scott S. Hampton, P. Agarwal, S. Alam","doi":"10.1145/1787275.1787294","DOIUrl":"https://doi.org/10.1145/1787275.1787294","url":null,"abstract":"Reconfigurable computing (RC) is being investigated as a hardware solution for improving time-to-solution for biomolecular simulations. A number of popular molecular dynamics (MD) codes are used to study various aspects of biomolecules. These codes are now capable of simulating nanosecond time-scale trajectories per day on conventional microprocessor-based hardware, but biomolecular processes often occur at the microsecond time-scale or longer. A wide gap exists between the desired and achievable simulation capability; therefore, there is considerable interest in alternative algorithms and hardware for improving the time-to-solution of MD codes. The fine-grain parallelism provided by Field Programmable Gate Arrays (FPGA) combined with their low power consumption make them an attractive solution for improving the performance of MD simulations. In this work, we use an FPGA-based coprocessor to accelerate the compute-intensive calculations of LAMMPS, a popular MD code, achieving up to 5.5 fold speed-up on the non-bonded force computations of the particle mesh Ewald method and up to 2.2 fold speed-up in overall time-to-solution, and potentially an increase by a factor of 9 in power-performance efficiencies for the pair-wise computations. The results presented here provide an example of the multi-faceted benefits to an application in a heterogeneous computing environment.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"256 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115333869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Applying statistical machine learning to multicore voltage & frequency scaling 将统计机器学习应用于多核电压和频率缩放
Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787336
Michael Moeng, R. Melhem
{"title":"Applying statistical machine learning to multicore voltage & frequency scaling","authors":"Michael Moeng, R. Melhem","doi":"10.1145/1787275.1787336","DOIUrl":"https://doi.org/10.1145/1787275.1787336","url":null,"abstract":"Dynamic Voltage/Frequency Scaling (DVFS) is a useful tool for improving system energy efficiency, especially in multi-core chips where energy is more of a limiting factor. Per-core DVFS, where cores can independently scale their voltages and frequencies, is particularly effective. We present a DVFS policy using machine learning, which learns the best frequency choices for a machine as a decision tree. Machine learning is used to predict the frequency which will minimize the expected energy per user-instruction (epui) or energy per (user-instruction)2 (epui2). While each core independently sets its frequency and voltage, a core is sensitive to other cores' frequency settings. Also, we examine the viability of using only partial training to train our policy, rather than full profiling for each program. We evaluate our policy on a 16-core machine running multiprogrammed, multithreaded benchmarks from the PARSEC benchmark suite against a baseline fixed frequency as well as a recently-proposed greedy policy. For 1ms DVFS intervals, our technique improves system epui2 by 14.4% over the baseline no-DVFS policy and 11.3% on average over the greedy policy.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"356 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116239416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 38
A high-speed AES architecture implementation 高速AES架构实现
Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787300
Flavius Opritoiu, M. Vladutiu, L. Prodan, M. Udrescu
{"title":"A high-speed AES architecture implementation","authors":"Flavius Opritoiu, M. Vladutiu, L. Prodan, M. Udrescu","doi":"10.1145/1787275.1787300","DOIUrl":"https://doi.org/10.1145/1787275.1787300","url":null,"abstract":"We present in this paper a high performance implementation for the Advanced Encryption Standard (AES) standard. The design goal is directed toward efficient implementation of an AES cryptocore. The proposed architecture exhibits parallelism by concurrently processing all the bytes of a data block and computes each round key on-the-fly. The design implements both AES encryption and decryption by efficiently sharing the complex design modules. The proposed high-speed iterative implementation performing the AES operations in 11 clock cycles was synthesized for ALTERA's Cyclone II FPGA.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116590071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Session details: Processor architecture 会话细节:处理器架构
S. Mckee
{"title":"Session details: Processor architecture","authors":"S. Mckee","doi":"10.1145/3251919","DOIUrl":"https://doi.org/10.1145/3251919","url":null,"abstract":"","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128720580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Efficient implementation of GPGPU synchronization primitives on CPUs GPGPU同步原语在cpu上的高效实现
Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787295
J. Gummaraju, B. Sander, L. Morichetti, Benedict R. Gaster, Lee W. Howes
{"title":"Efficient implementation of GPGPU synchronization primitives on CPUs","authors":"J. Gummaraju, B. Sander, L. Morichetti, Benedict R. Gaster, Lee W. Howes","doi":"10.1145/1787275.1787295","DOIUrl":"https://doi.org/10.1145/1787275.1787295","url":null,"abstract":"The GPGPU model represents a style of execution where thousands of threads execute in a data-parallel fashion, with a large subset (typically 10s to 100s) needing frequent synchronization. As the GPGPU model evolves target both GPUs and CPUs as acceleration targets, thread synchronization becomes an important problem when running on CPUs. CPUs have little hardware support for synchronization and must be emulated in software, reducing application performance. This paper presents software techniques to implement the GPGPU synchronization primitives on CPUs, while maintaining application debug-ability. Performing limit studies using real hardware, we evaluate the potential performance benefits of an efficient barrier primitive.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121908031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Session details: Power 2 会话细节:Power 2
R. Gioiosa
{"title":"Session details: Power 2","authors":"R. Gioiosa","doi":"10.1145/3251916","DOIUrl":"https://doi.org/10.1145/3251916","url":null,"abstract":"","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121680291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Efficient pattern matching on GPUs for intrusion detection systems 基于gpu的入侵检测系统高效模式匹配
Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787296
Antonino Tumeo, Oreste Villa, D. Sciuto
{"title":"Efficient pattern matching on GPUs for intrusion detection systems","authors":"Antonino Tumeo, Oreste Villa, D. Sciuto","doi":"10.1145/1787275.1787296","DOIUrl":"https://doi.org/10.1145/1787275.1787296","url":null,"abstract":"In this paper we present an efficient implementation of the Aho-Corasick pattern matching algorithm on Graphics Processing Units (GPU), showing how we redesigned the algorithm and the data structures to fit on the architecture and comparing it with an equivalent implementation on the CPU. We show that with a synthetic dataset, our implementation obtains a speedup up to 6.67 with respect to the CPU solution.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"24 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132592718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
A hyperscalar multi-core architecture 一个超标量多核架构
Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787291
J. Chiu, Yu-Liang Chou, Ding-Siang Su
{"title":"A hyperscalar multi-core architecture","authors":"J. Chiu, Yu-Liang Chou, Ding-Siang Su","doi":"10.1145/1787275.1787291","DOIUrl":"https://doi.org/10.1145/1787275.1787291","url":null,"abstract":"This paper proposes a reconfigurable multi-core architecture, called hyperscalar that enables many scalar cores to be united dynamically as a larger superscalar processor to accelerate a thread. To accomplish this, we propose the virtual shared register files (VSRF) that allow the instructions of a thread executed in the united cores to logically face a uniform set of register files. We also propose the instruction analyzer (IA) with the capability of detecting and tagging the dependence information to the newly fetched instructions. According to the tags, instructions in the united cores can issue requests to obtain their remote operands via the VSRF. The reconfigurable feature of hyperscalar can cover a spectrum of workloads well, providing high single-thread performance when TLP is low and high throughput when TLP is high. Simulation results show that the a 8-core hyperscalar chip multiprocessor's 2, 4, and 8-core-united configurations archive 94%, 90%, and 83% of the performance of the monolithic 2, 4, and 8-issue out-of-order superscalar processors with lower area costs and better support for software diversity.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126379518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信