2007 25th International Conference on Computer Design: Latest Publications

Passive compensation for high performance inter-chip communication

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601951

Chun-Chen Liu, Haikun Zhu, Chung-Kuan Cheng

Citations: 5

A Study on self-timed asynchronous subthreshold logic

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601949

N. Lotze, M. Ortmanns, Y. Manoli

Citations: 23

Fine grain 3D integration for microarchitecture design through cube packing exploration

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601911

Yongxiang Liu, Yuchun Ma, E. Kursun, Glenn D. Reinman, J. Cong

Citations: 28

LEMap: Controlling leakage in large chip-multiprocessor caches via profile-guided virtual address translation

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601934

Jugash Chandarlapati, Mainak Chaudhuri

Abstract: The emerging trend of larger number of cores or processors on a single chip in the server, desktop, and mobile notebook platforms necessarily demands larger amount of on-chip last level cache. However, larger caches threaten to dramatically increase the leakage power as the industry moves into deeper sub-micron technology. In this paper, with the aim of reducing leakage energy we introduce LEMap (low energy map), a novel virtual address translation scheme to control the set of physical pages mapped to each bank of a large multi-banked non-uniform access L2 cache shared across all the cores. Combination of profiling, a simple off-line clustering algorithm, and a new flavor of Irix-style application-directed page placement system call maps the virtual pages that are accessed in the L2 cache roughly together onto the same region of the cache. Thus LEMap makes the access windows of the pages mapped to a region roughly identical and increases the average idle time of a region. As a result, powering down a region after the last access to the clusters of the corresponding virtual pages saves a much bigger amount of L2 cache energy compared to a usual virtual address translation scheme that is oblivious to access patterns. Our execution-driven simulation of an eight-core chip-multiprocessor with a 16 MB shared L2 cache using a 65 nm process on eight shared memory parallel applications drawn from SPLASH-2, SPEC OMP, and DIS suites shows that LEMap, on average, saves 7% of total energy, 50% of L2 cache energy, and 52% of L2 cache power while suffering from a 3% loss in performance compared to a baseline system that employs drowsy cells as well as region power-down without access clustering.

Pages: 423-430

Citations: 3
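
Illustrative note on the LEMap entry: the abstract's central idea, that placing pages with similar access windows in the same cache region lets that region be powered down sooner, can be shown with a minimal Python sketch. This is not the authors' algorithm. The `PageProfile` record, the sort-and-chunk clustering, the modulo baseline, and the toy access times are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageProfile:
    page: int         # virtual page number (hypothetical profile record)
    last_access: int  # time of the page's last L2 access in the profile


def cluster_by_last_access(profiles, num_regions):
    """Assumed stand-in for the off-line clustering step: sort pages by
    last access time and cut the sorted list into at most num_regions chunks."""
    ordered = sorted(profiles, key=lambda p: p.last_access)
    size = max(1, -(-len(ordered) // num_regions))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def interleave(profiles, num_regions):
    """Access-oblivious baseline: pick the region by page number modulo."""
    regions = [[] for _ in range(num_regions)]
    for p in profiles:
        regions[p.page % num_regions].append(p)
    return regions


def powered_down_time(regions, run_length):
    """Total time the regions can stay off after their last access.
    An empty region is treated as off for the whole run."""
    return sum(
        run_length - max((p.last_access for p in region), default=0)
        for region in regions
    )


# Toy profile (made-up numbers): two pages finish early, two finish late.
profiles = [
    PageProfile(0, 10),
    PageProfile(1, 12),
    PageProfile(2, 99),
    PageProfile(3, 100),
]

print(powered_down_time(cluster_by_last_access(profiles, 2), 100))  # 88
print(powered_down_time(interleave(profiles, 2), 100))              # 1
```

In this toy case the clustered placement keeps the two early-finishing pages together, so one region can be switched off at time 12. The interleaved placement puts a late-finishing page in every region, so almost no power-down time is available.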

Exploiting eDRAM bandwidth with data prefetching: simulation and measurements

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601945

V. Salapura, J. Brunheroto, F. Redígolo, A. Gara

Citations: 3

Implementing a 2-Gbs 1024-bit ½-rate low-density parity-check code decoder in three-dimensional integrated circuits

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601900

Lili Zhou, C. Wakayama, Robin Panda, N. Jangkrajarng, B. Hu, C. Shi

Citations: 2

Multi-core data streaming architecture for ray tracing

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601897

Yoshiyuki Kaeriyama, Daichi Zaitsu, Ken-ichi Suzuki, Hiroaki Kobayashi, N. Ohba

Citations: 0

Power variations of multi-port routers in an application-specific NoC design: A case study

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601958

B. Sethuraman, R. Vemuri

Citations: 0

Floating-point division algorithms for an x86 microprocessor with a rectangular multiplier

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601917

M. Schulte, Dimitri Tan, C. Lemonds

Citations: 15

Hardware libraries: An architecture for economic acceleration in soft multi-core environments

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601898

David Meisner, S. Reda

Abstract: In single processor architectures, computationally intensive functions are typically accelerated using hardware accelerators, which exploit the concurrency in the function code to achieve a significant speedup over software. The increased design constraints from power density and signal delay have shifted processor architectures in general towards multi-core designs. The migration to multi-core designs introduces the possibility of sharing hardware accelerators between cores. In this paper, we propose the concept of a hardware library, which is a pool of accelerated functions that are accessible by multiple cores. We find that sharing provides significant reductions in the area, logic usage and leakage power required for hardware acceleration. Contention for these units may exist in certain cases; however, the savings in terms of chip area are more appealing to many applications, particularly the embedded domain. We study the performance implications for our proposal using various multi-core arrangements, with actual implementations in FPGA fabrics. FPGAs are particularly appealing due to their cost effectiveness and the attained area savings enable designers to easily add functionality without significant chip revision. Our results show that it is possible to save up to 37% of a chip's available logic and interconnect resources at a negligible impact (< 3%) to the performance.

Pages: 179-186

Citations: 0