2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation: Latest Articles

On the Evaluation of Dense Chip-Multiprocessor Architectures

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300804

Francisco J. Villa, M. Acacio, José M. García

Abstract: Chip-multiprocessors (CMPs) have emerged as the most promising way of making efficient use of current improvements in integration scale. Today, commercial CMPs integrate at most 8 processor cores on a chip. However, dense-CMP (D-CMP) systems with 16 or more cores are expected in the near future. These architectures impose new design restrictions, and some topics, such as the cache-coherence problem, must be revisited. In this paper we present an exhaustive performance evaluation of two recently proposed D-CMP architectures, with special emphasis on the solution to the cache-coherence problem that each one adopts. The shared bus fabric (SBF) architecture features a snooping cache-coherence protocol and is based on a high-performance bus fabric interconnection network. The second architecture follows a directory-based approach and uses a two-dimensional mesh as the interconnection network. Our results show that the performance of the SBF architecture is hard-limited by the bandwidth restrictions of the bus fabric. The directory-based architecture outperforms the SBF one, but suffers some performance inefficiencies due to the additional indirection introduced by the directory structure stored at the L2 cache level.

Citations: 4
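The abstract of "On the Evaluation of Dense Chip-Multiprocessor Architectures" attributes part of the directory-based design's inefficiency to the extra indirection through the directory. A minimal sketch of that effect, under assumptions that do not come from the paper: a square mesh, Manhattan (XY) routing, one hop per link, and a cache-to-cache miss that travels requester, home directory, owner, and back to the requester. The "direct" figure is a hypothetical lower bound in which the requester already knows the owner. All function names are illustrative.

```python
from itertools import product
from statistics import mean
from typing import Tuple

Node = Tuple[int, int]  # (row, column) of a tile in the mesh


def hops(a: Node, b: Node) -> int:
    """Manhattan distance between two tiles (assumes XY routing)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def directory_miss_hops(requester: Node, home: Node, owner: Node) -> int:
    """Hops for requester -> home directory -> owner -> requester."""
    return hops(requester, home) + hops(home, owner) + hops(owner, requester)


def direct_miss_hops(requester: Node, owner: Node) -> int:
    """Hypothetical lower bound: requester -> owner -> requester."""
    return 2 * hops(requester, owner)


def average_hops(side: int) -> Tuple[float, float]:
    """Average (directory, direct) hop counts over all placements."""
    nodes = list(product(range(side), repeat=2))
    triples = [
        (r, h, o) for r in nodes for h in nodes for o in nodes if r != o
    ]
    indirect = mean(directory_miss_hops(r, h, o) for r, h, o in triples)
    direct = mean(direct_miss_hops(r, o) for r, _, o in triples)
    return indirect, direct


if __name__ == "__main__":
    indirect, direct = average_hops(4)  # 16 tiles, an assumed D-CMP size
    print(f"directory path: {indirect:.2f} hops, direct path: {direct:.2f} hops")
```

Because Manhattan distance obeys the triangle inequality, the directory path is never shorter than the direct one. The gap between the two averages is a rough proxy for the indirection cost, and it ignores contention, directory lookup latency, and protocol details.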

Accelerating RTL Simulation by Several Orders of Magnitude Using Clock Suppression

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300818

H. Muhr, Roland Höler

Citations: 10

Performance Improvements in Microprocessor Systems Utilizing a Coprocessor Data-Path

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300813

M. D. Galanis, G. Dimitroulakos, C. Goutis

Abstract: This work presents the speedups achieved in a generic microprocessor system by employing a high-performance data-path. The data-path acts as a coprocessor that accelerates computationally intensive kernel regions, thereby increasing overall performance. It is composed of flexible computational components (FCCs) that can realize any two-level template of primitive operations. The automated coprocessor synthesis method and its integration into a design flow for executing applications on the system are presented. An analytical exploration with respect to the type of custom data-path and the microprocessor architecture is performed. The overall speedups of eight real-life applications, relative to software-only execution on the microprocessor, are estimated using the design flow. These speedups range from 1.75 to 3.95, with an average of 2.72, while the circuit-area overhead is small. A comparison with another high-performance data-path shows that the proposed coprocessor achieves better performance with smaller area-time products for the generated data-paths.

Citations: 3
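The coprocessor paper reports overall application speedups obtained by accelerating only the kernel regions. As a rough illustration of how those two quantities relate, the sketch below applies Amdahl's law. This is a standard textbook model, not the estimation method used by the authors, and the kernel fraction and kernel speedup in the example are assumed values rather than figures from the paper.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only a fraction of the original
    runtime (the kernels) is accelerated by a given factor."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # The unaccelerated part keeps its runtime; the kernel part shrinks.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    # Assumed numbers: kernels take 80% of runtime and run 5x faster
    # on the coprocessor.
    print(round(overall_speedup(0.8, 5.0), 2))
```

The model makes clear why overall speedups stay well below the kernel speedup: the code left on the microprocessor bounds the gain at 1 / (1 - kernel_fraction), however fast the coprocessor is.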

A Table-Based Application-Specific Prefetch Engine for Object-Oriented Embedded Systems

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300802

S. Hessabi, M. Modarressi, M. Goudarzi, Hani JavanHemmat

Citations: 2

Parameterized Mapping of Algorithms onto Processor Arrays with Sub-Word Parallelism

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300815

Rainer Schaffer, R. Merker

Citations: 5

Chip Size Estimation for SOC Design Space Exploration

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1016/j.sysarc.2007.01.012

H. Jeschke

Citations: 2

Area-Aware Optimizations for Resource Constrained Branch Predictors Exploited in Embedded Processors

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300808

Babak Salamat, A. Baniasadi, K. J. Deris

Citations: 2

Static Energy Saving Through Multi-Bank Memory Architecture

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300807

S. Lafond, J. Lilius

Citations: 2

An Efficient Hierarchical Fuzzy Approach for System Level System-on-a-Chip Design

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300817

G. Ascia, V. Catania, A. D. Nuovo, M. Palesi, Davide Patti

Citations: 0

Modified Hotspot Cache Architecture: A Low Energy Fast Cache for Embedded Processors

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation Pub Date : 2006-07-17 DOI: 10.1109/ICSAMOS.2006.300806

K. Ali, M. Aboelaze, S. Datta

Abstract: Cache memory plays a crucial role in the performance of any processor. Cache memory (SRAM), especially on-chip cache, is 3-4 times faster than main memory (DRAM) and can vastly improve processor performance. The cache also consumes much less energy than main memory, which leads to a large power saving that is very important for embedded applications. Although cache memory reduces the energy consumption of today's processors, the on-chip cache itself accounts for almost 40% of the processor's total energy consumption. In this paper, we propose an instruction-cache architecture that is a modification of the hotspot architecture. The proposed architecture consists of a small filter cache in parallel with the hotspot cache, between the L1 cache and the main memory. The small filter cache holds the code that was not captured by the hotspot cache. We also propose a prediction mechanism that steers each memory access to the hotspot cache, the filter cache, or the L1 cache. Our design has both a faster access time and lower energy consumption than the filter cache and the hotspot cache architectures. We use the MiBench and MediaBench benchmarks together with the SimpleScalar simulator to evaluate the performance of the proposed architecture and compare it with the filter cache and hotspot cache architectures. The simulation results show that our design outperforms both the filter cache and the hotspot cache in average memory access time and in energy consumption.

Citations: 10
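The modified hotspot cache paper compares designs by average memory access time when a predictor steers each access to one of several structures. The sketch below shows one simple way such a figure could be computed. It is an assumed model, not the authors' simulator: each access is steered to exactly one target, a hit costs that target's hit time, and a miss adds a fixed penalty. The `Target` class and every latency, share, and miss rate in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Target:
    name: str
    share: float         # fraction of accesses the predictor steers here
    hit_time: float      # cycles for a hit in this structure
    miss_rate: float     # fraction of steered accesses that miss
    miss_penalty: float  # extra cycles paid on a miss


def average_access_time(targets: Iterable[Target]) -> float:
    """Weighted average of (hit_time + miss_rate * miss_penalty)."""
    targets = list(targets)
    if abs(sum(t.share for t in targets) - 1.0) > 1e-9:
        raise ValueError("steering shares must sum to 1")
    return sum(
        t.share * (t.hit_time + t.miss_rate * t.miss_penalty) for t in targets
    )


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the formula.
    design = [
        Target("hotspot", share=0.6, hit_time=1, miss_rate=0.05, miss_penalty=2),
        Target("filter", share=0.3, hit_time=1, miss_rate=0.10, miss_penalty=2),
        Target("L1", share=0.1, hit_time=2, miss_rate=0.02, miss_penalty=50),
    ]
    print(f"{average_access_time(design):.2f} cycles")
```

In this model a better predictor shows up as lower miss rates in the small structures, which reduces the average directly. An energy estimate could follow the same weighted form with per-access energies in place of cycle counts.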