Latest publications from the 2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation

On the Evaluation of Dense Chip-Multiprocessor Architectures
Francisco J. Villa, M. Acacio, José M. García
DOI: 10.1109/ICSAMOS.2006.300804 | Published: 2006-07-17
Abstract: Chip-multiprocessors (CMPs) have emerged as the most promising way of making efficient use of ongoing improvements in integration scale. Current commercial CMPs integrate at most 8 processor cores on the chip, but 16 or more cores are expected in near-future dense-CMP (D-CMP) systems. These architectures impose new design restrictions, and some issues, such as cache coherence, must be revisited. In this paper we present an exhaustive performance evaluation of two recently proposed D-CMP architectures, with special emphasis on the solution to the cache-coherence problem that each of them adopts. The shared bus fabric (SBF) architecture features a snooping cache-coherence protocol and is based on a high-performance bus fabric interconnection network. The second architecture follows a directory-based approach and integrates a two-dimensional mesh as the interconnection network. Our results show that the performance achieved by the SBF architecture is hard-limited by the bandwidth restrictions of the bus fabric. The directory-based architecture outperforms the SBF one, but presents some performance inefficiencies due to the additional indirection introduced by the directory structure stored at the L2 cache level.
Citations: 4
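
As general background (not material from the paper above), the "additional indirection" of a directory protocol can be seen in a minimal sketch: a read miss first consults the directory entry held at the home L2 bank to find the current owner, and only then is the request forwarded, adding a hop that a snooping bus resolves with a single broadcast. All structures and names below are illustrative assumptions.

```cpp
#include <cstdint>
#include <unordered_map>
#include <iostream>

// Minimal sketch of directory indirection on a read miss (illustrative only,
// not the protocol from the paper). Each cache line tracks an owner core and
// a sharer bit-vector, stored alongside its home L2 bank.
struct DirEntry {
    int owner = -1;            // core holding the line in modified state, -1 if none
    uint32_t sharers = 0;      // bit-vector of cores with a shared copy
};

std::unordered_map<uint64_t, DirEntry> directory;  // indexed by line address

// A read miss from 'core' for 'line' takes two logical steps (the indirection):
// 1) look up the home directory entry, 2) forward the request to the owner if any.
int handle_read_miss(int core, uint64_t line) {
    DirEntry &e = directory[line];
    int hops = 1;                       // requester -> home L2 bank
    if (e.owner >= 0 && e.owner != core) {
        hops += 1;                      // home -> owner (forwarded request)
        e.sharers |= (1u << e.owner);   // previous owner keeps a shared copy
        e.owner = -1;
    }
    e.sharers |= (1u << core);
    return hops;                        // a snooping bus would resolve this with one broadcast
}

int main() {
    directory[0x1000].owner = 3;                      // core 3 owns the line
    std::cout << handle_read_miss(1, 0x1000) << "\n"; // prints 2: the extra indirection hop
}
```
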
Accelerating RTL Simulation by Several Orders of Magnitude Using Clock Suppression
H. Muhr, Roland Höler
DOI: 10.1109/ICSAMOS.2006.300818 | Published: 2006-07-17
Abstract: In recent years, designers of embedded computer systems have faced tremendous growth in the complexity of their systems. Together with rising system clock frequencies and the increasing amount of real time needed to see features start up and work correctly in an embedded system, this has caused the run times of event-based simulation engines to skyrocket. Performing these simulations at the register transfer level (RTL), however, is crucial for the functional verification of embedded computer systems. Accelerating such event-based simulations is therefore the aim of the work presented in this paper. To this end, a methodology called clock suppression is presented and thoroughly discussed. To underpin the feasibility and performance of this approach, evaluation results of simulation experiments for several designs are shown.
Citations: 10
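
A deliberately simplified sketch of the general idea behind clock suppression (not the authors' implementation): rather than evaluating the design at every clock edge, the simulator jumps directly to the next cycle at which stimulus actually changes, so idle edges are never processed. The stimulus period and cycle counts below are invented for illustration.

```cpp
#include <cstdint>
#include <iostream>

// Toy illustration of clock suppression: skip clock edges at which nothing can
// change and advance time directly to the next cycle with activity.
int main() {
    const uint64_t total_cycles = 1'000'000;   // cycles a naive simulator would evaluate
    uint64_t next_input_change = 100;          // assumed: external stimulus every 100 cycles
    uint64_t evaluated_edges = 0;

    for (uint64_t cycle = 0; cycle < total_cycles; ) {
        ++evaluated_edges;
        // ... evaluate combinational logic and register updates for this cycle ...

        // Jump ahead to the next stimulus instead of stepping cycle by cycle.
        cycle = next_input_change;
        next_input_change += 100;
    }
    std::cout << "edges evaluated: " << evaluated_edges
              << " instead of " << total_cycles << "\n";   // 10000 instead of 1000000
}
```
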
Performance Improvements in Microprocessor Systems Utilizing a Coprocessor Data-Path
M. D. Galanis, G. Dimitroulakos, C. Goutis
DOI: 10.1109/ICSAMOS.2006.300813 | Published: 2006-07-17
Abstract: The speedups achieved in a generic microprocessor system by employing a high-performance data-path are presented in this work. The data-path acts as a coprocessor that accelerates computationally intensive kernel regions, thereby increasing overall performance. It is composed of flexible computational components (FCCs) that can realize any two-level template of primitive operations. The automated coprocessor synthesis method and its integration into a design flow for executing applications on the system are presented. An analytical exploration with respect to the type of the custom data-path and to the microprocessor architecture is performed. The overall speedups of eight real-life applications, relative to software execution on the microprocessor, are estimated using the design flow. These speedups range from 1.75 to 3.95, with an average of 2.72, while the overhead in circuit area is small. A comparison with another high-performance data-path showed that the proposed coprocessor achieves better performance while having smaller area-time products for the generated data-paths.
Citations: 3
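
As general background (not an equation from the paper), the relationship between kernel coverage and overall gain follows Amdahl's law: if a fraction $f$ of execution time is spent in kernels that the coprocessor accelerates by a factor $s$, the overall speedup is

\[
S_{\text{overall}} = \frac{1}{(1 - f) + f/s}.
\]

For example, $f = 0.8$ and $s = 10$ give $S_{\text{overall}} \approx 3.6$, in the same range as the 1.75-3.95 speedups reported above; the specific values of $f$ and $s$ here are purely illustrative, not figures from the paper.
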
A Table-Based Application-Specific Prefetch Engine for Object-Oriented Embedded Systems
S. Hessabi, M. Modarressi, M. Goudarzi, Hani JavanHemmat
DOI: 10.1109/ICSAMOS.2006.300802 | Published: 2006-07-17
Abstract: A table-based, application-specific data prefetching mechanism is presented in this paper. The mechanism is proposed to improve the performance of the application-specific instruction-set processors (ASIPs) that we develop, customized to an object-oriented application. In this approach, we divide the data accesses of a class method into conditional and unconditional parts. We supply the prefetch engine with static information about each part so that, when a class method is invoked, all data fields of the object required by that method are prefetched. The merits of the proposed mechanism are the effective management of memory access patterns, by dividing them according to the method to which they belong, and the storage of the access information of nested loops in a simple structure. In addition, by adding a prefetch flag to cache blocks, we eliminate a large number of prefetch-related tag comparisons. The results show that the proposed mechanism reduces the cache miss ratio and prefetch-related tag comparisons on average by 66% and 21%, respectively.
Citations: 2
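
A minimal sketch of a method-indexed prefetch table, in the spirit of the mechanism described above but with all structures and names invented for illustration: the table maps a class-method identifier to the object-field offsets that the method accesses unconditionally, and on method invocation those fields are prefetched relative to the object's base address.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative method-indexed prefetch table (not the authors' hardware design).
using MethodId = uint32_t;

struct PrefetchTable {
    // Per class method: object-field offsets that are touched unconditionally.
    std::unordered_map<MethodId, std::vector<uint32_t>> field_offsets;

    // When a method is invoked on an object, queue prefetches for its fields.
    void on_method_invoke(MethodId m, uintptr_t object_base,
                          std::vector<uintptr_t> &prefetch_queue) const {
        auto it = field_offsets.find(m);
        if (it == field_offsets.end()) return;
        for (uint32_t off : it->second)
            prefetch_queue.push_back(object_base + off);   // addresses to prefetch
    }
};

int main() {
    PrefetchTable t;
    t.field_offsets[42] = {0, 8, 24};        // method 42 reads fields at these offsets
    std::vector<uintptr_t> q;
    t.on_method_invoke(42, 0x10000, q);      // q now holds 0x10000, 0x10008, 0x10018
}
```
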
Parameterized Mapping of Algorithms onto Processor Arrays with Sub-Word Parallelism
Rainer Schaffer, R. Merker
DOI: 10.1109/ICSAMOS.2006.300815 | Published: 2006-07-17
Abstract: Upcoming processor architectures support parallel processing at different levels: multiple processing elements (PEs) run in parallel, each PE consists of several functional units, and the functional units allow sub-word parallelism (SWP), i.e., the parallel execution of operations on data of low word width. In this paper, a parameterized mapping of algorithms onto massively parallel processor architectures (PAs) is derived which exploits both parallelism at the PA level and SWP at the PE level. It establishes a correlation between the parameters of the algorithm and the parameters of the PA, which enables optimization strategies with respect to several expense factors of the PA. The design approach is based on the co-partitioning method and the partitioning of data dependencies, both used in a hierarchical manner. Besides the parameters of the PA (such as shape, number of PEs, number of sub-words processed in parallel, channels between the PEs, and their delay), the packing instructions for exploiting SWP can be deduced.
Citations: 5
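
Sub-word parallelism means that one wide ALU operation processes several narrow operands packed into a single register. The classic SWAR (SIMD-within-a-register) byte-wise add below illustrates the kind of packed operation such a mapping targets; it is a generic, well-known technique, not code from the paper.

```cpp
#include <cstdint>
#include <cstdio>

// Four 8-bit lanes packed into one 32-bit word are added in a single 32-bit
// operation; the high bit of each lane is handled separately so carries cannot
// spill into the neighbouring lane.
uint32_t packed_add8(uint32_t a, uint32_t b) {
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);  // add low 7 bits of each lane
    return low ^ ((a ^ b) & 0x80808080u);                  // restore each lane's top bit
}

int main() {
    uint32_t a = 0x01020304u, b = 0x10203040u;
    std::printf("%08x\n", packed_add8(a, b));   // prints 11223344: four adds in one operation
}
```
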
Chip Size Estimation for SOC Design Space Exploration
H. Jeschke
DOI: 10.1016/j.sysarc.2007.01.012 | Published: 2006-07-17
Citations: 2
Area-Aware Optimizations for Resource Constrained Branch Predictors Exploited in Embedded Processors
Babak Salamat, A. Baniasadi, K. J. Deris
DOI: 10.1109/ICSAMOS.2006.300808 | Published: 2006-07-17
Abstract: Modern embedded processors (e.g., Intel's XScale) use small and simple branch predictors to improve performance. Such predictors impose little area and power overhead but may offer low accuracy, so the branch misprediction rate can be high. Such mispredictions result in longer program runtime and wasted activity. To address this inefficiency, we introduce two optimization techniques. First, we introduce an adaptive, low-complexity branch prediction technique; our predictor removes up to 50% of the branch mispredictions of a bimodal predictor, improving performance by up to 16%. Second, we present front-end gating techniques that reduce wasted activity by up to 32%.
Citations: 2
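
For reference, the bimodal predictor used as the baseline above is simply a table of 2-bit saturating counters indexed by low-order branch-address bits. A minimal sketch follows; the 512-entry table size is an illustrative assumption, not a figure from the paper.

```cpp
#include <cstdint>
#include <vector>

// Minimal bimodal branch predictor: one 2-bit saturating counter per table entry,
// indexed by the low-order bits of the branch PC.
class Bimodal {
    std::vector<uint8_t> ctr;           // 2-bit counters, values 0..3
public:
    explicit Bimodal(size_t entries = 512) : ctr(entries, 1) {}  // start weakly not-taken

    bool predict(uint32_t pc) const {
        return ctr[pc % ctr.size()] >= 2;          // 2 or 3 -> predict taken
    }
    void update(uint32_t pc, bool taken) {
        uint8_t &c = ctr[pc % ctr.size()];
        if (taken)  { if (c < 3) ++c; }            // saturate at 3
        else        { if (c > 0) --c; }            // saturate at 0
    }
};

int main() {
    Bimodal bp;
    uint32_t pc = 0x400123;
    bp.update(pc, true);                 // counter moves from 1 to 2
    bool taken = bp.predict(pc);         // now predicts taken
    (void)taken;
}
```
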
Static Energy Saving Through Multi-Bank Memory Architecture
S. Lafond, J. Lilius
DOI: 10.1109/ICSAMOS.2006.300807 | Published: 2006-07-17
Abstract: Managing the energy consumption of embedded systems has become a major problem with the increasing demand for portable electronic devices. This paper proposes a multi-bank memory architecture as a solution to decrease the static energy cost in memory. We set up the equations governing the optimization problem for decreasing the static energy cost of memory, analyze the impact of different parameters on the energy cost, and finally present some case-study results.
Citations: 2
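
A typical formulation of such a static-energy objective (a generic form, not necessarily the exact equations of the paper): for $B$ banks, sum the leakage power of each power state weighted by the time spent in that state, plus the cost of state transitions,

\[
E_{\text{static}} \;=\; \sum_{b=1}^{B}\bigl(P_{\text{active},b}\,t_{\text{active},b} \;+\; P_{\text{sleep},b}\,t_{\text{sleep},b} \;+\; n_b\,E_{\text{wake},b}\bigr),
\]

to be minimized over the assignment of data to banks and the bank power-state schedule, subject to the constraint that a bank must be active whenever it is accessed.
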
An Efficient Hierarchical Fuzzy Approach for System Level System-on-a-Chip Design
G. Ascia, V. Catania, A. D. Nuovo, M. Palesi, Davide Patti
DOI: 10.1109/ICSAMOS.2006.300817 | Published: 2006-07-17
Abstract: One of the most important bottlenecks in the overall design flow of a complex embedded system is simulation, which occurs at every phase of the design flow. In this paper we focus on system-level design, proposing a novel approach to speed up the evaluation of a system configuration. The approach, which uses a fuzzy system as an approximator for the different components of the system, is used to accelerate the design space exploration of a VLIW-based parameterized SoC platform for the optimization of both performance and power dissipation. The experiments, carried out on a multimedia benchmark suite, demonstrate the scalability and accuracy of the proposed approach.
Citations: 0
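
To illustrate the general idea of replacing a slow simulation with a cheap fuzzy estimate (the rule base, inputs, and outputs below are invented and are not those of the paper), the following toy single-input approximator uses three triangular membership functions over a normalized configuration parameter and combines constant rule outputs by weighted average, in the style of a zero-order Sugeno fuzzy system.

```cpp
#include <array>
#include <cstdio>

// Triangular membership function with corners a < b < c.
static double tri(double x, double a, double b, double c) {
    if (x <= a || x >= c) return 0.0;
    return (x < b) ? (x - a) / (b - a) : (c - x) / (c - b);
}

// Toy fuzzy estimate of power for a normalized configuration parameter x in [0,1].
double fuzzy_power_estimate(double x) {
    const std::array<double, 3> w = { tri(x, -0.5, 0.0, 0.5),   // "small" configuration
                                      tri(x,  0.0, 0.5, 1.0),   // "medium" configuration
                                      tri(x,  0.5, 1.0, 1.5) }; // "large" configuration
    const std::array<double, 3> out = { 10.0, 35.0, 90.0 };     // assumed mW per rule
    double num = 0.0, den = 0.0;
    for (int i = 0; i < 3; ++i) { num += w[i] * out[i]; den += w[i]; }
    return den > 0.0 ? num / den : 0.0;                         // weighted-average defuzzification
}

int main() {
    std::printf("%.1f mW\n", fuzzy_power_estimate(0.25));  // 22.5 mW, between "small" and "medium"
}
```
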
Modified Hotspot Cache Architecture: A Low Energy Fast Cache for Embedded Processors
K. Ali, M. Aboelaze, S. Datta
DOI: 10.1109/ICSAMOS.2006.300806 | Published: 2006-07-17
Abstract: The cache memory plays a crucial role in the performance of any processor. Cache memory (SRAM), especially the on-chip cache, is 3-4 times faster than main memory (DRAM) and can vastly improve processor performance and speed. The cache also consumes much less energy than main memory, which leads to large power savings that are very important for embedded applications. Although the cache reduces the processor's energy consumption, in today's processors the on-chip cache accounts for almost 40% of the processor's total energy consumption. In this paper, we propose an instruction cache architecture that is a modification of the hotspot architecture. The proposed architecture consists of a small filter cache in parallel with the hotspot cache, between the L1 cache and the main memory. The small filter cache holds code that is not captured by the hotspot cache. We also propose a prediction mechanism to steer each memory access to the hotspot cache, the filter cache, or the L1 cache. Our design has both a faster access time and lower energy consumption than the filter cache and hotspot cache architectures. We use MiBench and MediaBench benchmarks, together with the SimpleScalar simulator, to evaluate the performance of the proposed architecture and compare it with the filter cache and hotspot cache architectures. The simulation results show that our design outperforms both the filter cache and the hotspot cache in both average memory access time and energy consumption.
Citations: 10
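
A minimal sketch of the kind of steering decision described above: a small predictor remembers, per instruction-address index, which structure (hotspot cache, filter cache, or L1) last served that address and steers the next fetch there. The predictor organization and update policy below are assumptions for illustration, not the paper's mechanism.

```cpp
#include <cstdint>
#include <cstddef>

enum class Target { Hotspot, Filter, L1 };

// Tiny direct-mapped steering predictor: remembers where each PC index last hit.
struct SteeringPredictor {
    static constexpr size_t N = 64;
    Target last_hit[N];

    SteeringPredictor() { for (auto &t : last_hit) t = Target::L1; }

    Target predict(uint32_t pc) const { return last_hit[(pc >> 2) % N]; }

    // After the fetch resolves, record where the line was actually found so the
    // next access to this PC is steered straight to the right structure.
    void update(uint32_t pc, Target actual) { last_hit[(pc >> 2) % N] = actual; }
};

int main() {
    SteeringPredictor sp;
    uint32_t pc = 0x2000;
    Target t = sp.predict(pc);              // first access defaults to the L1 cache
    sp.update(pc, Target::Hotspot);         // the line turned out to live in the hotspot cache
    t = sp.predict(pc);                     // subsequent fetches go to the hotspot cache
    (void)t;
}
```
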