2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation: Latest Papers (Page 2)

SimGate: Full-System, Cycle-Close Simulation of the Stargate Sensor Network Intermediate Node

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-17. DOI: 10.1109/ICSAMOS.2006.300819

Selim Gurun, Ye Wen, Navraj Chohan, R. Wolski, C. Krintz

Citations: 8

Memory-constrained Block Processing Optimization for Synthesis of DSP Software

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300820

Ming-Yung Ko, Chung-Ching Shen, S. Bhattacharyya

Abstract: Digital signal processing (DSP) applications involve processing long streams of input data, and this form of processing must be taken into account when implementing embedded software for DSP systems. Task-level vectorization, or block processing, is a useful dataflow graph transformation that can significantly improve execution performance by allowing subsequences of data items to be processed through individual task invocations. This yields several benefits, including reduced context-switch overhead, increased memory locality, improved utilization of processor pipelines, and use of more efficient DSP-oriented addressing modes. On the other hand, block processing generally increases memory requirements, since it effectively increases the sizes of the input and output values associated with processing tasks. In this paper, we investigate the memory-performance tradeoff associated with block processing. We develop novel block processing algorithms that carefully take memory constraints into account to achieve efficient block processing configurations within given memory space limitations. Our experimental results indicate that these methods derive optimal memory-constrained block processing solutions most of the time. We demonstrate the advantages of our block processing techniques on practical kernel functions and applications in the DSP domain.

Citations: 6
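The memory-performance tradeoff described in the block processing abstract can be illustrated with a deliberately simplified cost model. This is a minimal sketch, not the authors' algorithm: it assumes a chain-structured dataflow graph, a single blocking factor shared by all tasks, and hypothetical names (`Edge`, `buffer_bytes`, `cycles_per_item`, `best_block`) and per-edge token counts. Larger blocks amortize the per-invocation overhead, while every buffer grows linearly with the blocking factor, so the search stops at the first factor that exceeds the memory budget.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """A FIFO buffer between two tasks in a chain-structured dataflow graph (hypothetical model)."""
    tokens_per_item: int  # tokens produced on this edge per input item (assumed)
    token_bytes: int      # size of one token in bytes (assumed)


def buffer_bytes(edges, block):
    """Total buffer memory when every task processes `block` items per invocation."""
    return sum(block * e.tokens_per_item * e.token_bytes for e in edges)


def cycles_per_item(per_item_cycles, overhead_cycles, block):
    """Per-invocation overhead (e.g. a context switch) is amortized over the block."""
    return per_item_cycles + overhead_cycles / block


def best_block(edges, mem_limit, max_block=1024):
    """Largest uniform blocking factor whose buffers fit in `mem_limit` bytes, or None."""
    best = None
    for b in range(1, max_block + 1):
        if buffer_bytes(edges, b) <= mem_limit:
            best = b
        else:
            break  # buffer memory grows monotonically with b, so no larger b can fit
    return best


edges = [Edge(tokens_per_item=1, token_bytes=4), Edge(tokens_per_item=2, token_bytes=4)]
b = best_block(edges, mem_limit=120)
print(b, cycles_per_item(5, 100, b))
```

With these assumed numbers each unit of blocking costs 12 bytes, so a 120-byte budget allows a blocking factor of 10 and the amortized cost drops from 105 to 15 cycles per item. The paper's algorithms handle richer graphs and per-task blocking; the sketch only shows why memory caps the achievable speedup.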

Parallel Memory Implementation for Arbitrary Stride Accesses

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300801

E. Aho, Jarno Vanne, T. Hämäläinen

Citations: 8

Reduction of Energy Consumption in Processors by Early Detection and Bypassing of Trivial Operations

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300805

Md. Mafijul Islam, P. Stenström

Abstract: Previous research has established that trivial operations, i.e., instructions whose outcome can be trivially inferred from the operands (e.g., addition of zero), account for a significant portion of the dynamically executed instructions. By detecting them early and removing them from the pipeline, it is possible to reduce energy consumption. This paper first presents a new classification of trivial operations that identifies, in particular, those that can be detected early in the pipeline, i.e., at the decode stage. Our analysis shows that, on average, as many as 10% of all executed instructions are of this kind across 12 applications from SPEC2000. We find that a majority (indeed 89%) of them are identity-trivial, meaning that at least one of the operands is the identity element (zero or one). By detecting them early, one can bypass their execution and eliminate register accesses if the processor uses a logical/physical register remapping unit. We find that as many as 75% of all trivial operations can be detected and eliminated at the decode stage because the identity element is available that often. With such support, the energy consumption in the functional units, the result bus, the instruction window infrastructure, and the register file can be reduced by 13%, 9%, 27%, and 26%, respectively, yielding an 18% reduction of the energy in the core pipeline.

Citations: 11
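The decode-stage bypass of identity-trivial operations described in the trivial-operations abstract can be sketched as a check against a per-opcode identity element, followed by a register-alias-table update instead of execution. This is an illustrative model under stated assumptions, not the paper's microarchitecture: the opcode names, the `IDENTITY` and `COMMUTATIVE` tables, the operand encoding (an `int` is a value known at decode, such as an immediate or hardwired zero register; a `str` is a register name), and the helpers `decode_bypass` and `rename_or_issue` are all hypothetical.

```python
from typing import Dict, Optional, Union

Operand = Union[int, str]  # int: value known at decode; str: register name (assumed encoding)

# Hypothetical opcodes and their identity elements: x op identity == x.
IDENTITY = {"add": 0, "sub": 0, "or": 0, "xor": 0, "shl": 0, "shr": 0, "mul": 1, "div": 1}
COMMUTATIVE = {"add", "or", "xor", "mul"}


def decode_bypass(op: str, src1: Operand, src2: Operand) -> Optional[Operand]:
    """Return the operand equal to the result if the op is identity-trivial at decode, else None."""
    ident = IDENTITY.get(op)
    if ident is None:
        return None
    if src2 == ident:  # right identity: x - 0, x << 0, x / 1, ...
        return src1
    if op in COMMUTATIVE and src1 == ident:  # left identity only for commutative ops
        return src2
    return None


def rename_or_issue(rat: Dict[str, str], dst: str, op: str, src1: Operand, src2: Operand) -> str:
    """Hypothetical decode step: alias dst to the source register instead of executing."""
    result = decode_bypass(op, src1, src2)
    if isinstance(result, str):
        # dst now maps to the same physical register as the source; no FU or register-file access.
        rat[dst] = rat.get(result, result)
        return "bypassed"
    # Non-trivial ops (and, in this sketch, constant-only results) go through normal execution.
    return "issued"


rat: Dict[str, str] = {}
print(rename_or_issue(rat, "r3", "add", "r1", 0), rat)
```

Note that multiplication by zero is deliberately not treated as identity-trivial here, since zero is not the multiplicative identity; the abstract's other trivial-operation classes are outside this sketch.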

FLUX Networks: Interconnects on Demand

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300823

S. Vassiliadis, I. Sourdis

Citations: 23

On-Chip Communication in Run-Time Assembled Reconfigurable Systems

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300824

P. Sedcole, P. Cheung, G. Constantinides, W. Luk

Citations: 3

Hardware DWT accelerator for MultiProcessor System-on-Chip on FPGA

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300816

Simone Borgio, Davide Bosisio, Fabrizio Ferrandi, M. Monchiero, M. Santambrogio, D. Sciuto, Antonino Tumeo

Abstract: High-performance multimedia applications are typical targets of today's embedded systems. These applications, complex both in terms of execution flow and amount of processed data, can be well addressed by multiprocessor systems-on-chip (MPSoCs). MPSoCs are composed of simple processors and memories tightly interconnected by fast communication channels, and customized IP cores for the most demanding functions can be implemented and attached to these systems to enhance performance even further. Reconfigurable devices such as FPGAs can act as a target for the custom IP cores, even programmed at run time, or as a prototyping platform for the whole system. Image compression standards such as JPEG2000 can benefit greatly from this approach and this type of architecture. This paper shows how the most demanding task of the JPEG2000 compression algorithm, the two-dimensional discrete wavelet transform (DWT), can be hardware-accelerated and implemented in CerberO, a multiprocessor system-on-chip prototyping platform on a field-programmable gate array (FPGA). Architectures with different numbers of processors and hardware accelerators, either shared among the processors or dedicated, have been implemented. To validate the approach, we show experimental results on the platform with both the hardware and the software implementations of the transform.

Citations: 19
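For readers unfamiliar with the kernel that the DWT accelerator paper offloads, the following is a minimal software sketch of one level of a 2-D DWT using the reversible LeGall 5/3 lifting scheme defined in JPEG2000 Part 1. The abstract does not state which filter or datapath the CerberO accelerator implements, so treat this as a reference model only; the function names (`fwd53`, `inv53`, `dwt2d`) are hypothetical, and the sketch handles even-length signals only, with whole-sample symmetric extension at the borders.

```python
def fwd53(x):
    """One level of the reversible 5/3 lifting DWT on an even-length integer list.

    Returns (low, high). Python's >> floors for negative ints, matching the integer lifting rules.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch handles even lengths only"
    half = n // 2
    high = []
    for i in range(half):
        # Predict step: odd sample minus the average of its even neighbours (mirror x[n] = x[n-2]).
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        high.append(x[2 * i + 1] - ((x[2 * i] + right) >> 1))
    low = []
    for i in range(half):
        # Update step: even sample plus rounded quarter of neighbouring details (mirror d[-1] = d[0]).
        left = high[i - 1] if i > 0 else high[0]
        low.append(x[2 * i] + ((left + high[i] + 2) >> 2))
    return low, high


def inv53(low, high):
    """Exact inverse of fwd53: undo the update step, then the predict step."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        left = high[i - 1] if i > 0 else high[0]
        x[2 * i] = low[i] - ((left + high[i] + 2) >> 2)
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = high[i] + ((x[2 * i] + right) >> 1)
    return x


def _fwd_concat(v):
    low, high = fwd53(v)
    return low + high


def dwt2d(img):
    """One 2-D level: transform every row, then every column (LL/HL over LH/HH quadrants)."""
    rows = [_fwd_concat(list(r)) for r in img]
    cols = [_fwd_concat(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


print(dwt2d([[7] * 4 for _ in range(4)]))
```

Because the transform is separable, the row pass and column pass are independent 1-D lifting loops, which is what makes the DWT a natural candidate for a dedicated or shared hardware accelerator.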

Exploration of Distributed Shared Memory Architectures for NoC-based Multiprocessors

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300821

M. Monchiero, G. Palermo, C. Silvano, Oreste Villa

Abstract: Multiprocessor system-on-chip (MP-SoC) platforms represent an emerging trend for embedded multimedia applications. To enable MP-SoC platforms, scalable communication-centric interconnect fabrics such as networks-on-chip (NoCs) have recently been proposed. Shared memory is one of the key elements in designing MP-SoCs, since its function is to provide data exchange and synchronization support. In this paper, we explore a distributed shared memory architecture suitable for low-power on-chip multiprocessors based on a NoC. In particular, the paper focuses on the energy/delay exploration of an on-chip physically distributed, logically shared memory address space for MP-SoCs based on a parameterizable NoC. Data allocation in the physically distributed shared memory space is dynamically managed by an on-chip hardware memory management unit. Experimental results show the impact of different NoC topologies and distributed shared memory configurations on power and performance for a selected set of parallel benchmark applications.

Citations: 75
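The distributed shared memory abstract describes a logically shared address space that is physically spread over memory nodes, with allocation managed dynamically by a hardware MMU, but it does not give the allocation policy. The toy model below assumes one plausible policy purely for illustration: pages are placed on the free memory node with the fewest XY-routing hops from the requesting core on a 2-D mesh, and a page table translates logical addresses to (node, offset). The class name, page size, mesh layout, and placement rule are all hypothetical.

```python
PAGE = 4096  # hypothetical page size in bytes


class DistributedSharedMemory:
    """Toy model: one logical address space backed by memory nodes on a 2-D mesh NoC."""

    def __init__(self, mesh_w, mesh_h, pages_per_node):
        nodes = [(x, y) for x in range(mesh_w) for y in range(mesh_h)]
        self.free = {n: pages_per_node for n in nodes}  # free pages per memory node
        self.next_page = {n: 0 for n in nodes}          # next physical page per node
        self.table = {}  # logical page number -> (node, physical page on that node)
        self.next_logical = 0

    @staticmethod
    def hops(a, b):
        """XY-routing hop count on a 2-D mesh (Manhattan distance)."""
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def alloc(self, nbytes, requester):
        """Place each page on the nearest node with free space; return the logical base address."""
        npages = -(-nbytes // PAGE)  # ceiling division
        if sum(self.free.values()) < npages:
            raise MemoryError("distributed memory exhausted")
        base = self.next_logical * PAGE
        for _ in range(npages):
            candidates = [n for n, f in self.free.items() if f > 0]
            # Nearest node first; ties broken by node coordinates for determinism.
            node = min(candidates, key=lambda n: (self.hops(n, requester), n))
            self.table[self.next_logical] = (node, self.next_page[node])
            self.free[node] -= 1
            self.next_page[node] += 1
            self.next_logical += 1
        return base

    def translate(self, addr):
        """Logical address -> (memory node, physical byte offset within that node)."""
        node, ppage = self.table[addr // PAGE]
        return node, ppage * PAGE + addr % PAGE


dsm = DistributedSharedMemory(2, 2, pages_per_node=1)
base = dsm.alloc(2 * PAGE, requester=(0, 0))
print(dsm.translate(base), dsm.translate(base + PAGE + 5))
```

Placing data close to its first user shortens NoC paths and therefore cuts both latency and interconnect energy, which is the kind of energy/delay tradeoff the paper explores across topologies and memory configurations.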

Pareto-Based Application Specification for MP-SoC Customized Run-Time Management

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300812

C. Ykman-Couvreur, V. Nollet, T. Marescaux, E. Brockmeyer, F. Catthoor, H. Corporaal

Citations: 35

Performance Evaluation of RISC-based SoC Platforms in Network Processing Applications

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Pub Date: 2006-07-01. DOI: 10.1109/ICSAMOS.2006.300822

Rainer Ohlendorf, Thomas Wild, Michael Meitinger, Holm Rauchfuss, A. Herkersdorf

Abstract: In this paper, we present the results of a simulation-based performance evaluation of RISC-based SoC platforms for networking applications. We use our SystemC simulation environment, calibrated against a reference implementation on an FPGA-based prototyping environment consisting of a single RISC CPU, a memory system, an Ethernet MAC, and an autonomous DMA engine. To achieve precise results, a real IP stack has been profiled. Starting with an analysis of the reference scenario, two approaches for improvement are investigated. First, hardware assists are added that offload compute-intensive bit-level manipulations from the CPU. Second, the concept of flexible processing paths, as proposed in FlexPath NP with AutoRoute, is evaluated, in which part of the traffic can bypass the central CPU cluster. For each of the three scenarios, the maximum throughput is determined, and the improvements and limitations of each solution are discussed. It can be shown that a FlexPath NP achieves up to 2.5 times the throughput of the unoptimized reference scenario under realistic traffic assumptions.

Citations: 4
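The two improvements evaluated in the network-processing abstract (hardware assists that lower the CPU's per-packet cost, and FlexPath-style routing that lets part of the traffic bypass the CPU) can be reasoned about with a back-of-the-envelope saturation model. This sketch is not the authors' calibrated SystemC model and does not reproduce their 2.5x result; the function names, clock rate, cycle counts, and bypass-path capacity are hypothetical parameters.

```python
def cpu_rate(clock_hz, cycles_per_packet):
    """Packets/s a single CPU sustains when each packet costs `cycles_per_packet` cycles."""
    return clock_hz / cycles_per_packet


def max_throughput(cpu_pps, bypass_fraction, bypass_pps=float("inf")):
    """Highest offered load (packets/s) before either processing path saturates.

    A fraction `bypass_fraction` of packets skips the CPU (FlexPath-style);
    the remaining packets must all be processed by the CPU.
    """
    limits = []
    if bypass_fraction < 1:
        limits.append(cpu_pps / (1 - bypass_fraction))  # CPU path saturates first
    if bypass_fraction > 0:
        limits.append(bypass_pps / bypass_fraction)     # bypass path saturates first
    return min(limits)


base = cpu_rate(100e6, 1000)     # reference: hypothetical 100 MHz CPU, 1000 cycles/packet
assisted = cpu_rate(100e6, 800)  # hardware assists shave hypothetical cycles off each packet
print(base, assisted, max_throughput(base, 0.5), max_throughput(base, 0.75, bypass_pps=150000.0))
```

The model shows why both approaches have limits: hardware assists scale throughput only by the fraction of cycles removed, while bypassing multiplies CPU-bound throughput by 1 / (1 - f) until the bypass path itself becomes the bottleneck.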