International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Latest publications

Fast cosimulation of transformative systems with OS support on SMP computer

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016761

Zhengting He, A. Mok

Abstract: Transformative applications are a class of dataflow computation characterized by iterative behavior. The problem of partitioning a transformative application specification onto a set of available hardware (HW) and software (SW) processing elements (PEs) and deriving a job execution order (scheduling) on them has been quite well studied, but the problem of obtaining fast simulation of these applications poses different constraints. In this paper, we propose an efficient framework for a symmetric multi-processor (SMP) simulation host to achieve fast HW/SW co-simulation for transformative applications, given the partition solutions and the derived schedulers. The framework overcomes the limitations of the existing Linux SMP kernel and requires only a reasonable number of modifications to it. We also present a heuristic algorithm which effectively assigns simulation tasks to the processors on the simulation host, considering both average job simulation time on each processor and other simulation overhead. Our experiments show that the algorithm is able to find satisfactory suboptimal solutions with very little computation time. Based on the task assignment solution, the simulation time can be reduced by 25% to 50% compared with the obvious but naive approach.
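The abstract does not spell out the heuristic, so the following is only a generic illustration of the task-assignment problem it describes, not the authors' algorithm. It is a minimal sketch of a longest-job-first greedy assignment, assuming each simulation task has a single average job simulation time and ignoring the "other simulation overhead" the abstract mentions. The names `SimTask`, `assign_tasks`, `owner`, and `load` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical task record: an id in 0..n-1 and the average
   simulation time of one job of that task (arbitrary units). */
typedef struct {
    int id;
    double avg_time;
} SimTask;

/* qsort comparator: tasks with larger average time sort first. */
static int by_time_desc(const void *x, const void *y)
{
    double a = ((const SimTask *)x)->avg_time;
    double b = ((const SimTask *)y)->avg_time;
    return (a < b) - (a > b);
}

/* Greedy assignment of n tasks to p host processors.
   owner[id] receives the processor chosen for task id;
   load[k] receives the summed time placed on processor k.
   Returns the largest per-processor load (estimated makespan).
   Note: the tasks array is reordered in place by the sort. */
double assign_tasks(SimTask *tasks, int n, int p, int *owner, double *load)
{
    for (int k = 0; k < p; k++)
        load[k] = 0.0;

    qsort(tasks, (size_t)n, sizeof tasks[0], by_time_desc);

    for (int i = 0; i < n; i++) {
        int best = 0;                      /* least-loaded processor so far */
        for (int k = 1; k < p; k++)
            if (load[k] < load[best])
                best = k;
        load[best] += tasks[i].avg_time;
        owner[tasks[i].id] = best;
    }

    double makespan = 0.0;
    for (int k = 0; k < p; k++)
        if (load[k] > makespan)
            makespan = load[k];
    return makespan;
}
```

With four tasks of average time 4, 3, 3, and 2 on two processors, this sketch places the tasks of time 4 and 2 on one processor and the two tasks of time 3 on the other, for an estimated makespan of 6. A heuristic that also models synchronization or communication overhead, as the paper's does, would need a richer cost function than this one.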

Citations: 1

Efficient mapping of hierarchical trees on coarse-grain reconfigurable architectures

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016731

F. Rivera, M. Sanchez-Elez, M. Fernandez, R. Hermida, N. Bagherzadeh

Citations: 4

Detecting overflow detection

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016732

V. Kotlyar, M. Moudgill

Abstract: Fixed-point saturating arithmetic is widely used in media and digital signal processing applications. A number of processor architectures provide instructions that implement saturating operations. However, standard high-level languages, such as ANSI C, provide no direct support for saturating arithmetic. Applications written in standard languages have to implement saturating operations in terms of basic two's complement operations. In order to provide fast execution of such programs it is important to have an optimizing compiler automatically detect and convert appropriate code fragments to hardware instructions. We present a set of techniques for automatic recognition of saturating arithmetic operations. We show that in most cases the recognition problem is simply one of Boolean circuit equivalence. Given the expense of solving circuit equivalence, we develop a set of practical approximations based on abstract interpretation. Experiments show that our techniques, while reliably recognizing saturating arithmetic, have small compile-time overhead. We also demonstrate that our approach is not limited to saturating arithmetic, but is directly applicable to recognizing other idioms, such as add-with-carry and absolute value.
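To make concrete what "implementing saturating operations in terms of basic two's complement operations" looks like, here is a minimal sketch of two common ways a programmer might write a saturating add in standard C. These are illustrative idioms of the kind such a compiler would need to recognize, not code taken from the paper; the function names `sat_add16` and `sat_add32` are hypothetical.

```c
#include <stdint.h>

/* 16-bit saturating add: compute the sum in a wider type,
   then clamp it to the int16_t range. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;  /* cannot overflow in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* 32-bit saturating add without a wider type: overflow is detected
   from the sign bits of the operands and of the wrapped sum. */
int32_t sat_add32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t us = ua + ub;                  /* unsigned wraparound is well defined */

    /* Overflow occurred iff both operands differ in sign from the sum,
       which implies a and b share a sign that the sum does not. */
    if (((ua ^ us) & (ub ^ us)) >> 31)
        return (a < 0) ? INT32_MIN : INT32_MAX;

    return a + b;                           /* safe: no overflow on this path */
}
```

For example, `sat_add16(32767, 1)` yields 32767 and `sat_add32(INT32_MIN, -1)` yields `INT32_MIN`, where plain wrapping addition would flip the sign. The two functions compute the same mathematical operation through very different source patterns, which is why the abstract frames recognition as an equivalence problem rather than a syntactic match.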

Citations: 12

Dynamic overlay of scratchpad memory for energy minimization

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1109/CODES+ISSS.2004.20

Manish Verma, L. Wehmeyer, P. Marwedel

Abstract: The memory subsystem accounts for a significant portion of the aggregate energy budget of contemporary embedded systems. Moreover, there exists a large potential for optimizing the energy consumption of the memory subsystem. Consequently, novel memories as well as novel algorithms for their efficient utilization are being designed. Scratchpads are known to perform better than caches in terms of power, performance, area and predictability. However, unlike caches they depend upon software allocation techniques for their utilization. We present an allocation technique which analyzes the application and inserts instructions to dynamically copy both code segments and variables onto the scratchpad at runtime. We demonstrate that the problem of dynamically overlaying scratchpad is an extension of the global register allocation problem. The overlay problem is solved optimally using ILP formulation techniques. Our approach improves upon the only previously known allocation technique for statically allocating both variables and code segments onto the scratchpad. Experiments report an average reduction of 34% and 18% in the energy consumption and the runtime of the applications, respectively. A minimal increase in code size is also reported.

Citations: 150

Fast cycle-accurate simulation and instruction set generation for constraint-based descriptions of programmable architectures

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016728

S. Weber, M. Moskewicz, M. Gries, C. Sauer, K. Keutzer

Abstract: State-of-the-art architecture description languages have been successfully used to model application-specific programmable architectures limited to particular control schemes. We introduce a language and methodology that provide a framework for constructing and simulating a wider range of architectures. The framework exploits the fact that designers are often only concerned with data paths, not the instruction set and control. In the framework, each processing element is described in a structural language that only requires the specification of the data path and constraints on how it can be used. From such a description, the supported operations of the processing element are automatically extracted and a controller is generated. Various architectures are then realized by composing the processing elements. Furthermore, hardware descriptions and bit-true cycle-accurate simulators are automatically generated. Results show that our simulators are up to an order of magnitude faster than other reported simulators of this type and two orders of magnitude faster than equivalent Verilog simulations.

Citations: 13

Design and programming of embedded multiprocessors: an interface-centric approach

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1109/CODES+ISSS.2004.17

P. V. D. Wolf, E. Kock, T. Henriksson, W. Kruijtzer, G. Essink

Citations: 152

Modeling operation and microarchitecture concurrency for communication architectures with application to retargetable simulation

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016738

Xinping Zhu, W. Qin, S. Malik

Abstract: In multiprocessor-based SoCs, optimizing the communication architecture is often as important as, if not more important than, optimizing the computation architecture. While there are mature platforms and techniques for the modeling and evaluation of computation architectures, the same is not true for communication architectures. A major challenge in modeling the communication architecture is managing the concurrency at multiple levels: at the operation level, multiple communication operations may be active at any time; at the microarchitecture level, several microarchitectural components may be operating in parallel. Further, it is important to be able to clearly specify how the operation-level concurrency maps to the microarchitecture-level concurrency. This work presents a modeling methodology and a retargetable simulation framework which fill this gap. This framework seeks to facilitate the design space exploration of the communication sub-system through a rigorous modeling approach based on a formal concurrency model, the operation state machine (OSM). We first introduce the basic notions and concepts of OSM and show by example how this model can be used to represent the inherent concurrency in the architecture and microarchitecture of processors. Then we demonstrate the applicability of OSM in modeling on-chip communication architectures (OCAs) by walking through a router-based packet switching network example and a bus example. Because the OSM model is naturally suited to handle the operation and microarchitecture level concurrencies of OCAs as well, our OSM-based modeling methodology enables the entire system, including both the computation and communication architectures, to be modeled in a single OSM framework. This allows us to develop a tool set that can synthesize cycle-accurate system simulators for multi-PE SoC prototypes. To demonstrate the flexibility of this methodology, we choose two distinct system configurations with different types of OCA: a 4×4 mesh network of 16 PEs, and a cluster of 4 PEs connected by a bus. We show that by simulation, critical system information such as timing and communication patterns can be obtained and evaluated. Consequently, system-level design choices regarding the communication architecture can be made with high confidence in early stages of design. In addition to improving design quality, this methodology also results in significantly shortened design time.

Citations: 25

Analytical models for leakage power estimation of memory array structures

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016757

M. Mamidipaka, K. Khouri, N. Dutt, M. Abadir

Abstract: There is a growing need for accurate power models at the system level. Memory structures such as caches, branch target buffers (BTBs), and register files occupy significant area in contemporary SoC designs and are the main contributors to system leakage power dissipation. Existing models for leakage power estimation in array structures typically use coefficients derived from elaborate SPICE simulations. However, these methodologies are not applicable to array designs in a newer technology that require power estimates early in the design cycle. In this paper, we propose analytical models for array structures that are based only on high-level design parameters. Assuming typical circuit implementation styles, we identify the transistors that contribute to the leakage power in each array sub-circuit and develop models as a function of the operation (read/write/idle) on the array and organizational parameters of the array. The developed models are validated by comparing their estimates against the leakage power measured using SPICE simulations on industrial array designs belonging to the e500 processor core. The comparison shows that the models are accurate with an error margin of less than 21.5% and thus can be used in high-level power-performance exploration. Interestingly, in array designs with dual threshold voltage technology, we observed that, contrary to the general expectation, the array memory core contributes just 9% and the address decoder contributes as much as 62% of the total leakage power.

Citations: 42

A timing-accurate HW/SW cosimulation of an ISS with SystemC

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016759

L. Formaggio, F. Fummi, G. Pravadelli

Citations: 18

Compiler-directed code restructuring for reducing data TLB energy

International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016747

M. Kandemir, I. Kadayif, G. Chen

Citations: 23