International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.最新文献

筛选
英文 中文
Fast cosimulation of transformative systems with OS support on SMP computer SMP计算机上支持操作系统的转换系统快速协同仿真
Zhengting He, A. Mok
{"title":"Fast cosimulation of transformative systems with OS support on SMP computer","authors":"Zhengting He, A. Mok","doi":"10.1145/1016720.1016761","DOIUrl":"https://doi.org/10.1145/1016720.1016761","url":null,"abstract":"Transformative applications are a class of dataflow computation characterized by iterative behavior. The problem of partitioning a transformative application specification to a set of available hardware (HW) and software (SW) processing elements (PEs) and derivation of a job execution order (scheduling) on them has been quite well studied, but the problem of obtaining fast simulation of these applications poses different constraints. In this paper, we propose an efficient framework for a symmetric multi-processor (SMP) simulation host to achieve fast HW/SW co-simulation for transformative applications, given the partition solutions and the derived schedulers. The framework overcomes the limitations in existing Linux SMP kernel and requires only a reasonable amount of modifications to it. We also present a heuristic algorithm which effectively assigns simulation tasks to the processors on the simulation host, considering both average job simulation time on each processor and other simulation overhead. Our experiments show that the algorithm is able to find satisfactory suboptimal solutions with very little computation time. Based on the task assignment solution, the simulation time can be reduced by 25% to 50% from the obvious but naive approach.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123328766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Efficient mapping of hierarchical trees on coarse-grain reconfigurable architectures 层次树在粗粒度可重构体系结构上的有效映射
F. Rivera, M. Sanchez-Elez, M. Fernandez, R. Hermida, N. Bagherzadeh
{"title":"Efficient mapping of hierarchical trees on coarse-grain reconfigurable architectures","authors":"F. Rivera, M. Sanchez-Elez, M. Fernandez, R. Hermida, N. Bagherzadeh","doi":"10.1145/1016720.1016731","DOIUrl":"https://doi.org/10.1145/1016720.1016731","url":null,"abstract":"Reconfigurable architectures have become increasingly important in years. We present an approach to the problem of executing 3D graphics interactive applications onto these architectures. The hierarchical trees are usually implemented to reduce the data processed, thereby diminishing the execution time. We have developed a mapping scheme that parallelizes the tree execution onto a SIMD reconfigurable architecture. This mapping scheme considerably reduces the time penalty caused by the possibility of executing different tree nodes in SIMD fashion. We have developed a technique that achieves an efficient hierarchical tree execution taking decisions at execution time. It also promotes the possibility of data coherence in order to reduce the execution time. The experimental results show high performance and efficient resource utilization on tested applications.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"22 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114015214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Detecting overflow detection 检测溢出检测
V. Kotlyar, M. Moudgill
{"title":"Detecting overflow detection","authors":"V. Kotlyar, M. Moudgill","doi":"10.1145/1016720.1016732","DOIUrl":"https://doi.org/10.1145/1016720.1016732","url":null,"abstract":"Fixed-point saturating arithmetic is widely used in media and digital signal processing applications. A number of processor architectures provide instructions that implement saturating operations. However, standard high-level languages, such as ANSI C, provide no direct support for saturating arithmetic. Applications written in standard languages have to implement saturating operations in terms of basic two's complement operations. In order to provide fast execution of such programs it is important to have an optimizing compiler automatically detect and convert appropriate code fragments to hardware instructions. We present a set of techniques for automatic recognition of saturating arithmetic operations. We show that in most cases the recognition problem is simply one of Boolean circuit equivalence. Given the expense of solving circuit equivalence, we develop a set of practical approximations based on abstract interpretation. Experiments show that our techniques, while reliably recognizing saturating arithmetic, have small compile-time overhead. We also demonstrate that our approach is not limited to saturating arithmetic, but is directly applicable to recognizing other idioms, such as add-with-carry and absolute value.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130265077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
Dynamic overlay of scratchpad memory for energy minimization 动态覆盖的刮刮板存储器的能量最小化
Manish Verma, L. Wehmeyer, P. Marwedel
{"title":"Dynamic overlay of scratchpad memory for energy minimization","authors":"Manish Verma, L. Wehmeyer, P. Marwedel","doi":"10.1109/CODES+ISSS.2004.20","DOIUrl":"https://doi.org/10.1109/CODES+ISSS.2004.20","url":null,"abstract":"The memory subsystem accounts for a significant portion of the aggregate energy budget of contemporary embedded systems. Moreover, there exists a large potential for optimizing the energy consumption of the memory subsystem. Consequently, novel memories as well as novel algorithms for their efficient utilization are being designed. Scratchpads are known to perform better than caches in terms of power, performance, area and predictability. However, unlike caches they depend upon software allocation techniques for their utilization. We present an allocation technique which analyzes the application and inserts instructions to dynamically copy both code segments and variables onto the scratchpad at runtime. We demonstrate that the problem of dynamically overlaying scratchpad is an extension of the global register allocation problem. The overlay problem is solved optimally using ILP formulation techniques. Our approach improves upon the only previously known allocation technique for statically allocating both variables and code segments onto the scratchpad. Experiments report an average reduction of 34% and 18% in the energy consumption and the runtime of the applications, respectively. A minimal increase in code size is also reported.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134521550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 150
Fast cycle-accurate simulation and instruction set generation for constraint-based descriptions of programmable architectures 基于约束描述的可编程体系结构的快速周期精确仿真和指令集生成
S. Weber, M. Moskewicz, M. Gries, C. Sauer, K. Keutzer
{"title":"Fast cycle-accurate simulation and instruction set generation for constraint-based descriptions of programmable architectures","authors":"S. Weber, M. Moskewicz, M. Gries, C. Sauer, K. Keutzer","doi":"10.1145/1016720.1016728","DOIUrl":"https://doi.org/10.1145/1016720.1016728","url":null,"abstract":"State-of-the-art architecture description languages have been successfully used to model application-specific programmable architectures limited to particular control schemes. We introduce a language and methodology that provide a framework for constructing and simulating a wider range of architectures. The framework exploits the fact that designers are often only concerned with data paths, not the instruction set and control. In the framework, each processing element is described in a structural language that only requires the specification of the data path and constraints on how it can be used. From such a description, the supported operations of the processing clement are automatically extracted and a controller is generated. Various architectures are then realized by composing the processing elements. Furthermore, hardware descriptions and bit-true cycle-accurate simulators are automatically generated. Results show that our simulators are up to an order of magnitude faster than other reported simulators of this type and two orders of magnitude faster than equivalent Verilog simulations.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115911405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Design and programming of embedded multiprocessors: an interface-centric approach 嵌入式多处理器的设计和编程:以接口为中心的方法
P. V. D. Wolf, E. Kock, T. Henriksson, W. Kruijtzer, G. Essink
{"title":"Design and programming of embedded multiprocessors: an interface-centric approach","authors":"P. V. D. Wolf, E. Kock, T. Henriksson, W. Kruijtzer, G. Essink","doi":"10.1109/CODES+ISSS.2004.17","DOIUrl":"https://doi.org/10.1109/CODES+ISSS.2004.17","url":null,"abstract":"We present design technology for the structured design and programming of embedded multi-processor systems. It comprises a task-level interface that can be used both for developing parallel application models and as a platform interface for implementing applications on multi-processor architectures. Associated mapping technology supports refinement of application models towards implementation. By linking application development and implementation aspects, the technology integrates the specification and design phases in the MPSoC design process. Two design cases demonstrate the efficient implementation of the platform interface on different architectures. Industry-wide standardization of a task-level interface can facilitate reuse of function-specific hardware/software modules across companies.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121553776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 152
Modeling operation and microarchitecture concurrency for communication architectures with application to retargetable simulation 通信体系结构的操作和微体系结构并发建模及其可重目标仿真应用
Xinping Zhu, W. Qin, S. Malik
{"title":"Modeling operation and microarchitecture concurrency for communication architectures with application to retargetable simulation","authors":"Xinping Zhu, W. Qin, S. Malik","doi":"10.1145/1016720.1016738","DOIUrl":"https://doi.org/10.1145/1016720.1016738","url":null,"abstract":"In multiprocessor based SoCs, optimizing the communication architecture is often as important as, if not more than, optimizing the computation architecture. While there are mature platforms and techniques for the modeling and evaluation of computation architectures, the same is not true for the communication architectures. A major challenge in modeling the communication architecture is managing the concurrency at multiple levels: at the operation level, multiple communication operations may be active at any time; at the microarchitecture level, several microarchitectural components may be operating in parallel. Further, it is important to be able to clearly specify how the operation level concurrency maps to the microarchitectural level concurrency. This work presents a modeling methodology and a retargetable simulation framework which fill this gap. This framework seeks to facilitate the design space exploration of the communication sub-system through a rigorous modeling approach based on a formal concurrency model, the operation state machine (OSM). We first introduce the basic notions and concepts of OSM and show by example how this model can be used to represent the inherent concurrency in the architecture and microarchitecture of processors. Then we demonstrate the applicability of OSM in modeling on-chip communication architectures (OCAs) by walking though a router based packet switching network example and a bus example. Due to the fact that the OSM model is naturally suited to handle the operation and microarchitecture level concurrencies of OCAs as well, our OSM-based modeling methodology enables the entire system including both the computation and communication architectures to be modeled in a single OSM framework. This allows us to develop a tool set that can synthesize cycle-accurate system simulators for multi-PE SoC prototypes. To demonstrate the flexibility of this methodology, we choose two distinct system configurations with different types of OCA: a 4/spl times/4 mesh network of 16 PEs, and a cluster of 4 PEs connected by a bus. We show that by simulation, critical system information such as timing and communication patterns can be obtained and evaluated. Consequently, system-level design choices regarding the communication architecture can be made with high confidence in early stages of design. In addition to improving design quality, this methodology also results in significantly shortened design-time.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130963573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 25
Analytical models for leakage power estimation of memory array structures 存储阵列结构泄漏功率估计的分析模型
M. Mamidipaka, K. Khouri, N. Dutt, M. Abadir
{"title":"Analytical models for leakage power estimation of memory array structures","authors":"M. Mamidipaka, K. Khouri, N. Dutt, M. Abadir","doi":"10.1145/1016720.1016757","DOIUrl":"https://doi.org/10.1145/1016720.1016757","url":null,"abstract":"There is a growing need for accurate power models at the system level. Memory structures such as caches, branch target buffers (BTBs), and register files occupy significant area in contemporary SoC designs and are the main contributors to system leakage power dissipation. Existing models for leakage power estimation in array structures typically use coefficients derived from elaborate SPICE simulations. However, these methodologies are not applicable to array designs in a newer technology, that require power estimates early in the design cycle. In this paper, we propose analytical models for array structures that are based only on high level design parameters. Assuming typical circuit implementation styles, we identify the transistors that contribute to the leakage power in each array sub-circuit and develop models as a function of the operation (read/write/idle) on the array and organizational parameters of the array. The developed models are validated by comparing their estimates against the leakage power measured using SPICE simulations on industrial array designs belonging to the e500 processor core. The comparison shows that the models are accurate with an error margin of less than 21.5% and thus can be used in high-level power-performance exploration. Interestingly, in array designs with dual threshold voltage technology, we observed that contrary to the general expectation, the array memory core contributes to just 9% and the address decoder contributes to as much as 62% of the total leakage power.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114645634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 42
A timing-accurate HW/SW cosimulation of an ISS with SystemC 国际空间站与SystemC定时精确的软硬件联合仿真
L. Formaggio, F. Fummi, G. Pravadelli
{"title":"A timing-accurate HW/SW cosimulation of an ISS with SystemC","authors":"L. Formaggio, F. Fummi, G. Pravadelli","doi":"10.1145/1016720.1016759","DOIUrl":"https://doi.org/10.1145/1016720.1016759","url":null,"abstract":"The paper presents a system level co-simulation methodology for modeling, validating, and analyzing the performance of embedded systems. The proposed solution relies on the integration between an instruction set simulator (ISS) and the SystemC simulation kernel. In this way, the ISS is used to abstract the model of the real programmable device where the SW should run, while SystemC is used to model HW components that interact with the SW. A correct validation of such an architecture is infeasible without taking care of timing information. Thus, the paper proposes an effective timing synchronization mechanism, which uses timing information of an ISS (or a board) to synchronize the SystemC simulation.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115889979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
Compiler-directed code restructuring for reducing data TLB energy 编译器导向的代码重组,以减少数据TLB能量
M. Kandemir, I. Kadayif, G. Chen
{"title":"Compiler-directed code restructuring for reducing data TLB energy","authors":"M. Kandemir, I. Kadayif, G. Chen","doi":"10.1145/1016720.1016747","DOIUrl":"https://doi.org/10.1145/1016720.1016747","url":null,"abstract":"Prior work on TLB power optimization considered circuit and architectural techniques. A recent software-based technique for data TLBs has considered the possibility of storing the frequently used virtual-to-physical address translations in a set of translation registers (TRs), and using them when necessary instead of going to the data TLB. This work presents a compiler-based strategy for increasing the effectiveness of TRs. The idea is to restructure the application code in such a fashion that once a TR is loaded, its contents are reused as much as possible. Our experimental evaluation with six array-based benchmarks from the Spec2000 suite indicates that the proposed TR reuse strategy brings significant reductions in data TLB energy over an alternate strategy that employs TRs but does not restructure the code for TR reuse.","PeriodicalId":127038,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123624966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信