International Conference on Hardware/Software Codesign and System Synthesis: Latest Publications

Reliable performance analysis of a multicore multithreaded system-on-chip

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450172

S. Schliecker, Mircea Negrean, G. Nicolescu, P. Paulin, R. Ernst

Abstract: Formal performance analysis is now regularly applied in the design of distributed embedded systems such as automotive electronics, where it greatly contributes to improved predictability and platform robustness of complex networked systems. Although it could be highly beneficial in MPSoC design as well, formal performance analysis has so far been difficult to apply there, because the classical task communication model does not cover processor-memory traffic, which is an integral part of MPSoC timing. Introducing memory accesses as individual transactions under the classical model has been shown to be inefficient, and previous approaches work well only under strict orthogonalization of different traffic streams.

Recent research has presented extensions of the classical task model and a corresponding analysis that covers the performance implications of shared-memory traffic. In this paper we present a multithreaded multiprocessor platform and a multimedia application. We conduct performance analysis using the new analysis options and specifically benchmark the quality of the available approach. Our experiments show that corner cases can now be covered with very high accuracy, allowing designers to quickly investigate architectural alternatives.

Citations: 45
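The abstract contrasts the classical task model with extensions that account for shared-memory traffic. As a rough illustration of the classical side only, the sketch below runs the standard fixed-priority response-time iteration and folds memory traffic in crudely, by charging each of a task's worst-case memory accesses a fixed latency bound. The tuple layout `(C, T, M)` and the `mem_latency` parameter are assumptions made for this example; the analysis in the paper models shared-memory interference much more precisely than this.

```python
import math


def response_time(tasks, i, mem_latency):
    """Worst-case response time of tasks[i] under fixed-priority preemptive scheduling.

    tasks: list of (C, T, M) tuples ordered by priority (index 0 = highest), where
      C = worst-case execution time excluding memory stalls,
      T = period (deadline assumed equal to period),
      M = worst-case number of shared-memory accesses per job.
    mem_latency: assumed upper bound on the stall caused by one memory access.
    Returns the response time, or None if it exceeds the period.
    """
    def demand(k):
        # Crude assumption: every memory access adds a full mem_latency stall.
        c, _, m = tasks[k]
        return c + m * mem_latency

    c_i, t_i = demand(i), tasks[i][1]
    r = c_i
    while r <= t_i:
        # Interference from all higher-priority jobs released within a window of length r.
        interference = sum(math.ceil(r / tasks[j][1]) * demand(j) for j in range(i))
        r_next = c_i + interference
        if r_next == r:
            return r  # fixed point reached
        r = r_next
    return None
```

Setting `mem_latency` to 0 recovers the classical analysis, which illustrates the blind spot the abstract describes: processor-memory traffic simply does not appear in it.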

Don't forget memories: a case study redesigning a pattern counting ASIC circuit for FPGAs

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450171

David Sheldon, F. Vahid

Abstract: Modern embedded compute platforms increasingly contain both microprocessors and field-programmable gate arrays (FPGAs). The FPGAs may implement accelerators or other circuits to speed up performance. Many such circuits were previously designed for acceleration via application-specific integrated circuits (ASICs). Redesigning an ASIC circuit for FPGA implementation involves several challenges. We describe a case study that highlights a common challenge related to memories. The study involves converting a pattern counting circuit architecture, based on a pipelined binary tree and originally designed for ASIC implementation, into a circuit suitable for FPGAs. The original ASIC-oriented circuit, when mapped to a Spartan-3E FPGA, could process 10 million patterns per second and handle up to 4,096 patterns. The redesigned circuit could instead process 100 million patterns per second and handle up to 32,768 patterns, representing a 10x performance improvement and a 4x utilization improvement. The redesign involved partitioning large memories into smaller ones at the expense of redundant control logic. Through this and other case studies, design patterns may emerge that aid designers in redesigning ASIC circuits for FPGAs as well as in building new high-performance and efficient circuits for FPGAs.

Citations: 6
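The redesign described above partitions one large memory into several smaller ones, each with its own replicated control. A minimal software model of that idea, assuming the counter table is split into equally sized banks selected by the low-order bits of the pattern ID, might look like the following. The class and method names are hypothetical, and this is not the authors' pipelined binary-tree circuit.

```python
class BankedPatternCounter:
    """Software model of a pattern-count table split into several small banks.

    Hypothetical illustration: on an FPGA, each bank would map to its own block
    RAM with its own (replicated) access logic, so banks can be used in parallel.
    """

    def __init__(self, num_patterns, num_banks):
        if num_patterns % num_banks:
            raise ValueError("num_patterns must be a multiple of num_banks")
        self.num_banks = num_banks
        self.banks = [[0] * (num_patterns // num_banks) for _ in range(num_banks)]

    def _locate(self, pattern_id):
        # Low-order bits select the bank; the remaining bits index within it.
        return pattern_id % self.num_banks, pattern_id // self.num_banks

    def count(self, pattern_id):
        bank, offset = self._locate(pattern_id)
        self.banks[bank][offset] += 1

    def get(self, pattern_id):
        bank, offset = self._locate(pattern_id)
        return self.banks[bank][offset]
```

For example, `BankedPatternCounter(32768, 8)` holds the 32,768 counters mentioned in the abstract as eight banks of 4,096 entries; the cost of this layout is one copy of the access logic per bank.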

Hardware/software partitioning of floating point software applications to fixed-pointed coprocessor circuits

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450148

L. Saldanha, Roman L. Lysecky

Abstract: While hardware/software partitioning has been shown to provide significant performance gains, most hardware/software partitioning approaches are limited to partitioning computational kernels that use integer or fixed-point implementations. Software developers often initially develop an application using built-in floating-point representations and later convert it to a fixed-point representation, a potentially time-consuming process. In this paper, we present a hardware/software partitioning approach for floating-point applications that eliminates the need for developers to rewrite software applications for fixed-point implementations. Instead, the proposed approach incorporates efficient, configurable floating-point-to-fixed-point and fixed-point-to-floating-point hardware converters at the boundary between the hardware coprocessors and memory. This effectively separates the system into a floating-point domain, consisting of the microprocessor and memory subsystem, and a fixed-point domain, consisting of the partitioned hardware coprocessors, thereby providing an efficient and rapid method for implementing fixed-point hardware coprocessors.

Citations: 1
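The converters at the coprocessor boundary translate between the processor's floating-point values and the coprocessor's fixed-point values. Below is a minimal sketch of that translation in signed two's-complement format, assuming a configurable number of fractional bits (`frac_bits`) and total width (`total_bits`) chosen for illustration rather than taken from the paper, plus a hypothetical fixed-point kernel that operates purely on the converted integers.

```python
def float_to_fixed(x, frac_bits, total_bits):
    """Convert a float to a signed two's-complement fixed-point integer.

    Rounds to nearest (Python's round() breaks ties to even) and saturates
    to the range representable in total_bits.
    """
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def fixed_to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(a_q, b_q, frac_bits):
    """Hypothetical coprocessor kernel: dot product on fixed-point operands.

    Each product carries 2 * frac_bits fractional bits, so the accumulated sum
    is shifted right once to return to frac_bits (rounding toward -infinity).
    Python ints never overflow; a hardware accumulator would need enough width.
    """
    acc = sum(x * y for x, y in zip(a_q, b_q))
    return acc >> frac_bits
```

For example, with 8 fractional bits, converting `[0.5, 1.25]` and `[2.0, -1.0]`, computing `fixed_dot` on the resulting integers, and converting the result back gives -0.25. Saturating on overflow rather than wrapping around is a design choice of this sketch; a hardware converter may handle out-of-range values differently.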

Speculative DMA for architecturally visible storage in instruction set extensions

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450191

Theo Kluter, P. Brisk, P. Ienne, E. Charbon

Abstract: Instruction set extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential, and some of them have recently been expanded to include Architecturally Visible Storage (AVS): compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. To achieve a speedup using AVS, Direct Memory Access (DMA) transfers are required to move data from main memory to the AVS. Unfortunately, this creates coherence problems between the AVS and the cache, which previous methods for ISEs with AVS failed to address. Additionally, these methods must leave many conservative DMA transfers in place, and executing them significantly limits the achievable speedup. This paper presents a memory coherence scheme for ISEs with AVS that can ensure execution correctness and memory consistency with minimal area overhead. We also present a method that speculatively removes redundant DMA transfers. Cycle-accurate experimental results were obtained using an FPGA emulation platform. These results show that application-specific processors extended with speculative DMA-enhanced AVS gain significantly over previous techniques, despite the overhead of the coherence mechanism.

Citations: 18

Application specific non-volatile primary memory for embedded systems

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450144

Kwangyoon Lee, A. Orailoglu

Abstract: Memory subsystems are considered among the most critical components of embedded systems, and their complexity grows as application requirements diversify. Modern embedded systems are generally equipped with multiple heterogeneous memory devices to satisfy diverse requirements and constraints. NAND flash memory has been widely adopted for data storage because of its outstanding benefits in cost, power, capacity and non-volatility. However, in NAND flash memory, the intrinsic costs of read and write accesses are highly disproportionate in both performance and access granularity. The resulting data management complexity and performance deterioration have precluded the adoption of NAND flash memory as primary memory. In this paper, we introduce a highly effective non-volatile primary memory architecture that incorporates application-specific information to build a NAND flash based primary memory. The proposed architecture provides a unified non-volatile primary memory solution that relieves design complications caused by the growing complexity of memory subsystems. Our architecture aggressively minimizes the overhead and redundancy of NAND-based systems by exploiting efficient address space management and dynamic data migration based on accurate analysis of application behavior. We also propose a highly parallelized memory architecture that actively and dynamically redistributes data over multiple flash memories based on run-time workload analysis. The experimental results show that the proposed architecture significantly improves the average memory access cycle time, bringing it to a level comparable to the standard DRAM access cycle time, and considerably prolongs device lifetime through autonomous wear leveling and by minimizing program/erase operations.

Citations: 26
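The abstract credits part of the lifetime gain to autonomous wear leveling. The paper's own migration policy is driven by application behavioral analysis and is not reproduced here; as a generic illustration of dynamic wear leveling only, the sketch below always hands out the free block with the fewest erases. The class name and the one-erase-per-release accounting are assumptions made for this example.

```python
import heapq


class WearLevelingAllocator:
    """Hands out the free flash block with the lowest erase count.

    A generic dynamic wear-leveling heuristic for illustration; it is not the
    data migration policy proposed in the paper.
    """

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        # Min-heap of (erase_count, block); ties resolve to the lower block number.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block):
        # NAND blocks must be erased before reuse, so count one erase per release.
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))
```

Because a just-released block re-enters the heap with a higher erase count, repeated allocate/release cycles rotate through all blocks instead of wearing out the first one.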

Symbolic voter placement for dependability-aware system synthesis

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450190

Felix Reimann, M. Glaß, M. Lukasiewycz, J. Keinert, C. Haubelt, J. Teich

Citations: 27

LOCS: a low overhead profiler-driven design flow for security of MPSoCs

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450154

K. Patel, S. Parameswaran

Abstract: Security is a growing concern in processor-based systems and hence requires immediate attention. New paradigms in MPSoC design must be found, with security as one of the primary objectives. Software attacks such as code injection attacks exploit vulnerabilities in "trusted" code. Previous countermeasures against code injection attacks in MPSoCs have significant performance overheads and do not check every single line of code. The work described in this paper reduces performance overhead and ensures that every line of the program code is checked.

We propose an MPSoC system in which one processor (which we call a MONITOR processor) is responsible for supervising all other application processors. Our design flow, LOCS, instruments and profiles the execution of basic blocks in the program. LOCS subsequently uses the profiler output to re-instrument the source files to minimize runtime overheads. LOCS also aids in the design of the hardware customizations required by the MONITOR. At runtime, the MONITOR checks the validity of control flow transitions and the execution time of basic blocks.

We implemented our system on a commercial extensible processor, Xtensa LX2, and tested it on three multimedia benchmarks. The experiments show that our system has a worst-case performance degradation of about 24% and an area overhead of approximately 40%. LOCS has smaller performance, area and code size overheads than all previous code injection countermeasures for MPSoCs.

Citations: 2
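At runtime the MONITOR checks that each control flow transition is valid and that each basic block's execution time is plausible. A hypothetical software model of those two checks could look like the following, assuming the allowed transitions and per-block cycle bounds have already been extracted (for example from the control flow graph and from profiling). All names are illustrative and do not come from LOCS or the Xtensa toolchain.

```python
from dataclasses import dataclass


class MonitorViolation(Exception):
    """Raised when the monitored program deviates from its expected behavior."""


@dataclass(frozen=True)
class BlockBounds:
    min_cycles: int
    max_cycles: int


class ControlFlowMonitor:
    """Hypothetical model of a monitor supervising one application processor.

    allowed_edges: set of (from_block, to_block) pairs permitted by the program's CFG.
    bounds: dict mapping each basic block to its expected execution-time bounds.
    """

    def __init__(self, allowed_edges, bounds):
        self.allowed_edges = allowed_edges
        self.bounds = bounds
        self.previous = None  # last basic block that completed

    def on_block_exit(self, block, cycles):
        """Called when the monitored processor reports finishing `block` after `cycles`."""
        if self.previous is not None and (self.previous, block) not in self.allowed_edges:
            raise MonitorViolation(f"illegal transition {self.previous} -> {block}")
        b = self.bounds.get(block)
        if b is None:
            raise MonitorViolation(f"unknown basic block {block}")
        if not b.min_cycles <= cycles <= b.max_cycles:
            raise MonitorViolation(
                f"block {block} took {cycles} cycles, expected {b.min_cycles}..{b.max_cycles}"
            )
        self.previous = block
```

A jump into injected code shows up either as an edge missing from `allowed_edges` or as a block whose timing falls outside its profiled bounds; checking both is what lets a monitor of this kind catch attacks that a single check would miss.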

Distributed flit-buffer flow control for networks-on-chip

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450183

Nicola Concer, M. Petracca, L. Carloni

Abstract: The combination of flit-buffer flow control methods and latency-insensitive protocols is an effective solution for networks-on-chip (NoCs). Since both rely on backpressure, the two techniques are easy to combine while offering complementary advantages: low router design complexity and the ability to cope with long communication channels via automatic wire pipelining. We study various alternative implementations of this idea by considering the combination of three different types of flit-buffer flow control methods and two different classes of channel repeaters (based on flip-flops and relay stations, respectively). We characterize the area and performance of the two most promising alternative implementations for NoCs by completing the RTL design and logic synthesis of the repeaters and routers for different channel parallelisms. Finally, we derive high-level abstractions of our circuit designs and use them to perform system-level simulations under various scenarios for two distinct NoC topologies and various applications. Based on our comparative analysis and experimental results, we propose a NoC design approach that reduces the router queues to a minimum size and distributes flit buffering onto the channels. This approach provides valuable flexibility during the physical design phase for many NoCs, particularly in systems-on-chip that must meet a tight constraint on the target clock frequency.

Citations: 22
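Latency-insensitive channels rely on backpressure: each relay station buffers up to two flits, so that a registered stop signal, which reaches the upstream stage one cycle late, never causes a flit to be dropped. The cycle-level sketch below models a channel built from such stations feeding a sink whose readiness is given by a callback. It is a simplified behavioral model written for this example (function and parameter names are hypothetical), not the RTL characterized in the paper.

```python
from collections import deque


def simulate_channel(num_stations, flits, sink_ready, cycles):
    """Cycle-level model of a channel built from 2-slot relay stations.

    Station i asserts stop toward its upstream neighbor when its buffer held two
    flits at the start of the cycle (a registered signal). sink_ready(cycle)
    says whether the receiver accepts a flit in that cycle.
    Returns a list of (cycle, flit) pairs in delivery order.
    """
    if num_stations < 1:
        raise ValueError("need at least one relay station")
    stations = [deque() for _ in range(num_stations)]
    source = deque(flits)
    delivered = []
    for cycle in range(cycles):
        # Registered stop signals, sampled from the state at the start of the cycle.
        stop = [len(s) >= 2 for s in stations]
        # Phase 1: each station forwards its head flit if its downstream accepts.
        outgoing = {}
        for i, station in enumerate(stations):
            if i == num_stations - 1:
                downstream_ok = sink_ready(cycle)
            else:
                downstream_ok = not stop[i + 1]
            if station and downstream_ok:
                outgoing[i] = station.popleft()
        send_from_source = bool(source) and not stop[0]
        # Phase 2: apply all transfers, so every move happens in the same cycle.
        for i, flit in outgoing.items():
            if i == num_stations - 1:
                delivered.append((cycle, flit))
            else:
                stations[i + 1].append(flit)
        if send_from_source:
            stations[0].append(source.popleft())
        # Sanity check: two slots are always enough, so no station overflows.
        assert all(len(s) <= 2 for s in stations)
    return delivered
```

With the sink always ready, flit k leaves a three-station channel at cycle k + 3, i.e. one flit per cycle once the pipeline has filled. If the sink stalls, the stations fill to two flits each, and full throughput resumes a few cycles after the stall ends.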

A time-predictable system initialization design for huge-capacity flash-memory storage systems

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450140

Chin-Hsien Wu

Citations: 5

Holistic design and caching in mobile computing

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450161

Mwaffaq Otoom, J. M. Paul

Citations: 6