{"title":"Efficient simulation of synthesis-oriented system level designs","authors":"Rajesh K. Gupta, S. Shukla, N. Savoiu","doi":"10.1145/581199.581237","DOIUrl":"https://doi.org/10.1145/581199.581237","url":null,"abstract":"Modeling for synthesis and modeling for simulation seem to be two competing goals in the context of C++-based modeling frameworks. One of the reasons is while most hardware systems have some inherent parallelism efficiently expressing it depends on whether the target usage is synthesis or simulation. For synthesis, designs are usually described with synthesis tools in mind and are therefore partitioned according to the targeted hardware units. For simulation, runtime efficiency is critical but our previous work has shown that a synthesis-oriented description is not necessarily the most efficient, especially if using multiprocessor simulators. Multiprocessor simulation requires preemptive multithreading but most current C++-based high level system description languages use cooperative multithreading to exploit parallelism to reduce overhead. We have seen that, for synthesis-oriented models, along with adding preemptive threading we need to transform the threading structure for good simulation performance. In this paper we present an algorithm for automatically applying such transformations to C++-based hardware models, ongoing work aimed at proving the equivalence between the original and transformed model, and a 62% to 76% simulation time improvement on a dual processor simulator.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127473324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bin Xiao, Z. Shao, C. Phongpensri, E. Sha, Qingfeng Zhuge
{"title":"Optimal code size reduction for software-pipelined and unfolded loops","authors":"Bin Xiao, Z. Shao, C. Phongpensri, E. Sha, Qingfeng Zhuge","doi":"10.1145/581199.581232","DOIUrl":"https://doi.org/10.1145/581199.581232","url":null,"abstract":"Software pipelining and unfolding are commonly used techniques to increase parallelism for DSP applications. However, these techniques expand the code size of the application significantly. For most DSP systems with limited memory resources, code size becomes one of the most critical concerns for the high-performance applications. In this paper, we present the code size reduction theory based on retiming and unfolding concepts. We propose a code size reduction framework to achieve the optimal code size of software-pipelined and unfolded loops by using conditional registers. The experimental results on several well-known benchmarks show the effectiveness of our code size reduction technique in controlling the code size of optimized loops.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123265784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Cassidy, Christopher P. Andrews, D. E. Thomas, J. M. Paul
{"title":"System-level modeling of a network switch SoC","authors":"A. Cassidy, Christopher P. Andrews, D. E. Thomas, J. M. Paul","doi":"10.1145/581199.581215","DOIUrl":"https://doi.org/10.1145/581199.581215","url":null,"abstract":"We present the modeling of the high-level design of a next generation network switch from the perspective of a Computer-Aided Design (CAD) team within the larger context of a design team consisting of an experienced network switch designer and an experienced VLSI hardware designer. After facilitating the design process, the CAD team observed how designers approach highlevel designs, beyond RTL. We motivate the need for CAD support that allows designers to effectively manipulate what we refer to as Memory Visualization Level (MVL) design.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123142640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Datapath merging and interconnection sharing for reconfigurable architectures","authors":"G. Araújo, S. Malik, Zhining Huang, N. Moreano","doi":"10.1145/581199.581210","DOIUrl":"https://doi.org/10.1145/581199.581210","url":null,"abstract":"Recent work in reconfigurable computing research has shown that a substantial performance speedup can be achieved through architectures that map the most relevant application inner-loops to a reconfigurable datapath. Any solution to this problem must be able to synthesize a datapath for each loop and to merge them together into a single reconfigurable datapath. The main contribution of this paper is a novel graph-based technique for the datapath merge problem. This approach is based on the solution of a maximum clique problem that merges datapaths one at a time. A set of experiments, using the MediaBench benchmark, shows that the proposed technique produces 24% fewer datapath interconnections than a previous solution to this problem.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"274 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116251567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Security-driven exploration of cryptography in DSP cores","authors":"C. Gebotys","doi":"10.1145/581199.581218","DOIUrl":"https://doi.org/10.1145/581199.581218","url":null,"abstract":"With the popularity of wireless communication devices a new important dimension of embedded systems design has arisen, that of security. This paper presents for the first time design exploration for secure implementation of cryptographic applications on a complex DSP processor core. A new metric for security, the implementation security index, is introduced for measuring resistance to power attacks. Elliptic curve cryptographic algorithms are used to demonstrate and quantize security, energy, performance and code size tradeoffs. Modification of power traces is performed to maximize security against power attacks which has significant savings in energy dissipation compared to an existing mathematical approach. This research is important for industry since efficient yet secure cryptography is crucial for wireless communication embedded system devices.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115171664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient power reduction techniques for time multiplexed address buses","authors":"N. Dutt, D. Hirschberg, M. Mamidipaka","doi":"10.1145/581199.581246","DOIUrl":"https://doi.org/10.1145/581199.581246","url":null,"abstract":"We address the problem of reducing power dissipation on the time multiplexed address buses employed by contemporary DRAMs in SOC designs. We propose address encoding techniques to reduce the transition activity on the time-multiplexed address buses and hence reduce power dissipation. The reduction in transition activity is achieved by exploiting the principle of locality in address streams in addition to its sequential nature. We consider a realistic processor-memory architecture and apply the proposed techniques on the address streams derived from time-multiplexed DRAM addresses. Although the techniques by themselves axe not new, we show that a judicious combination of the existing techniques yield significant gains in power reductions. Experiments on SPEC95 benchmark programs show that our encoding techniques yield as much as 82% in transition activity compared to binary encoding. We show that these reductions amount to as much 60% reduction in the off-chip address bus power. Also since the encoder/decoder add some power overhead, we calculate the minimum off-chip bus capacitance to the internal node capacitance ratio needed to achieve power reductions.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"743 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125658214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System-level abstraction semantics","authors":"D. Gajski, A. Gerstlauer","doi":"10.1145/581199.581251","DOIUrl":"https://doi.org/10.1145/581199.581251","url":null,"abstract":"Raising the level of abstraction is widely seen as the solution for closing the productivity gap in system design. They key for the success of this approach, however, axe well-defined abstraction levels and models. In this paper, we present such system level semantics to cover the system design process. We define properties and features of each model. Formalization of the flow enables design automation for synthesis and verification to achieve the required productivity gains. Through customization, the semantics allow creation of specific design methodologies. We applied the concepts to system languages SystemC and SpecC. Using the example of a JPEG encoder, we will demonstrate the feasibility and effectiveness of the approach.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133548205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Timing analysis of embedded software for speculative processors","authors":"Abhik Roychoudhury, Xianfeng Li, T. Mitra","doi":"10.1145/581199.581229","DOIUrl":"https://doi.org/10.1145/581199.581229","url":null,"abstract":"Static timing analysis of embedded software is important for systems with hard real-time constraints. To accurately estimate time bounds, it is essential to model the underlying micro-architecture. In this paper, we study static timing analysis of embedded programs for modern processors with speculative execution. Speculation of conditional branch outcomes significantly improves processor performance, and hence program execution time. Although speculation is used in most modern processors, its effect on software timing has not been systematically studied before. The main contribution of our work is a parameterized framework to model different control flow speculation schemes. The accuracy of our framework is illustrated through tight timing estimates obtained for benchmark programs.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126472379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Onoye, Yukihiro Nakamura, A. Shigiya, K. Chikamura, K. Tsujino, T. Izumi, H. Yamamoto
{"title":"System-level design of IEEE1394 bus segment bridge","authors":"T. Onoye, Yukihiro Nakamura, A. Shigiya, K. Chikamura, K. Tsujino, T. Izumi, H. Yamamoto","doi":"10.1145/581199.581217","DOIUrl":"https://doi.org/10.1145/581199.581217","url":null,"abstract":"A system simulation environment is constructed dedicatedly for IEEE1394 high-speed digital communication. In this environment, various network transactions inherent in communication systems are taken into account for system simulation, which is indispensable to enable IP (Intellectual Property)-based design of the systems. By using the proposed environment, system-level design of IEEE1394 link layer controller and bus segment bridge is achieved with great ability of network transactions as well as connectivities with physical layer chips. Functionalities of the designed bus segment bridge has been verified according to its FPGA implementation.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134102766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design experience of a chip multiprocessor Merlot and expectation to functional verification","authors":"S. Matsushita","doi":"10.1145/581199.581223","DOIUrl":"https://doi.org/10.1145/581199.581223","url":null,"abstract":"We have fabricated a Chip Multiprocessor prototype code-named Merlot to proof our novel speculative multithreading architecture. On Merlot, multiple threads provide wider issue window beyond ordinal instruction level parallel (ILP) processors like superscalar or VLIW. With the architecture, we estimate 3.0 times speedup against single processing elements (PE) on speech recognition code and IDCT code with four PEs. Merlot integrates on-chip devices, PCI interface, and SDRAM interfaces. We have encountered design issues of chip multiprocessor and SoC design. We have successfully run parallelized mpeg3 decoder on the first silicon with several software workarounds, thanks to functional verification environment including system modeling on RTL. However, bugs found in later stage of design have required larger manpower or delay of project. In this paper, we also discuss the methodology to improve functional verification coverage, and expect the solution in formal approaches.","PeriodicalId":413693,"journal":{"name":"15th International Symposium on System Synthesis, 2002.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124506900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}