International Conference on Hardware/Software Codesign and System Synthesis: Latest Papers (Page 4)

SuSeSim: a fast simulation strategy to find optimal L1 cache configuration for embedded systems

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629476

M. S. Haque, Andhi Janapsatya, S. Parameswaran

Abstract: Simulation of an application is a popular and reliable approach to finding the optimal level-one cache configuration for an application-specific embedded system processor. However, long simulation time is one of the main disadvantages of simulation-based approaches. In this paper, we propose a new and fast simulation method, Super Set Simulator (SuSeSim). While previous methods use a Top-Down search strategy, SuSeSim uses a Bottom-Up search strategy along with a new, elaborate data structure to reduce the search space needed to determine a cache hit or miss. SuSeSim can simulate hundreds of cache configurations simultaneously by reading an application's memory request trace just once. The total numbers of cache hits and misses are accurately recorded. Depending on the cache block size and the benchmark application, SuSeSim can reduce the number of tags to be checked by up to 43% compared to the fastest existing simulation approach (the CRCB algorithm). With the help of a faster search and an easy-to-maintain data structure, SuSeSim can be up to 94% faster in simulating memory requests than the CRCB algorithm.

Citations: 25
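
To illustrate the single-pass idea described in the SuSeSim abstract above, here is a minimal Python sketch that reads a memory trace once and updates several LRU cache configurations for each request. It is a naive baseline under assumed names: `LRUCache`, `simulate_all`, and the `(num_sets, assoc, block_size)` tuple layout are hypothetical, and the sketch does not reproduce SuSeSim's Bottom-Up search or its data structure, which the abstract does not specify.

```python
from collections import OrderedDict


class LRUCache:
    """One set-associative cache configuration with LRU replacement (illustrative)."""

    def __init__(self, num_sets, assoc, block_size):
        self.num_sets = num_sets
        self.assoc = assoc
        self.block_size = block_size
        # One ordered dict per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(ways) >= self.assoc:
                ways.popitem(last=False)  # evict the least recently used tag
            ways[tag] = True


def simulate_all(trace, configs):
    """Read the trace once and feed every request to every configuration."""
    caches = {cfg: LRUCache(*cfg) for cfg in configs}
    for address in trace:
        for cache in caches.values():
            cache.access(address)
    return {cfg: (c.hits, c.misses) for cfg, c in caches.items()}


# Hypothetical byte-address trace and (num_sets, assoc, block_size) configurations.
TRACE = [0, 4, 0, 64, 0, 4]
CONFIGS = [(1, 1, 4), (1, 2, 4), (2, 1, 4)]
RESULTS = simulate_all(TRACE, CONFIGS)
for cfg, (hits, misses) in RESULTS.items():
    print(cfg, "hits:", hits, "misses:", misses)
```

On this six-request trace, the single-block configuration `(1, 1, 4)` misses every time, while the other two each record 2 hits and 4 misses. The cost of this baseline grows linearly with the number of configurations, which is the kind of overhead that shared-structure simulators such as the one described in the abstract aim to cut.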

A high-level virtual platform for early MPSoC software development

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629438

J. Ceng, Weihua Sheng, J. Castrillón, Anastasia Stulova, R. Leupers, G. Ascheid, H. Meyr

Citations: 44

ILP optimal scheduling for multi-module memory

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629473

Meikang Qiu, Lei Zhang, E. Sha

Citations: 8

Synthesis of topology configurations and deadlock free routing algorithms for ReNoC-based systems-on-chip

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629500

M.B. Stuart, M. B. Stensgaard, J. Sparsø

Abstract: In the near future, generic System-on-Chip (SoC) platforms will be replacing custom-designed SoCs. Such generic platforms require a highly flexible interconnect in order to support a wide variety of applications. The ReNoC architecture provides this by allowing power-efficient, application-specific topologies to be configured on top of a fixed but reconfigurable physical architecture through a mixture of packet switching and physical circuit switching. The first contribution of this paper is three novel algorithms that, given an abstract description of the application and the physical architecture, 1) synthesize the application-specific topologies, 2) map them onto the physical architecture, and 3) create deadlock-free, application-specific routing algorithms. The second contribution is a novel physical architecture based on an extended mesh of ReNoC nodes. We apply our algorithms to a mixture of real and synthetic applications and three different physical architectures. Our results show that the performance of the different algorithms is highly dependent on the physical architecture. On average, our novel physical architecture reduces power consumption by 58% compared to a conventional Network-on-Chip.

Citations: 8

Fast model-based test case classification for performance analysis of multimedia MPSoC platforms

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629492

Deepak Gangadharan, S. Chakraborty, Roger Zimmermann

Abstract: Currently, performance analysis of multimedia MPSoC platforms largely relies on simulation. The execution of one or more applications on such a platform is simulated for a library of test video clips. If all specified performance constraints are satisfied for this library, then the architecture is assumed to be well designed. This is similar to testing software for functional correctness. However, in contrast to functional testing, simulating a set of video clips for a complex application/architecture is extremely time consuming. In this paper, we propose a technique for clustering a library of video clips such that it is sufficient to simulate only one clip from each cluster rather than the entire library. Our clustering is scalable, i.e., the number of clusters may be determined based on the number of clips that the system designer wishes to simulate (which is independent of the input library size). For each video clip in the library, we perform a fast bitstream analysis from which the workload generated while processing this clip on the given architecture may be estimated. This workload information, in conjunction with a workload model and a performance model of the architecture, is used for the clustering. This entire process does not involve any simulation and is hence extremely fast. We illustrate its utility through a detailed case study using an MPEG-2 decoder application running on an MPSoC platform. As part of the validation of our methodology, it was observed that video clips falling into the same cluster exhibit similar worst-case buffer backlogs and worst-case delays for one macroblock. Overall, the results demonstrate that the proposed method provides a very fast and accurate analysis and hence can be of significant benefit to the system designer.

Citations: 1

An MDP-based application oriented optimal policy for wireless sensor networks

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629461

Arslan Munir, A. Gordon-Ross

Abstract: Technological advancements due to Moore's law have led to the proliferation of complex wireless sensor network (WSN) domains. One commonality across all WSN domains is the need to meet application requirements (e.g., lifetime, responsiveness) through domain-specific sensor node design. Techniques such as sensor node parameter tuning enable WSN designers to specialize tunable parameters (e.g., processor voltage and frequency, sensing frequency) to meet these application requirements. However, given WSN domain diversity, varying environmental situations (stimuli), and sensor node complexity, sensor node parameter tuning is a very challenging task. In this paper, we propose an automated Markov Decision Process (MDP)-based methodology to prescribe optimal sensor node operation (selection of values for tunable parameters such as processor voltage, processor frequency, and sensing frequency) to meet application requirements and adapt to changing environmental stimuli. Numerical results confirm the optimality of our proposed methodology and reveal that our methodology more closely meets application requirements compared to other feasible policies.

Citations: 39
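
The MDP abstract above describes selecting values of tunable parameters so that long-run application requirements are met. As a rough illustration of how an optimal policy can be computed for a small MDP, the sketch below uses standard value iteration. Everything in the toy model is an assumption made for illustration: the two states (`"low"`, `"high"`, standing in for two settings of a tunable parameter), the actions, the rewards, and the discount factor are hypothetical and are not taken from the paper, whose state space and reward function the abstract does not give.

```python
def q_value(state, action, values, transition, reward, gamma):
    """Expected discounted return of taking `action` in `state`, then following `values`."""
    expected = sum(p * values[nxt] for nxt, p in transition(state, action).items())
    return reward(state, action) + gamma * expected


def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-9):
    """Standard value iteration; returns (state values, greedy policy)."""
    values = {s: 0.0 for s in states}
    while True:
        new_values = {
            s: max(q_value(s, a, values, transition, reward, gamma) for a in actions)
            for s in states
        }
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: q_value(s, a, values, transition, reward, gamma))
        for s in states
    }
    return values, policy


# Hypothetical toy model: two operating settings and two actions.
STATES = ["low", "high"]
ACTIONS = ["stay", "switch"]


def transition(state, action):
    """Deterministic transitions: 'stay' keeps the setting, 'switch' flips it."""
    if action == "stay":
        return {state: 1.0}
    return {("high" if state == "low" else "low"): 1.0}


def reward(state, action):
    """Assumed reward: 1.0 per step in 'high', minus a 0.5 cost for switching."""
    base = 1.0 if state == "high" else 0.0
    cost = 0.5 if action == "switch" else 0.0
    return base - cost


values, policy = value_iteration(STATES, ACTIONS, transition, reward)
print(policy)
```

With these assumed rewards, the greedy policy switches out of `"low"` once and then stays in `"high"`. A real sensor node model would replace the two toy states with combinations of voltage, processor frequency, and sensing frequency, and the reward with a measure of how closely the application requirements are met.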

Mapping pipelined applications onto heterogeneous embedded systems: a Bayesian optimization algorithm based approach

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629495

Antonino Tumeo, Marco Branca, L. Camerini, C. Pilato, P. Lanzi, Fabrizio Ferrandi, D. Sciuto

Citations: 6

Statistical physics approaches for network-on-chip traffic characterization

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629498

P. Bogdan, R. Marculescu

Abstract: In order to face the growing complexity of embedded applications, we aim to build highly efficient Network-on-Chip (NoC) architectures that can connect the various computational modules of the platform in a scalable manner. For such networked platforms, it is increasingly important to accurately model the traffic characteristics, as this is intimately related to our ability to determine the optimal buffer size at various routers in the network and thus provide analytical metrics for various power-performance trade-offs. In this paper, we show that the main limitations of queueing theory and Markov chain approaches to solving the buffer sizing problem can be overcome by adopting a statistical physics approach to probability density characterization, which incorporates the power law distribution, correlations, and scaling properties exhibited within an NoC architecture due to various network transactions. As experimental results show, this new approach represents a breakthrough in accurate traffic modeling under non-equilibrium conditions. As such, our results can be directly used to solve the buffer sizing problem for multiprocessor systems where communication happens via the NoC approach.

Citations: 62

Squashing microcode stores to size in embedded systems while delivering rapid microcode accesses

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629471

Chengmo Yang, Mingjing Chen, A. Orailoglu

Abstract: Microcoded customized IPs offer superior performance and direct programmability of micro-architectural structures compared to instruction-based processors, yet at the cost of drastically enlarged code sizes. Code compression can deliver size reductions but necessitates attention to performance issues, so that the performance benefits of microcoded IPs are not squandered in the process. To attain this goal, we propose in this paper a fast code compression technique that exploits the fact that the microcodes contain a sizable number of unspecified bits. Although the values and the positions of the specified bits are highly irregular, the proposed technique can still flexibly and precisely fill in these fully specified bits by using a linear network. The linear property inherent in the compression strategy in turn enables the development of an extremely low-overhead decompression engine. At runtime, the decompressed code can be generated in such a way that all the specified bits can be filled as required by a fixed-bandwidth XOR network. The combination of the proposed flexible XOR-based network with a minimum two-level storage for highly specified fields, such as immediate values, offers utmost code compression, attained within a negligible amount of performance and hardware overhead.

Citations: 2
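
The microcode abstract above relies on two properties: many bits of a microcode word are unspecified, and a linear (XOR) network can regenerate the specified ones from a shorter stored vector. The Python sketch below shows one generic way such a scheme can work, by solving a linear system over GF(2) for a short seed. It is an assumption-laden illustration, not the paper's design: the `NETWORK` matrix, the seed length, and the function names `expand` and `solve_seed` are all hypothetical.

```python
def expand(seed, network):
    """Decompress: each output bit is the XOR of the seed bits selected by one row."""
    return [sum(s & r for s, r in zip(seed, row)) % 2 for row in network]


def solve_seed(word, network, seed_len):
    """Find a seed whose expansion matches every specified bit of `word`.

    `word` uses None for unspecified bits. Returns None if no seed exists.
    Uses Gaussian elimination over GF(2); free seed bits are set to 0.
    """
    # One augmented equation per specified bit: coefficients + right-hand side.
    rows = [list(network[i]) + [bit] for i, bit in enumerate(word) if bit is not None]
    pivots = []
    r = 0
    for col in range(seed_len):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    # A leftover equation of the form 0 = 1 means the bits cannot all be satisfied.
    if any(row[-1] for row in rows[r:]):
        return None
    seed = [0] * seed_len
    for row_idx, col in enumerate(pivots):
        seed[col] = rows[row_idx][-1]
    return seed


# Hypothetical fixed XOR network: 6 output bits generated from a 3-bit seed.
NETWORK = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
WORD = [1, None, None, 0, None, 1]  # three specified bits, three unspecified
SEED = solve_seed(WORD, NETWORK, 3)
print(SEED, expand(SEED, NETWORK))
```

Here a 6-bit word with three specified bits is stored as a 3-bit seed, and decompression is a fixed set of XORs with no table lookup. When the specified bits are too dense for the network, `solve_seed` returns `None`; in a design like the one the abstract describes, such fields would presumably be handled by the separate storage it mentions for highly specified fields.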

Using continuous statistical machine learning to enable high-speed performance prediction in hybrid instruction-/cycle-accurate instruction set simulators

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629478

D. Powell, Björn Franke

Citations: 19