International Conference on Hardware/Software Codesign and System Synthesis最新文献

筛选
英文 中文
SuSeSim: a fast simulation strategy to find optimal L1 cache configuration for embedded systems SuSeSim:为嵌入式系统找到最佳L1缓存配置的快速仿真策略
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629476
M. S. Haque, Andhi Janapsatya, S. Parameswaran
{"title":"SuSeSim: a fast simulation strategy to find optimal L1 cache configuration for embedded systems","authors":"M. S. Haque, Andhi Janapsatya, S. Parameswaran","doi":"10.1145/1629435.1629476","DOIUrl":"https://doi.org/10.1145/1629435.1629476","url":null,"abstract":"Simulation of an application is a popular and reliable approach to find the optimal configuration of level one cache memory for an application specific embedded system processor. However, long simulation time is one of the main disadvantages of simulation based approaches. In this paper, we propose a new and fast simulation method, Super Set Simulator (SuSeSim). While previous methods use Top-Down searching strategy, SuSeSim utilizes a Bottom-Up search strategy along with a new elaborate data structure to reduce the search space to determine a cache hit or miss. SuSeSim can simulate hundreds of cache configurations simultaneously by reading an application's memory request trace just once. Total number of cache hits and misses are accurately recorded. Depending on different cache block sizes and benchmark applications, SuSeSim can reduce the number of tags to be checked by up to 43% compared to the existing fastest simulation approach (the CRCB algorithm). With the help of a faster search and an easy to maintain data structure, SuSeSim can be up to 94% faster in simulating memory requests compared to the CRCB algorithm.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124706013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 25
A high-level virtual platform for early MPSoC software development 一个用于早期MPSoC软件开发的高级虚拟平台
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629438
J. Ceng, Weihua Sheng, J. Castrillón, Anastasia Stulova, R. Leupers, G. Ascheid, H. Meyr
{"title":"A high-level virtual platform for early MPSoC software development","authors":"J. Ceng, Weihua Sheng, J. Castrillón, Anastasia Stulova, R. Leupers, G. Ascheid, H. Meyr","doi":"10.1145/1629435.1629438","DOIUrl":"https://doi.org/10.1145/1629435.1629438","url":null,"abstract":"Multiprocessor System-on-Chips (MPSoCs) are nowadays widely used, but the problem of their software development persists to be one of the biggest challenges for developers. Virtual Platforms (VPs) are introduced to the industry, which allow MPSoC software development without a hardware prototype. Nevertheless, for developers in early design stage where no VP is available, the software programming support is not satisfactory.\u0000 This paper introduces a High-level Virtual Platform (HVP) which aims at early MPSoC software development. The framework provides a set of tools for abstract MPSoC simulation and the corresponding application programming support in order to enable the development of reusable C code at a high level. The case study performed on several MPSoCs shows that the code developed on the HVP can be easily reused on different target platforms. Moreover, the high simulation speed achieved by the HVP also improves the design efficiency of software developers.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126024842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 44
ILP optimal scheduling for multi-module memory 多模块内存的ILP最优调度
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629473
Meikang Qiu, Lei Zhang, E. Sha
{"title":"ILP optimal scheduling for multi-module memory","authors":"Meikang Qiu, Lei Zhang, E. Sha","doi":"10.1145/1629435.1629473","DOIUrl":"https://doi.org/10.1145/1629435.1629473","url":null,"abstract":"In high-end digital signal processing (DSP) system, multi-module memory provides high memory bandwidth and low power operating mode for energy savings. However, making full use of these architectural features is a challenging problem for code optimization. In this paper, we propose an integer linear programming model to optimize the performance and energy consumption of multi-module memories by solving variable assignment, instruction scheduling and operating mode setting problems simultaneously. The combined effect of performance and energy saving requirements also has been considered. We develop two optimization techniques to improve the computation efficiency of our ILP model. The experimental results show that the optimal performance and energy solution can be achieved within a reasonable amount of time.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134041553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Synthesis of topology configurations and deadlock free routing algorithms for ReNoC-based systems-on-chip 基于recc的片上系统拓扑结构的综合和无死锁路由算法
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629500
M.B. Stuart, M. B. Stensgaard, J. Sparsø
{"title":"Synthesis of topology configurations and deadlock free routing algorithms for ReNoC-based systems-on-chip","authors":"M.B. Stuart, M. B. Stensgaard, J. Sparsø","doi":"10.1145/1629435.1629500","DOIUrl":"https://doi.org/10.1145/1629435.1629500","url":null,"abstract":"In the near future, generic System-on-Chip (SoC) platforms will be replacing custom designed SoCs. Such generic platforms require a highly flexible interconnect in order to support a wide variety of applications. The ReNoC architecture provides this by allowing power efficient, application specific topologies to be configured on top of a fixed but reconfigurable physical architecture through a mixture of packet switching and physical circuit switching.\u0000 The first contribution of this paper is three novel algorithms that, given an abstract description of the application and the physical architecture, 1) synthesize the application specific topologies, 2) map them onto the physical architecture, and 3) create deadlock free, application specific routing algorithms.\u0000 The second contribution is a novel physical architecture based on an extended mesh of ReNoC nodes. We apply our algorithms to a mixture of real and synthetic applications and three different physical architectures. Our results show that the different algorithms' performance are highly dependent on the physical architecture. On average, our novel physical architecture reduces power consumption by 58% compared to a conventional Network-on-Chip.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134285888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Fast model-based test case classification for performance analysis of multimedia MPSoC platforms 基于快速模型的多媒体MPSoC平台性能分析测试用例分类
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629492
Deepak Gangadharan, S. Chakraborty, Roger Zimmermann
{"title":"Fast model-based test case classification for performance analysis of multimedia MPSoC platforms","authors":"Deepak Gangadharan, S. Chakraborty, Roger Zimmermann","doi":"10.1145/1629435.1629492","DOIUrl":"https://doi.org/10.1145/1629435.1629492","url":null,"abstract":"Currently, performance analysis of multimedia-MPSoC platforms largely rely on simulation. The execution of one or more applications on such a platform is simulated for a library of test video clips. If all specified performance constraints are satisfied for this library, then the architecture is assumed to be well-designed. This is similar to testing software for functional correctness. However, in contrast to functional testing, simulating a set of video clips for a complex application/architecture is extremely time consuming. In this paper we propose a technique for clustering a library of video clips, such that it is sufficient to simulate only one clip from each cluster rather than the entire library. Our clustering is scalable, i.e., the number of clusters may be determined based on the number of clips that the system designer wishes to simulate (which is independent of the input library size). For each video clip in the library, we perform a fast bitstream analysis from which the workload generated while processing this clip on the given architecture may be estimated. This workload information, in conjunction with a workload model and a performance model of the architecture, is used for the clustering. This entire process does not involve any simulation and is hence extremely fast. We illustrate its utility through a detailed case study using an MPEG-2 decoder application running on an MPSoC platform. As part of validation of our methodology, it was observed that video clips falling into the same cluster exhibit similar worst case buffer backlogs and worst case delays for one macroblock. Overall the results demonstrate that the proposed method provides a very fast and accurate analysis and hence can be of significant benefit to the system designer.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126634004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
An MDP-based application oriented optimal policy for wireless sensor networks 一种基于mdp的面向应用的无线传感器网络优化策略
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629461
Arslan Munir, A. Gordon-Ross
{"title":"An MDP-based application oriented optimal policy for wireless sensor networks","authors":"Arslan Munir, A. Gordon-Ross","doi":"10.1145/1629435.1629461","DOIUrl":"https://doi.org/10.1145/1629435.1629461","url":null,"abstract":"Technological advancements due to Moore's law have led to the proliferation of complex wireless sensor network (WSN) domains. One commonality across all WSN domains is the need to meet application requirements (i.e. lifetime, responsiveness, etc.) through domain specific sensor node design. Techniques such as sensor node parameter tuning enable WSN designers to specialize tunable parameters (i.e. processor voltage and frequency, sensing frequency, etc.) to meet these application requirements. However, given WSN domain diversity, varying environmental situations (stimuli), and sensor node complexity, sensor node parameter tuning is a very challenging task. In this paper, we propose an automated Markov Decision Process (MDP)-based methodology to prescribe optimal sensor node operation (selection of values for tunable parameters such as processor voltage, processor frequency, and sensing frequency) to meet application requirements and adapt to changing environmental stimuli. Numerical results confirm the optimality of our proposed methodology and reveal that our methodology more closely meets application requirements compared to other feasible policies.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121966102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 39
Squashing microcode stores to size in embedded systems while delivering rapid microcode accesses 在嵌入式系统中压缩微码存储,同时提供快速的微码访问
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629471
Chengmo Yang, Mingjing Chen, A. Orailoglu
{"title":"Squashing microcode stores to size in embedded systems while delivering rapid microcode accesses","authors":"Chengmo Yang, Mingjing Chen, A. Orailoglu","doi":"10.1145/1629435.1629471","DOIUrl":"https://doi.org/10.1145/1629435.1629471","url":null,"abstract":"Microcoded customized IPs offer superior performance and direct programmability of micro-architectural structures compared to instruction-based processors, yet at the cost of drastically enlarged code sizes. Code compression can deliver size reductions but necessitates attention to performance issues, so that the performance benefits of microcoded IPs are not squandered in the process. To attain this goal, we propose in this paper a fast code compression technique through exploiting the fact that the microcodes contain a sizable amount of unspecified bits. Although the values and the positions of the specified bits are highly irregular, the proposed technique can still flexibly and precisely fill in these fully specified bits through utilizing a linear network. The linear property inherent in the compression strategy in turn enables the development of an extremely low-overhead decompression engine. At runtime, the decompressed code can be generated in such a way that all the specified bits can be filled as required by a fixed-bandwidth XOR network. The combination of the proposed flexible XOR-based network with a minimum two-level storage for highly specified fields, such as immediate values, offers utmost code compression, attained within a negligible amount of performance and hardware overhead.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127807492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Mapping pipelined applications onto heterogeneous embedded systems: a bayesian optimization algorithm based approach 将流水线应用程序映射到异构嵌入式系统:基于贝叶斯优化算法的方法
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629495
Antonino Tumeo, Marco Branca, L. Camerini, C. Pilato, P. Lanzi, Fabrizio Ferrandi, D. Sciuto
{"title":"Mapping pipelined applications onto heterogeneous embedded systems: a bayesian optimization algorithm based approach","authors":"Antonino Tumeo, Marco Branca, L. Camerini, C. Pilato, P. Lanzi, Fabrizio Ferrandi, D. Sciuto","doi":"10.1145/1629435.1629495","DOIUrl":"https://doi.org/10.1145/1629435.1629495","url":null,"abstract":"In this paper we propose a flow based on the Bayesian Optimization Algorithm (BOA) for mapping pipelined applications on a heterogeneous multiprocessor platform on Field Programmable Gate Array (FPGA) with customizable processors. BOA is a Probabilistic Model Building Genetic Algorithm (PMBGA) that, substituting the classical mutation and crossover operators with the construction and the sampling of a Bayesian network, is able to identify correlated sub-structures within the problem to be maintained while generating new solutions.\u0000 The paper introduces the model adopted for pipelined applications and then shows why BOA fits the problem better than other search algorithms, like Genetic Algorithm (GA), Simulated Annealing (SA) and Tabu Search (TS). We also show that our algorithm is able to cope with data parallel pipelined algorithms. We finally validate our flow on realistic applications like JPEG and ADPCM coding by executing the resulting mapping on our platform.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"601 1-3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132453592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Statistical physics approaches for network-on-chip traffic characterization 片上网络流量表征的统计物理方法
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629498
P. Bogdan, R. Marculescu
{"title":"Statistical physics approaches for network-on-chip traffic characterization","authors":"P. Bogdan, R. Marculescu","doi":"10.1145/1629435.1629498","DOIUrl":"https://doi.org/10.1145/1629435.1629498","url":null,"abstract":"In order to face the growing complexity of embedded applications, we aim to build highly efficient Network-on-Chip (NoC) architectures which can connect in a scalable manner various computational modules of the platform. For such networked platforms, it is increasingly important to accurately model the traffic characteristics as this is intimately related to our ability to determine the optimal buffer size at various routers in the network and thus provide analytical metrics for various power-performance trade-offs. In this paper, we show that the main limitations of queueing theory and Markov chain approaches to solving the buffer sizing problem can be overcome by adopting a statistical physics approach to probability density characterization which incorporates the power law distribution, correlations, and scaling properties exhibited within an NoC architecture due to various network transactions. As experimental results show, this new approach represents a breakthrough in accurate traffic modeling under non-equilibrium conditions. As such, our results can be directly used to solve the buffer sizing problem for multiprocessor systems where communication happens via the NoC approach.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132812655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 62
Using continuous statistical machine learning to enable high-speed performance prediction in hybrid instruction-/cycle-accurate instruction set simulators 使用连续统计机器学习在混合指令/周期精确指令集模拟器中实现高速性能预测
International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629478
D. Powell, Björn Franke
{"title":"Using continuous statistical machine learning to enable high-speed performance prediction in hybrid instruction-/cycle-accurate instruction set simulators","authors":"D. Powell, Björn Franke","doi":"10.1145/1629435.1629478","DOIUrl":"https://doi.org/10.1145/1629435.1629478","url":null,"abstract":"Functional instruction set simulators perform instruction-accurate simulation of benchmarks at high instruction rates. Unlike their slower, but cycle-accurate counterparts however, they are not capable of providing cycle counts due to the higher level of hardware abstraction. In this paper we present a novel approach to performance prediction based on statistical machine learning utilizing a hybrid instruction- and cycle-accurate simulator. We introduce the concept of continuous machine learning to simulation whereby new training data points are acquired on demand and used for on-the-fly updates of the performance model. Furthermore, we show how statistical regression can be adapted to reduce the cost of these updates during a performance-critical simulation. For a state-of-the-art simulator modeling the ARC 750D embedded processor we demonstrate that our approach is highly accurate, with average error <2.5% while achieving a speed-up of approx. 50% over the baseline cycle-accurate simulation.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121791219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信