2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)最新文献_第4页

parSC: Synchronous parallel SystemC simulation on multi-core host architectures parSC:多核主机架构上的同步并行SystemC仿真

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1879005

Christoph Schumacher, R. Leupers, D. Petras, A. Hoffmann

引用次数: 70

Demand-based block-level address mapping in large-scale NAND flash storage systems 大规模NAND闪存存储系统中基于需求的块级地址映射

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878991

Zhiwei Qin, Yi Wang, Duo Liu, Z. Shao

{"title":"Demand-based block-level address mapping in large-scale NAND flash storage systems","authors":"Zhiwei Qin, Yi Wang, Duo Liu, Z. Shao","doi":"10.1145/1878961.1878991","DOIUrl":"https://doi.org/10.1145/1878961.1878991","url":null,"abstract":"The increasing capacity of NAND flash memory leads to large RAM footprint on address mapping in the Flash Translation Layer (FTL) design. This paper proposes a novel Demand-based block-level Address mapping scheme with two-level Caching mechanism (DAC) for large-scale NAND flash storage systems. The objective is to reduce RAM footprint without sacrificing too much system response time. In our technique, the block-level address mapping table is stored in fixed pages (called translation pages) in the flash memory. Considering temporal locality that workloads exhibit, we maintain one cache in RAM to store the on-demand block-level address mapping information. Meanwhile, by exploring both spatial locality and access frequency of workloads with another two caches, the second-level cache is designed to cache selected translation pages into RAM. In such a way, address mapping information for both sequential accesses and most-frequently-accessed translation pages can be found in the cache, and therefore, the system response time can be improved. We conduct experiments on a mixture of real-world and synthetic traces. The experimental results show that our technique can significantly reduce the RAM footprint while the average response time is kept well under control. Moreover, our technique shows big improvement on wear-leveling compared with the previous work.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124039578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 52

Automatic Memory Partitioning: Increasing memory parallelism via data structure partitioning 自动内存分区:通过数据结构分区增加内存并行性

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878989

Y. Ben-Asher, Nadav Rotem

引用次数: 29

A task remapping technique for reliable multi-core embedded systems 可靠多核嵌入式系统的任务重映射技术

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1879014

Chanhee Lee, Hokeun Kim, Hae-woo Park, Sungchan Kim, Hyunok Oh, S. Ha

{"title":"A task remapping technique for reliable multi-core embedded systems","authors":"Chanhee Lee, Hokeun Kim, Hae-woo Park, Sungchan Kim, Hyunok Oh, S. Ha","doi":"10.1145/1878961.1879014","DOIUrl":"https://doi.org/10.1145/1878961.1879014","url":null,"abstract":"With the continuous scaling of semiconductor technology, the life-time of circuit is decreasing so that processor failure becomes an important issue in MPSoC design. A software solution to tolerate run-time processor failure is to migrate tasks from the failed processors to the live processors when failure occurs. Previous works on run-time task migration usually aim to minimize the migration overhead with or without a given latency constraint. For streaming applications, however, it is more important to minimize the throughput degradation than the migration overhead or the latency. Hence, we propose a task remapping technique to minimize the throughput degradation assuming that the migration overhead can be amortized safely. The target multi-core system assumed in this paper consists of processor pools and each pool consists of homogeneous processors. The proposed technique is based on an intensive compile-time analysis for all possible failure scenarios. It involves the following steps; 1) Determine the static mapping of tasks onto the live processors, aiming to minimize the throughput degradation: 2) Find an optimal processor-to-processor mapping to minimize the task migration overhead: and 3) Store the resultant task remapping information that includes task mapping and processor-to-processor mapping results. Since the task remapping information is pre-computed at compile-time for all possible failure scenarios, it should be efficiently represented and stored. At run-time, we simply remap the tasks following the compile-time decision. We examine the scalability of the proposed technique on both space and run-time overhead for compile-time analysis varying the number of failed processors. Through intensive experiments, we show that the proposed technique outperforms the previous works with respect to application throughput.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114175151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 84

A holistic approach to Network-on-Chip synthesis 片上网络综合的整体方法

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1879001

G. Leary, Karam S. Chatha

{"title":"A holistic approach to Network-on-Chip synthesis","authors":"G. Leary, Karam S. Chatha","doi":"10.1145/1878961.1879001","DOIUrl":"https://doi.org/10.1145/1878961.1879001","url":null,"abstract":"Application specific Network-on-Chip (NoC) architectures have emerged as a leading technology to address the communication woes of multi-processor System-on-Chip architectures. Synthesis approaches for custom NoC must address several requirements including cumulative bandwidth and transaction level (TL) communication requirements, multiple application use-cases, deadlock avoidance, and router port bandwidth and arity constraints. In this paper we present a holistic algorithm for NoC synthesis which is able to address all these requirements together in an integrated manner. The approach is able to generate designs that consume minimum dynamic power consumption, and at most twice the number of routers (and leakage power) as an optimal solution. In terms of performance the technique is able to generate NoC designs with very low average communication latencies (verified by actual simulations) and equally low standard deviation (jitter) while utilizing simple best effort routers. We evaluated the effectiveness and quality of the proposed technique by comparisons with two existing approaches. Extensive experimental results are presented for synthetic/realistic multiple use case applications, cumulative/transaction traffic requirements, increasing application bandwidth requirements, and different port arity constraints.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117284961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

Optimal synthesis of latency and throughput constrained pipelined MPSoCs targeting streaming applications 针对流应用的延迟和吞吐量受限的流水线mpsoc的最佳合成

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878978

Haris Javaid, Xin He, A. Ignjatović, S. Parameswaran

{"title":"Optimal synthesis of latency and throughput constrained pipelined MPSoCs targeting streaming applications","authors":"Haris Javaid, Xin He, A. Ignjatović, S. Parameswaran","doi":"10.1145/1878961.1878978","DOIUrl":"https://doi.org/10.1145/1878961.1878978","url":null,"abstract":"A streaming application, characterized by a kernel that can be broken down into independent tasks which can be executed in a pipelined fashion, inherently allows its implementation on a pipeline of Application Specific Instruction set Processors (ASIPs), called a pipelined MPSoC. The latency and throughput requirements of streaming applications put constraints on the design of such a pipelined MPSoC, where each ASIP has a number of available configurations differing by additional instructions, and instruction and data cache sizes. Thus, the design space of a pipelined MPSoC is all the possible combinations of ASIP configurations (design points). In this paper, a methodology is proposed to optimize the area of a pipelined MPSoC under a latency or a throughput constraint. The final design point is a set of ASIP configurations with one configuration for each ASIP. We proposed an Integer Linear Programming (ILP) based solution to the area optimization problem under a latency constraint, and an algorithm for optimization of pipelined MPSoC area under a throughput constraint. The proposed solutions were evaluated using four streaming applications: JPEG encoder; JPEG decoder; MP3 encoder; and H.264 decoder. The time to find the Pareto front of each pipelined MPSoC was less than 4 minutes where design spaces had up to 1016 design points, illustrating the applicability of our approach.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115259022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 24

Scheduling garbage collection in real-time systems 在实时系统中调度垃圾收集

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878971

Martin Kero, Simon Aittamaa

引用次数: 4

Statistical approach in a system level methodology to deal with process variation 系统级方法中处理过程变化的统计方法

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878983

Concepción Sanz, M. Prieto, J. I. Gómez, C. Tenllado, F. Catthoor

{"title":"Statistical approach in a system level methodology to deal with process variation","authors":"Concepción Sanz, M. Prieto, J. I. Gómez, C. Tenllado, F. Catthoor","doi":"10.1145/1878961.1878983","DOIUrl":"https://doi.org/10.1145/1878961.1878983","url":null,"abstract":"The impact of process variation in state of the art technology makes traditional (worst case) designs unnecessarily pessimistic, which translates to suboptimal designs in terms of both energy consumption and performance. In this context, developing variation aware design methodologies becomes a must. These techniques should provide better performance-energy balances while the percentage of faulty products keeps controlled. Furthermore, it would be advisable to consider adaptations of the system during lifetime, in order to provide robustness against ageing. In this paper we propose a design approach which tackles process variation on the memory system by using multimode memories. At design time we perform a heuristic exploration using probabilistic models of these memories, which generates a set of system configurations that minimize energy consumption for a given set of timing constraints. The percentage of systems that will satisfy these deadlines, even under process variation, is taken as a design parameter. Additionally, if system monitors are available, a setup stage optimizes the initial set of configurations for the actual memory parameters. Our simulations show that this methodology provides significant energy savings while still meeting timing constraints.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130781997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Rank based dynamic voltage and frequency scaling for tiled graphics processors 基于等级的动态电压和频率缩放平铺图形处理器

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878965

B. Silpa, G. Krishnaiah, P. Panda

{"title":"Rank based dynamic voltage and frequency scaling for tiled graphics processors","authors":"B. Silpa, G. Krishnaiah, P. Panda","doi":"10.1145/1878961.1878965","DOIUrl":"https://doi.org/10.1145/1878961.1878965","url":null,"abstract":"With increasing interest in sophisticated graphics capabilities in mobile systems, energy consumption of graphics hardware is becoming a major design concern in addition to the traditional performance enhancement criteria. Our study of various modern games substantiates the observation that the workload of games varies significantly with time and hence can benefit from dynamic voltage and frequency scaling (DVFS). Since visual quality of graphics applications is highly dependent on the rate at which frames are processed, it is important to devise a DVFS scheme that minimizes deadline misses due to inaccuracies in workload prediction. In this paper, we demonstrate that tiled-graphics renderers exhibit substantial advantages over immediate-mode renderers in obtaining access to frame parameters that help in enhancing the workload estimation accuracy. We also show that, operating at a finer granularity of “tiles” as opposed to “frames” allows early detection and corrective action in case of a mis-prediction. We propose an accurate workload estimation technique and two DVFS schemes: (i) tile-history based DVFS and (ii) tile-rank based DVFS for tiled-rendering architectures. The proposed schemes are demonstrated to be more efficient in terms of power and performance than the frame level DVFS schemes proposed in recent literature. With a system with 8 DVFS levels, our tile-history based DVFS scheme results in 60% improvement in quality (deadline misses) over the frame history based DVFS schemes and gives 58% saving in energy. The more sophisticated tile-rank based scheme achieves 75% improvement in quality over the frame history based DVFS scheme and results in 58% saving in energy. We have also compared the efficiency of the proposed tile-level DVFS schemes with frame-level schemes for different number of DVFS levels, and found that while the frame-level schemes suffer from increasing deadline misses as the frequency levels increase, the impact on tile-level schemes is negligible. The Energy per Frame-rate for our scheme is the minimum, indicating that it delivers the best performance-energy results.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128426849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

Accurate online power estimation and automatic battery behavior based power model generation for smartphones 智能手机准确的在线功率估计和基于电池行为的自动功率模型生成

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878982

Lide Zhang, B. Tiwana, Zhiyun Qian, Zhaoguang Wang, R. Dick, Z. Morley Mao, Lei Yang

引用次数: 1239