2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS): Latest Publications

parSC: Synchronous parallel SystemC simulation on multi-core host architectures
Christoph Schumacher, R. Leupers, D. Petras, A. Hoffmann
DOI: 10.1145/1878961.1879005 (https://doi.org/10.1145/1878961.1879005)
Published: 2010-10-24
Abstract: Time-consuming cycle-accurate MPSoC simulation is often needed for debugging and verification, but its practicality is put at risk by growing MPSoC complexity. This work presents a conservative synchronous parallel simulation approach, along with a SystemC framework, to accelerate tightly coupled MPSoC simulations on multi-core hosts. The key contribution is the implementation strategy, which uses techniques from the high-performance computing domain. Results show speed-ups of up to 4.4 on four host cores.
Citations: 70
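The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea behind synchronous (lock-step) parallel simulation, assuming SystemC-style evaluate/update semantics: within each cycle all processes run concurrently against the current signal values, and a barrier separates evaluation from the update of signal values. `Signal`, `simulate`, and the counter/shift-register model are hypothetical, each signal is assumed to be written by exactly one process, and because of Python's global interpreter lock the sketch shows the synchronization structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


class Signal:
    """A signal with SystemC-like evaluate/update semantics: writes become visible next cycle."""

    def __init__(self, value=0):
        self.cur = value
        self.nxt = value

    def read(self):
        return self.cur

    def write(self, value):
        self.nxt = value

    def update(self):
        self.cur = self.nxt


def simulate(processes, signals, cycles, workers=4):
    """Lock-step parallel simulation: every process of a cycle runs concurrently,
    then all signals are updated once every worker has finished (the barrier)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(cycles):
            # Evaluate phase: processes only read `cur` and write `nxt`, so the
            # order in which the workers run them cannot change the result.
            list(pool.map(lambda proc: proc(), processes))
            # pool.map has returned, so all evaluations are done: update phase.
            for sig in signals:
                sig.update()


# Hypothetical model: a counter feeding a three-stage shift register.
counter, s1, s2, s3 = Signal(), Signal(), Signal(), Signal()
processes = [
    lambda: counter.write(counter.read() + 1),
    lambda: s1.write(counter.read()),
    lambda: s2.write(s1.read()),
    lambda: s3.write(s2.read()),
]
simulate(processes, [counter, s1, s2, s3], cycles=5)
print([sig.read() for sig in (counter, s1, s2, s3)])  # [5, 4, 3, 2]
```

Because writes are buffered until the update phase, the result is deterministic regardless of how many workers evaluate the processes.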
Demand-based block-level address mapping in large-scale NAND flash storage systems
Zhiwei Qin, Yi Wang, Duo Liu, Z. Shao
DOI: 10.1145/1878961.1878991 (https://doi.org/10.1145/1878961.1878991)
Published: 2010-10-24
Abstract: The increasing capacity of NAND flash memory leads to a large RAM footprint for address mapping in Flash Translation Layer (FTL) design. This paper proposes a novel Demand-based block-level Address mapping scheme with a two-level Caching mechanism (DAC) for large-scale NAND flash storage systems. The objective is to reduce the RAM footprint without sacrificing too much system response time. In our technique, the block-level address mapping table is stored in fixed pages (called translation pages) in the flash memory. Exploiting the temporal locality that workloads exhibit, we maintain one cache in RAM to store on-demand block-level address mapping information. Meanwhile, by exploiting both the spatial locality and the access frequency of workloads with two further caches, the second-level cache is designed to cache selected translation pages in RAM. In this way, address mapping information for both sequential accesses and the most frequently accessed translation pages can be found in the cache, which improves system response time. We conducted experiments on a mixture of real-world and synthetic traces. The results show that our technique significantly reduces the RAM footprint while keeping the average response time well under control. Moreover, our technique substantially improves wear-leveling compared with previous work.
Citations: 52
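A simplified sketch of the two-level lookup described above, assuming a fixed number of mapping entries per translation page (`ENTRIES_PER_PAGE`, hypothetical) and plain LRU replacement in both levels. The paper's second level combines spatial locality and access frequency using two caches; here a single LRU page cache stands in for it, so the class name and policy are illustrative only. The `flash_reads` counter shows how caching a whole translation page serves a run of sequential lookups with one flash read. `l2_pages` must be at least 1.

```python
from collections import OrderedDict

ENTRIES_PER_PAGE = 4  # hypothetical: mapping entries stored per translation page


class TwoLevelMappingCache:
    """Simplified demand-based block-level mapping with two LRU caches in RAM."""

    def __init__(self, flash_table, l1_entries, l2_pages):
        self.flash_table = flash_table  # full LBN -> PBN table, conceptually on flash
        self.l1 = OrderedDict()  # level 1: individual LBN -> PBN entries
        self.l2 = OrderedDict()  # level 2: whole translation pages (page id -> entries)
        self.l1_entries = l1_entries
        self.l2_pages = l2_pages
        self.flash_reads = 0  # translation-page reads charged to flash

    def _read_translation_page(self, page_id):
        self.flash_reads += 1
        first = page_id * ENTRIES_PER_PAGE
        return {lbn: self.flash_table[lbn]
                for lbn in range(first, first + ENTRIES_PER_PAGE)
                if lbn in self.flash_table}

    def lookup(self, lbn):
        if lbn in self.l1:                      # level-1 hit
            self.l1.move_to_end(lbn)
            return self.l1[lbn]
        page_id = lbn // ENTRIES_PER_PAGE
        if page_id in self.l2:                  # level-2 hit: page already cached
            self.l2.move_to_end(page_id)
        else:                                   # miss: fetch the page from flash
            self.l2[page_id] = self._read_translation_page(page_id)
            if len(self.l2) > self.l2_pages:
                self.l2.popitem(last=False)     # evict least recently used page
        pbn = self.l2[page_id][lbn]
        self.l1[lbn] = pbn
        if len(self.l1) > self.l1_entries:
            self.l1.popitem(last=False)         # evict least recently used entry
        return pbn


table = {lbn: 100 + lbn for lbn in range(16)}
cache = TwoLevelMappingCache(table, l1_entries=2, l2_pages=1)
print([cache.lookup(lbn) for lbn in range(4)], cache.flash_reads)  # [100, 101, 102, 103] 1
```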
Automatic Memory Partitioning: Increasing memory parallelism via data structure partitioning
Y. Ben-Asher, Nadav Rotem
DOI: 10.1145/1878961.1878989 (https://doi.org/10.1145/1878961.1878989)
Published: 2010-10-24
Abstract: In high-level synthesis, pipelined designs are often restricted by the number of memory banks available to the synthesis system. Using multiple memory banks can improve the performance of accelerated applications. Currently, programmers must manually assign data structures to specific memory banks on the accelerator. This paper presents Automatic Memory Partitioning, a method for automatically partitioning data structures into multiple memory banks for increased parallelism and performance. We use source code instrumentation to collect memory traces in order to detect linear memory access patterns. The memory traces are used to split data structures into disjoint memory regions and to determine which segments may benefit from parallel memory access. Experiments show significant performance improvements while using a minimal number of memory banks.
Citations: 29
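The trace analysis can be illustrated with a minimal sketch, assuming the instrumented program emits `(access_site, address)` pairs. It flags access sites whose addresses form an arithmetic progression (a linear pattern) and merges the address ranges they touch into disjoint regions, each a candidate for its own memory bank. The function names and the three-sample threshold are hypothetical, and element sizes are ignored for brevity; this is not the authors' partitioning algorithm.

```python
from collections import defaultdict


def linear_streams(trace):
    """Group a trace of (access_site, address) pairs by site and keep the sites whose
    addresses form an arithmetic progression: {site: (first_address, stride, count)}."""
    by_site = defaultdict(list)
    for site, addr in trace:
        by_site[site].append(addr)
    streams = {}
    for site, addrs in by_site.items():
        if len(addrs) < 3:
            continue  # too few samples to call the pattern linear
        stride = addrs[1] - addrs[0]
        if all(b - a == stride for a, b in zip(addrs, addrs[1:])):
            streams[site] = (addrs[0], stride, len(addrs))
    return streams


def disjoint_regions(streams):
    """Merge the [lowest, highest] address ranges covered by linear streams; every
    resulting disjoint region is a candidate for its own memory bank."""
    spans = sorted((min(base, base + stride * (n - 1)), max(base, base + stride * (n - 1)))
                   for base, stride, n in streams.values())
    merged = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)  # overlapping ranges share a bank
        else:
            merged.append([lo, hi])
    return [tuple(region) for region in merged]


# Hypothetical trace: sites "a" and "b" are linear, site "c" is irregular.
trace = [("a", 0), ("b", 100), ("a", 4), ("b", 104), ("a", 8), ("b", 108),
         ("a", 12), ("c", 50), ("c", 20), ("c", 70)]
streams = linear_streams(trace)
print(streams)                    # {'a': (0, 4, 4), 'b': (100, 4, 3)}
print(disjoint_regions(streams))  # [(0, 12), (100, 108)]
```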
A task remapping technique for reliable multi-core embedded systems
Chanhee Lee, Hokeun Kim, Hae-woo Park, Sungchan Kim, Hyunok Oh, S. Ha
DOI: 10.1145/1878961.1879014 (https://doi.org/10.1145/1878961.1879014)
Published: 2010-10-24
Abstract: With the continuous scaling of semiconductor technology, circuit lifetime is decreasing, so processor failure becomes an important issue in MPSoC design. A software solution for tolerating run-time processor failure is to migrate tasks from the failed processors to the live processors when a failure occurs. Previous work on run-time task migration usually aims to minimize the migration overhead, with or without a given latency constraint. For streaming applications, however, minimizing throughput degradation is more important than minimizing the migration overhead or the latency. Hence, we propose a task remapping technique that minimizes throughput degradation, assuming that the migration overhead can be safely amortized. The target multi-core system assumed in this paper consists of processor pools, each made up of homogeneous processors. The proposed technique is based on an intensive compile-time analysis of all possible failure scenarios. It involves the following steps: 1) determine the static mapping of tasks onto the live processors, aiming to minimize throughput degradation; 2) find an optimal processor-to-processor mapping that minimizes the task migration overhead; and 3) store the resulting task remapping information, which includes the task mapping and processor-to-processor mapping results. Since the task remapping information is precomputed at compile time for all possible failure scenarios, it must be represented and stored efficiently. At run time, we simply remap the tasks following the compile-time decision. We examine the scalability of the proposed technique in terms of both the space and the time overhead of the compile-time analysis as the number of failed processors varies. Through extensive experiments, we show that the proposed technique outperforms previous work with respect to application throughput.
Citations: 84
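The three compile-time steps can be sketched for a single pool of homogeneous processors (the paper handles several pools). The sketch substitutes simple techniques for the paper's: a greedy longest-task-first heuristic balances load for step 1 (the most heavily loaded processor bounds throughput), brute force over group-to-processor assignments keeps the most tasks in place for step 2, and a dictionary keyed by the set of failed processors serves as the stored table for step 3. All names and loads are hypothetical, and `max_failures` must be smaller than the number of processors.

```python
from itertools import combinations, permutations


def balance_tasks(loads, n_groups):
    """Step 1 (simplified): split tasks into n_groups with a greedy longest-task-first
    heuristic so the heaviest group, which bounds throughput, stays as light as possible."""
    groups = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for task in sorted(loads, key=lambda t: -loads[t]):
        i = totals.index(min(totals))
        groups[i].append(task)
        totals[i] += loads[task]
    return groups


def remap(loads, current, processors, failed):
    live = [p for p in processors if p not in failed]
    groups = balance_tasks(loads, len(live))

    # Step 2 (simplified): try every group-to-processor assignment and keep the one
    # that leaves the most tasks where they already are (fewest migrations).
    def kept(assignment):
        return sum(current[t] == p for group, p in zip(groups, assignment) for t in group)

    best = max(permutations(live), key=kept)
    return {t: p for group, p in zip(groups, best) for t in group}


def build_remap_table(loads, current, processors, max_failures):
    """Step 3: precompute a remapping for every failure scenario at 'compile time'."""
    table = {}
    for k in range(1, max_failures + 1):
        for failed in combinations(processors, k):
            table[frozenset(failed)] = remap(loads, current, processors, set(failed))
    return table


loads = {"A": 4, "B": 3, "C": 2, "D": 1}
current = {"A": "P0", "B": "P1", "C": "P2", "D": "P2"}
table = build_remap_table(loads, current, ["P0", "P1", "P2"], max_failures=1)
# At run time, a failure of P2 becomes a single table lookup:
print(table[frozenset({"P2"})])  # {'A': 'P0', 'D': 'P0', 'B': 'P1', 'C': 'P1'}
```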
A holistic approach to Network-on-Chip synthesis
G. Leary, Karam S. Chatha
DOI: 10.1145/1878961.1879001 (https://doi.org/10.1145/1878961.1879001)
Published: 2010-10-24
Abstract: Application-specific Network-on-Chip (NoC) architectures have emerged as a leading technology for addressing the communication problems of multiprocessor System-on-Chip architectures. Synthesis approaches for custom NoCs must address several requirements, including cumulative bandwidth and transaction-level (TL) communication requirements, multiple application use cases, deadlock avoidance, and router port bandwidth and arity constraints. In this paper we present a holistic algorithm for NoC synthesis that addresses all of these requirements together in an integrated manner. The approach generates designs with minimum dynamic power consumption and at most twice the number of routers (and leakage power) of an optimal solution. In terms of performance, the technique generates NoC designs with very low average communication latencies (verified by actual simulations) and equally low standard deviation (jitter) while using simple best-effort routers. We evaluated the effectiveness and quality of the proposed technique by comparison with two existing approaches. Extensive experimental results are presented for synthetic and realistic multiple-use-case applications, cumulative and transaction traffic requirements, increasing application bandwidth requirements, and different port arity constraints.
Citations: 10
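Deadlock avoidance, one of the requirements listed above, is commonly checked with a channel dependency graph (Dally and Seitz): each link is a node, and a packet that holds one link while requesting the next adds an edge. For deterministic routing, an acyclic graph rules out routing deadlock. The sketch below is a generic check of that condition, not the paper's synthesis algorithm; the function name and the routes, given as lists of router names, are hypothetical.

```python
from collections import defaultdict


def routes_may_deadlock(routes):
    """Build a channel dependency graph from routes (lists of router names) and
    report whether it contains a cycle, i.e. whether deadlock is possible."""
    deps = defaultdict(set)
    for path in routes:
        links = list(zip(path, path[1:]))        # channels the packet occupies, in order
        for held, wanted in zip(links, links[1:]):
            deps[held].add(wanted)                 # holding `held` while requesting `wanted`
    state = {}                                     # absent = unvisited, 1 = on stack, 2 = done

    def on_cycle(channel):
        state[channel] = 1
        for nxt in deps.get(channel, ()):
            if state.get(nxt) == 1 or (nxt not in state and on_cycle(nxt)):
                return True
        state[channel] = 2
        return False

    return any(ch not in state and on_cycle(ch) for ch in list(deps))


# Two-hop clockwise routes around a four-router ring close a dependency cycle.
ring = [["R0", "R1", "R2"], ["R1", "R2", "R3"], ["R2", "R3", "R0"], ["R3", "R0", "R1"]]
print(routes_may_deadlock(ring))                                # True
print(routes_may_deadlock([["A", "B", "C"], ["B", "C", "D"]]))  # False
```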
Optimal synthesis of latency and throughput constrained pipelined MPSoCs targeting streaming applications
Haris Javaid, Xin He, A. Ignjatović, S. Parameswaran
DOI: 10.1145/1878961.1878978 (https://doi.org/10.1145/1878961.1878978)
Published: 2010-10-24
Abstract: A streaming application, characterized by a kernel that can be broken down into independent tasks executed in a pipelined fashion, can inherently be implemented on a pipeline of Application Specific Instruction set Processors (ASIPs), called a pipelined MPSoC. The latency and throughput requirements of streaming applications constrain the design of such a pipelined MPSoC, in which each ASIP has a number of available configurations that differ in additional instructions and in instruction and data cache sizes. The design space of a pipelined MPSoC is therefore all possible combinations of ASIP configurations (design points). In this paper, a methodology is proposed to optimize the area of a pipelined MPSoC under a latency or a throughput constraint. The final design point is a set of ASIP configurations, with one configuration for each ASIP. We propose an Integer Linear Programming (ILP) based solution to the area optimization problem under a latency constraint, and an algorithm for optimizing pipelined MPSoC area under a throughput constraint. The proposed solutions were evaluated using four streaming applications: a JPEG encoder, a JPEG decoder, an MP3 encoder, and an H.264 decoder. The time to find the Pareto front of each pipelined MPSoC was less than 4 minutes for design spaces with up to 10^16 design points, illustrating the applicability of our approach.
Citations: 24
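The two optimization problems can be illustrated with a small sketch under stated assumptions: each ASIP offers a list of `(area, latency)` configurations with integer latencies, pipeline latency is the sum of the per-stage latencies, and throughput is limited by the slowest stage. Dynamic programming over total latency replaces the paper's ILP for the latency-constrained case, and under the slowest-stage assumption the throughput-constrained case decomposes into an independent choice per ASIP. The function names and configuration numbers are made up.

```python
def min_area_under_latency(stages, max_latency):
    """Pick one (area, latency) configuration per ASIP so that the summed latency
    stays within max_latency and total area is minimal. Dynamic programming over
    integer latency, used here as a stand-in for an ILP formulation."""
    frontier = {0: (0, [])}  # total latency so far -> (min area, chosen config indices)
    for configs in stages:
        nxt = {}
        for lat, (area, picks) in frontier.items():
            for idx, (a, l) in enumerate(configs):
                total = lat + l
                if total > max_latency:
                    continue
                if total not in nxt or area + a < nxt[total][0]:
                    nxt[total] = (area + a, picks + [idx])
        if not nxt:
            return None  # no combination meets the latency constraint
        frontier = nxt
    return min(frontier.values(), key=lambda entry: entry[0])


def min_area_under_throughput(stages, max_stage_latency):
    """If throughput is set by the slowest stage, each ASIP can be optimized on its own:
    take its smallest configuration whose latency fits the per-stage time budget."""
    total, picks = 0, []
    for configs in stages:
        feasible = [(a, idx) for idx, (a, l) in enumerate(configs) if l <= max_stage_latency]
        if not feasible:
            return None
        area, idx = min(feasible)
        total += area
        picks.append(idx)
    return total, picks


# Two ASIPs, each with a small/slow and a large/fast configuration (area, latency).
stages = [[(10, 5), (20, 3)], [(15, 6), (30, 2)]]
print(min_area_under_latency(stages, 9))     # (35, [1, 0])
print(min_area_under_latency(stages, 4))     # None
print(min_area_under_throughput(stages, 5))  # (40, [0, 1])
```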
Scheduling garbage collection in real-time systems
Martin Kero, Simon Aittamaa
DOI: 10.1145/1878961.1878971 (https://doi.org/10.1145/1878961.1878971)
Published: 2010-10-24
Abstract: The key to deploying garbage collection successfully in real-time systems is to enable provably safe schedulability tests of the real-time tasks. At the same time, one must be able to determine the total heap usage of the system. Schedulability tests typically require a uniform model of timing assumptions (inter-arrival times, deadlines, etc.). Incorporating the cost of garbage collection in such tests typically requires both artificial timing assumptions for the garbage collector and restricted capabilities of the task scheduler. In this paper, we pursue a different approach. We show how the reactive object model of the programming language Timber enables us to decouple the cost of a concurrently running copying garbage collector from the schedulability of the real-time tasks; that is, any regular schedulability analysis can be applied without incorporating the cost of an interfering garbage collector. We present the garbage collection demand analysis, which determines whether the garbage collector can be feasibly scheduled in the system.
Citations: 4
Statistical approach in a system level methodology to deal with process variation
Concepción Sanz, M. Prieto, J. I. Gómez, C. Tenllado, F. Catthoor
DOI: 10.1145/1878961.1878983 (https://doi.org/10.1145/1878961.1878983)
Published: 2010-10-24
Abstract: The impact of process variation in state-of-the-art technology makes traditional (worst-case) designs unnecessarily pessimistic, which leads to suboptimal designs in terms of both energy consumption and performance. In this context, developing variation-aware design methodologies becomes a must. These techniques should provide a better performance-energy balance while keeping the percentage of faulty products under control. Furthermore, the system should preferably adapt during its lifetime in order to remain robust against ageing. In this paper we propose a design approach that tackles process variation in the memory system by using multimode memories. At design time we perform a heuristic exploration using probabilistic models of these memories, which generates a set of system configurations that minimize energy consumption for a given set of timing constraints. The percentage of systems that will satisfy these deadlines, even under process variation, is treated as a design parameter. Additionally, if system monitors are available, a setup stage optimizes the initial set of configurations for the actual memory parameters. Our simulations show that this methodology provides significant energy savings while still meeting timing constraints.
Citations: 0
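A minimal sketch of yield-constrained exploration, assuming each memory mode is described by a hypothetical `(energy_per_access, mean_latency, latency_sigma)` tuple, that each memory's latency is a single normally distributed variable per chip, independent across memories, and that execution time is the access count times latency summed over memories. `timing_yield` computes the fraction of chips meeting the deadline with `statistics.NormalDist`, and exhaustive search replaces the paper's heuristic, so the sketch only illustrates how the yield target enters the selection.

```python
from itertools import product
from statistics import NormalDist

# Each memory mode is a hypothetical (energy_per_access, mean_latency, latency_sigma) tuple.


def timing_yield(config, accesses, deadline):
    """Probability that execution time meets the deadline, assuming each memory's
    access latency is one normally distributed random variable per chip,
    independent across memories."""
    mean = sum(n * mode[1] for n, mode in zip(accesses, config))
    var = sum((n * mode[2]) ** 2 for n, mode in zip(accesses, config))
    return NormalDist(mean, var ** 0.5).cdf(deadline)


def cheapest_config(memories, accesses, deadline, min_yield):
    """Exhaustive search for the lowest-energy mode assignment whose timing
    yield reaches min_yield; returns (energy, config) or None."""
    best = None
    for config in product(*memories):
        energy = sum(n * mode[0] for n, mode in zip(accesses, config))
        if timing_yield(config, accesses, deadline) >= min_yield and (best is None or energy < best[0]):
            best = (energy, config)
    return best


slow, fast = (1.0, 2.0, 0.2), (2.0, 1.0, 0.1)  # low-energy/slow vs. high-energy/fast mode
memories = [[slow, fast], [slow, fast]]
print(cheapest_config(memories, [100, 100], deadline=350, min_yield=0.95)[0])  # 300.0
print(cheapest_config(memories, [100, 100], deadline=250, min_yield=0.95)[0])  # 400.0
```

Tightening the deadline forces a more energy-hungry configuration, because the cheaper mixes no longer reach the 95% yield target.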
Rank based dynamic voltage and frequency scaling for tiled graphics processors
B. Silpa, G. Krishnaiah, P. Panda
DOI: 10.1145/1878961.1878965 (https://doi.org/10.1145/1878961.1878965)
Published: 2010-10-24
Abstract: With increasing interest in sophisticated graphics capabilities in mobile systems, the energy consumption of graphics hardware is becoming a major design concern in addition to the traditional performance criteria. Our study of various modern games substantiates the observation that the workload of games varies significantly over time and can therefore benefit from dynamic voltage and frequency scaling (DVFS). Since the visual quality of graphics applications depends heavily on the rate at which frames are processed, it is important to devise a DVFS scheme that minimizes deadline misses caused by inaccurate workload prediction. In this paper, we demonstrate that tiled-graphics renderers have substantial advantages over immediate-mode renderers in accessing frame parameters that help improve workload estimation accuracy. We also show that operating at the finer granularity of "tiles" rather than "frames" allows early detection and corrective action in case of a misprediction. We propose an accurate workload estimation technique and two DVFS schemes for tiled-rendering architectures: (i) tile-history based DVFS and (ii) tile-rank based DVFS. The proposed schemes are shown to be more efficient in terms of power and performance than the frame-level DVFS schemes proposed in recent literature. In a system with 8 DVFS levels, our tile-history based DVFS scheme achieves a 60% improvement in quality (deadline misses) over frame-history based DVFS schemes and a 58% energy saving. The more sophisticated tile-rank based scheme achieves a 75% improvement in quality over the frame-history based DVFS scheme and a 58% energy saving. We have also compared the proposed tile-level DVFS schemes with frame-level schemes for different numbers of DVFS levels and found that, while the frame-level schemes suffer increasing deadline misses as the number of frequency levels increases, the impact on the tile-level schemes is negligible. The Energy per Frame-rate of our scheme is the lowest, indicating that it delivers the best performance-energy results.
Citations: 11
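The tile-history idea can be sketched as follows, assuming a hypothetical set of frequency levels, per-tile workloads in cycles, and a per-frame deadline in matching time units. The predicted workload of each tile is taken from the same tile in the previous frame, and before every tile the scheme picks the lowest level that finishes the remaining predicted work in the time left; with `per_tile=False` the frequency is chosen once per frame for comparison. The function names are hypothetical, and the tile-rank scheme and the paper's workload estimator are not modeled.

```python
def pick_frequency(levels, cycles, time_left):
    """Lowest DVFS level that finishes `cycles` of predicted work within time_left;
    fall back to the highest level when no level is fast enough."""
    for f in sorted(levels):
        if cycles / f <= time_left:
            return f
    return max(levels)


def render_frame(actual, predicted, levels, deadline, per_tile=True):
    """Render one frame tile by tile. predicted[i] is the workload of tile i in the
    previous frame (tile history). Returns (frame time, frequency used per tile)."""
    t = 0.0
    f = pick_frequency(levels, sum(predicted), deadline)  # frame-level decision
    used = []
    for i, cycles in enumerate(actual):
        if per_tile:
            # Re-plan before every tile using the time actually consumed so far,
            # so a mispredicted tile is corrected within the same frame.
            f = pick_frequency(levels, sum(predicted[i:]), deadline - t)
        used.append(f)
        t += cycles / f
    return t, used


levels = [100, 200, 400]         # hypothetical cycles per time unit
predicted = [100, 100, 100, 100]  # previous frame's per-tile workload
actual = [100, 500, 100, 100]     # the second tile turns out five times heavier
print(render_frame(actual, predicted, levels, 3.0))                  # (3.5, [200, 200, 400, 400])
print(render_frame(actual, predicted, levels, 3.0, per_tile=False))  # (4.0, [200, 200, 200, 200])
```

In this made-up example both variants miss the deadline of 3.0, but the per-tile version detects the overrun after the second tile and raises the frequency, finishing at 3.5 instead of 4.0.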
Accurate online power estimation and automatic battery behavior based power model generation for smartphones
Lide Zhang, B. Tiwana, Zhiyun Qian, Zhaoguang Wang, R. Dick, Z. Morley Mao, Lei Yang
DOI: 10.1145/1878961.1878982 (https://doi.org/10.1145/1878961.1878982)
Published: 2010-10-24
Abstract: This paper describes PowerBooter, an automated power model construction technique that uses built-in battery voltage sensors and knowledge of battery discharge behavior to monitor power consumption while explicitly controlling the power management and activity states of individual components. It requires no external measurement equipment. We also describe PowerTutor, a tool based on component power management and activity state introspection that uses the model generated by PowerBooter for online power estimation. PowerBooter is intended to make it quick and easy for application developers and end users to generate power models for new smartphone variants, each of which has different power consumption properties and therefore requires a different power model. PowerTutor is intended to ease the design and selection of power-efficient software for embedded systems. Together, PowerBooter and PowerTutor aim to open up power modeling and analysis to more smartphone variants and their users.
Citations: 1239
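The abstract does not state the model form, so this sketch assumes a linear model in which power is a base term plus a coefficient times each component's state variable, fitted by ordinary least squares. It also assumes the power samples have already been derived from battery-voltage readings, which is the part PowerBooter automates. `fit_power_model`, `estimate_power`, and the CPU-utilization/brightness samples are hypothetical; `estimate_power` plays the role of an online estimator that applies the fitted coefficients to the current component states. The solver is written in plain Python to stay dependency-free.

```python
def fit_power_model(states, power):
    """Ordinary least squares for power ~ b0 + b1*x1 + ... + bk*xk, solved through the
    normal equations (X^T X) b = X^T y with Gaussian elimination."""
    X = [[1.0] + list(row) for row in states]  # leading 1.0 gives the base-power term
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(X, power)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for c in range(col, k):
                A[row][c] -= factor * A[col][c]
            b[row] -= factor * b[col]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta


def estimate_power(beta, state):
    """Online estimate from the fitted coefficients and current component states."""
    return beta[0] + sum(coef * x for coef, x in zip(beta[1:], state))


# Hypothetical training samples: (cpu_util_percent, brightness_level) -> measured mW.
samples = [(0, 0), (10, 0), (0, 50), (20, 30), (50, 100)]
measured_mw = [100, 140, 200, 240, 500]
beta = fit_power_model(samples, measured_mw)
print([round(c, 6) for c in beta])               # [100.0, 4.0, 2.0]
print(round(estimate_power(beta, (25, 40)), 6))  # 280.0
```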