International Conference on Compilers, Architecture, and Synthesis for Embedded Systems最新文献

Cyber physical systems: Systems engineering of industrial embedded systems - Barriers, enablers and opportunities 网络物理系统:工业嵌入式系统的系统工程。障碍、促成因素和机会

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2013-09-01 DOI: 10.1109/CASES.2013.6662503

C. Jacobson, R. Schooler, M. Laurence

引用次数: 0

The RACECAR heuristic for automatic function specialization on multi-core heterogeneous systems 多核异构系统中自动功能专门化的RACECAR启发式算法

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380423

J. Wernsing, G. Stitt

{"title":"The RACECAR heuristic for automatic function specialization on multi-core heterogeneous systems","authors":"J. Wernsing, G. Stitt","doi":"10.1145/2380403.2380423","DOIUrl":"https://doi.org/10.1145/2380403.2380423","url":null,"abstract":"Embedded systems increasingly combine multi-core processors and heterogeneous resources such as graphics-processing units and field-programmable gate arrays. However, significant application design complexity for such systems caused by parallel programming and device-specific challenges has often led to untapped performance potential. Application developers targeting such systems currently must determine how to parallelize computation, create different device-specialized implementations for each heterogeneous resource, and then determine how to apportion work to each resource. In this paper, we present the RACECAR heuristic to automate the optimization of applications for multi-core heterogeneous systems by automatically exploring implementation alternatives that include different algorithms, parallelization strategies, and work distributions. Experimental results show RACECAR-specialized implementations can effectively incorporate provided implementations and parallelize computation across multiple cores, graphics-processing units, and field-programmable gate arrays, improving performance by an average of 47x compared to a CPU, while the fastest provided implementations are only able to average 33x.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115138732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Function inlining and loop unrolling for loop acceleration in reconfigurable processors 可重构处理器中用于循环加速的函数内联和循环展开

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380426

Narasinga Rao Miniskar, Pankaj Shailendra Gode, Soma Kohli, Donghoon Yoo

{"title":"Function inlining and loop unrolling for loop acceleration in reconfigurable processors","authors":"Narasinga Rao Miniskar, Pankaj Shailendra Gode, Soma Kohli, Donghoon Yoo","doi":"10.1145/2380403.2380426","DOIUrl":"https://doi.org/10.1145/2380403.2380426","url":null,"abstract":"The next generation SoCs for consumer electronics need software solutions for faster time-to-market, lower development cost and higher performance while maintaining lower energy consumption and area. As a result, reconfigurable processors (RPs) have become increasingly important, which enables just enough exibility of accepting software solutions and providing application-specific hardware reconfigurability. Samsung Electronics has developed a reconfigurable processor called Samsung Reconfigurable Processor (SRP), which is the basis of our work. Though, the SRP is a powerful processor, it requires a smart and intelligent compiler to compile the application software while exploring its reconfigurable architecture. The existing compiler for the SRP does not support functional inlining and loop unrolling, and no study has yet been done on these optimizations for the RPs. In this paper, we study the impact of these optimizations on the performance of applications for the SRP processor and we also show how these optimizations are supported in the SRP compiler. We analyze the performance improvement due to these optimizations on various benchmarks namely Sobel Edge filter, JPEG decoder, and Luma Deblocking filter of the H.264 standard. Our experimental results have shown about 83% gain on performance with the functional inlining optimization and the loop unrolling optimization when compared to the original code for Sobel filter and JPEG encoder, and 11% gain on performance for Luma Deblock filter.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124604586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Static task partitioning for locked caches in multi-core real-time systems 多核实时系统中锁定缓存的静态任务分区

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380434

Abhik Sarkar, F. Mueller, H. Ramaprasad

{"title":"Static task partitioning for locked caches in multi-core real-time systems","authors":"Abhik Sarkar, F. Mueller, H. Ramaprasad","doi":"10.1145/2380403.2380434","DOIUrl":"https://doi.org/10.1145/2380403.2380434","url":null,"abstract":"Locking cache lines in hard real-time systems is a common means to ensure timing predictability of data references and to lower bounds on worst-case execution time, especially in a multi-tasking environment. Growing processing demand on multi-tasking real-time systems can be met by employing scalable multi-core architectures, like the recently introduced tile-based architectures. This paper studies the use of cache locking on massive multi-core architectures with private caches in the context of hard real-time systems. In shared cache architectures, a single resource is shared among {em all} the tasks. However, in scalable cache architectures with private caches, conflicts exist only among the tasks scheduled on one core. This calls for a cache-aware allocation of tasks onto cores. Our work extends the cache-unaware First Fit Decreasing (FFD) algorithm with a Naive locked First Fit Decreasing (NFFD) policy. We further propose two cache-aware static scheduling schemes: (1) Greedy First Fit Decreasing (GFFD) and (2) Colored First Fit Decreasing (CoFFD). This work contributes an adaptation of these algorithms for conflict resolution of partially locked regions. Experiments indicate that NFFD is capable of scheduling high utilization task sets that FFD cannot schedule. Experiments also show that CoFFD consistently outperforms GFFD resulting in lower number of cores and lower system utilization. CoFFD reduces the number of core requirements from 30% to 60% compared to NFFD. With partial locking, the number of cores in some cases is reduced by almost 50% with an increase in system utilization of 10%. Overall, this work is unique in considering the challenges of future multi-core architectures for real-time systems and provides key insights into task partitioning with locked caches for architectures with private caches.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"581 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123937703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

A hybrid just-in-time compiler for android: comparing JIT types and the result of cooperation android的混合即时编译器:比较JIT类型和合作的结果

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380418

Guillermo A. Pérez, Chung-Min Kao, Yeh-Ching Chung, W. Hsu

引用次数: 10

Revisiting level-0 caches in embedded processors 重新访问嵌入式处理器中的0级缓存

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380435

Nam Duong, Taesu Kim, Dali Zhao, A. Veidenbaum

引用次数: 15

A cost-effective tag design for memory data authentication in embedded systems 一种低成本的嵌入式系统内存数据认证标签设计

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380414

Mei Hong, Hui Guo, X. Hu

引用次数: 17

Analytical approaches for performance evaluation of networks-on-chip 片上网络性能评估的分析方法

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380442

A. E. Kiasari, A. Jantsch, M. Bekooij, A. Burns, Zhonghai Lu

引用次数: 2

Energy efficient special instruction support in an embedded processor with compact isa 具有紧凑isa的嵌入式处理器中节能的特殊指令支持

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380430

Dongrui She, Yifan He, H. Corporaal

{"title":"Energy efficient special instruction support in an embedded processor with compact isa","authors":"Dongrui She, Yifan He, H. Corporaal","doi":"10.1145/2380403.2380430","DOIUrl":"https://doi.org/10.1145/2380403.2380430","url":null,"abstract":"The use of special instructions that execute complex operation patterns is a common approach in application specific processor design to improve performance and efficiency. However, in an embedded generic processor with compact instruction set architecture (ISA), such instructions may lead to large overhead as: i) more bits are needed to encode the extra opcodes and operands, resulting in wider instructions; ii) more register file (RF) ports are required to provide the extra operands to the function units. Such overhead may increase energy consumption considerably.\u0000 In this paper, we propose to support flexible operation pair patterns in a processor with a compact 24-bit RISC-like ISA using: i) a partially reconfigurable decoder that exploits the locality of patterns to reduce the requirement for opcode space; ii) a software controlled bypass network to reduce the requirement for operand encoding and RF ports. We also propose an energy-aware compiler backend design for the proposed architecture that performs pattern selection and bypass-aware scheduling to generate energy efficient codes. Though proposed design imposes extra constraints on the operation patterns, the experimental results show that the average dynamic instruction count is reduced by over 25%, which is only about 2% less than the architecture without such constraints. Due to the low overhead, the total energy of the proposed architecture reduces by an average of 15.8% compared to the RISC baseline, while the one without constraints achieves almost no energy improvement.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134521359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

Integrating software caches with scratch pad memory 集成软件缓存与刮板存储器

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380440

Prasenjit Chakraborty, P. Panda

{"title":"Integrating software caches with scratch pad memory","authors":"Prasenjit Chakraborty, P. Panda","doi":"10.1145/2380403.2380440","DOIUrl":"https://doi.org/10.1145/2380403.2380440","url":null,"abstract":"Software cache refers to cache functionality emulated in software on a compiler-controlled Scratch Pad Memory (SPM). Such structures are useful when standard SPM allocation strategies cannot be used due to hard-to-analyze memory reference patterns in the source code. SPM data allocation strategies generally rely on compile-time inference of spatial and temporal reuse, with the general flow being the copying of a block/tile of array data into the SPM, followed by its processing, and finally, copying back. However, when array index functions are complicated due to conditionals, complex expressions, and dependence on run-time data, the SPM compiler has to rely on expensive DMA for individual words, leading to poor performance. Software caches (SWC) can play a crucial role in improving performance under such circumstances -- their access times are longer than those for direct SPM access, but they retain the advantages (present in hardware caches) of exploiting spatial and temporal locality discovered at run-time. We present the first automated compiler data allocation strategy that considers the presence of a software cache in SPM space, and makes decisions on which arrays should be accessed through it, at which times. Arrays could be accessed differently in different parts of a program, and our algorithm analyzes such uses and considers the possibility of selectively accessing an array through the SWC only when it is efficient, based on a cost model of the overheads involved in SPM/SWC transitions. We implemented our technique in an LLVM based framework and experimented with several applications on a Cell based machine. Our technique results in up to 82% overall performance improvement over a conventional SPM mapping algorithm and up to 27% over a typical SWC-enhanced implementation.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114554396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7