International Conference on Compilers, Architecture, and Synthesis for Embedded Systems: Latest Publications

Cyber physical systems: Systems engineering of industrial embedded systems - Barriers, enablers and opportunities
C. Jacobson, R. Schooler, M. Laurence
{"title":"Cyber physical systems: Systems engineering of industrial embedded systems - Barriers, enablers and opportunities","authors":"C. Jacobson, R. Schooler, M. Laurence","doi":"10.1109/CASES.2013.6662503","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662503","url":null,"abstract":"Cyber physical systems: systems engineering of industrial embedded systems-barriers, enablers and opportunities; high-performance, scalable, general-purpose processors to accelerate high-throughput networking and security applications; Low-power high-performance asynchronous processors.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129705430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
The RACECAR heuristic for automatic function specialization on multi-core heterogeneous systems
J. Wernsing, G. Stitt
{"title":"The RACECAR heuristic for automatic function specialization on multi-core heterogeneous systems","authors":"J. Wernsing, G. Stitt","doi":"10.1145/2380403.2380423","DOIUrl":"https://doi.org/10.1145/2380403.2380423","url":null,"abstract":"Embedded systems increasingly combine multi-core processors and heterogeneous resources such as graphics-processing units and field-programmable gate arrays. However, significant application design complexity for such systems caused by parallel programming and device-specific challenges has often led to untapped performance potential. Application developers targeting such systems currently must determine how to parallelize computation, create different device-specialized implementations for each heterogeneous resource, and then determine how to apportion work to each resource. In this paper, we present the RACECAR heuristic to automate the optimization of applications for multi-core heterogeneous systems by automatically exploring implementation alternatives that include different algorithms, parallelization strategies, and work distributions. Experimental results show RACECAR-specialized implementations can effectively incorporate provided implementations and parallelize computation across multiple cores, graphics-processing units, and field-programmable gate arrays, improving performance by an average of 47x compared to a CPU, while the fastest provided implementations are only able to average 33x.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115138732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
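A minimal C sketch of the empirical-selection idea behind the RACECAR paper above: time several candidate implementations of a function and keep the fastest. This is an illustration only, not the paper's heuristic (RACECAR also explores algorithms, parallelization strategies, and work distributions); the function names and timing method are hypothetical.

```c
#include <time.h>

typedef void (*impl_fn)(const float *in, float *out, int n);

/* Wall-clock one candidate implementation on a sample input. */
static double time_impl(impl_fn f, const float *in, float *out, int n) {
    clock_t start = clock();
    f(in, out, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Empirically pick the fastest of several device-specialized candidates
 * (e.g., CPU, GPU, and FPGA variants registered by the developer). */
static impl_fn select_fastest(impl_fn *cands, int ncands,
                              const float *in, float *out, int n) {
    impl_fn best = cands[0];
    double best_t = time_impl(best, in, out, n);
    for (int i = 1; i < ncands; i++) {
        double t = time_impl(cands[i], in, out, n);
        if (t < best_t) { best_t = t; best = cands[i]; }
    }
    return best;
}
```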
Function inlining and loop unrolling for loop acceleration in reconfigurable processors
Narasinga Rao Miniskar, Pankaj Shailendra Gode, Soma Kohli, Donghoon Yoo
{"title":"Function inlining and loop unrolling for loop acceleration in reconfigurable processors","authors":"Narasinga Rao Miniskar, Pankaj Shailendra Gode, Soma Kohli, Donghoon Yoo","doi":"10.1145/2380403.2380426","DOIUrl":"https://doi.org/10.1145/2380403.2380426","url":null,"abstract":"The next generation SoCs for consumer electronics need software solutions for faster time-to-market, lower development cost and higher performance while maintaining lower energy consumption and area. As a result, reconfigurable processors (RPs) have become increasingly important, which enables just enough exibility of accepting software solutions and providing application-specific hardware reconfigurability. Samsung Electronics has developed a reconfigurable processor called Samsung Reconfigurable Processor (SRP), which is the basis of our work. Though, the SRP is a powerful processor, it requires a smart and intelligent compiler to compile the application software while exploring its reconfigurable architecture. The existing compiler for the SRP does not support functional inlining and loop unrolling, and no study has yet been done on these optimizations for the RPs. In this paper, we study the impact of these optimizations on the performance of applications for the SRP processor and we also show how these optimizations are supported in the SRP compiler. We analyze the performance improvement due to these optimizations on various benchmarks namely Sobel Edge filter, JPEG decoder, and Luma Deblocking filter of the H.264 standard. Our experimental results have shown about 83% gain on performance with the functional inlining optimization and the loop unrolling optimization when compared to the original code for Sobel filter and JPEG encoder, and 11% gain on performance for Luma Deblock filter.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124604586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
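To make the two transformations concrete, here is a hypothetical C example of a loop after function inlining (the compiler folds clamp255 into the loop body) and 4x unrolling; the kernel and names are invented for illustration, not taken from the SRP compiler.

```c
/* Small helper the compiler can inline into the loop body. */
static inline int clamp255(int x) { return x > 255 ? 255 : (x < 0 ? 0 : x); }

/* After inlining and 4x unrolling, each iteration exposes four
 * independent multiply/clamp chains that a reconfigurable array can
 * schedule in parallel. (n is assumed to be a multiple of 4.) */
void scale_pixels(const int *in, int *out, int n, int k) {
    for (int i = 0; i < n; i += 4) {
        out[i]     = clamp255(in[i]     * k);
        out[i + 1] = clamp255(in[i + 1] * k);
        out[i + 2] = clamp255(in[i + 2] * k);
        out[i + 3] = clamp255(in[i + 3] * k);
    }
}
```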
Static task partitioning for locked caches in multi-core real-time systems
Abhik Sarkar, F. Mueller, H. Ramaprasad
{"title":"Static task partitioning for locked caches in multi-core real-time systems","authors":"Abhik Sarkar, F. Mueller, H. Ramaprasad","doi":"10.1145/2380403.2380434","DOIUrl":"https://doi.org/10.1145/2380403.2380434","url":null,"abstract":"Locking cache lines in hard real-time systems is a common means to ensure timing predictability of data references and to lower bounds on worst-case execution time, especially in a multi-tasking environment. Growing processing demand on multi-tasking real-time systems can be met by employing scalable multi-core architectures, like the recently introduced tile-based architectures. This paper studies the use of cache locking on massive multi-core architectures with private caches in the context of hard real-time systems. In shared cache architectures, a single resource is shared among {em all} the tasks. However, in scalable cache architectures with private caches, conflicts exist only among the tasks scheduled on one core. This calls for a cache-aware allocation of tasks onto cores. Our work extends the cache-unaware First Fit Decreasing (FFD) algorithm with a Naive locked First Fit Decreasing (NFFD) policy. We further propose two cache-aware static scheduling schemes: (1) Greedy First Fit Decreasing (GFFD) and (2) Colored First Fit Decreasing (CoFFD). This work contributes an adaptation of these algorithms for conflict resolution of partially locked regions. Experiments indicate that NFFD is capable of scheduling high utilization task sets that FFD cannot schedule. Experiments also show that CoFFD consistently outperforms GFFD resulting in lower number of cores and lower system utilization. CoFFD reduces the number of core requirements from 30% to 60% compared to NFFD. With partial locking, the number of cores in some cases is reduced by almost 50% with an increase in system utilization of 10%. Overall, this work is unique in considering the challenges of future multi-core architectures for real-time systems and provides key insights into task partitioning with locked caches for architectures with private caches.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"581 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123937703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
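For reference, a minimal C sketch of the baseline First Fit Decreasing scheme the paper extends: sort task utilizations in decreasing order and place each task on the first core with room, opening a new core otherwise. The cache-locking conflict analysis that distinguishes NFFD, GFFD, and CoFFD is deliberately omitted.

```c
#include <stdlib.h>

static int cmp_desc(const void *a, const void *b) {
    double d = *(const double *)b - *(const double *)a;
    return (d > 0) - (d < 0);
}

/* Assign ntasks utilizations to cores of capacity cap using FFD.
 * core_util must have room for ntasks entries; returns cores used. */
int ffd_partition(double *util, int ntasks, double cap, double *core_util) {
    int ncores = 0;
    qsort(util, ntasks, sizeof *util, cmp_desc);
    for (int t = 0; t < ntasks; t++) {
        int placed = 0;
        for (int c = 0; c < ncores && !placed; c++) {
            if (core_util[c] + util[t] <= cap) {
                core_util[c] += util[t];
                placed = 1;
            }
        }
        if (!placed)
            core_util[ncores++] = util[t];  /* open a new core */
    }
    return ncores;
}
```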
A hybrid just-in-time compiler for Android: comparing JIT types and the result of cooperation
Guillermo A. Pérez, Chung-Min Kao, Yeh-Ching Chung, W. Hsu
{"title":"A hybrid just-in-time compiler for android: comparing JIT types and the result of cooperation","authors":"Guillermo A. Pérez, Chung-Min Kao, Yeh-Ching Chung, W. Hsu","doi":"10.1145/2380403.2380418","DOIUrl":"https://doi.org/10.1145/2380403.2380418","url":null,"abstract":"The Dalvik virtual machine is the main application platform running on Google's Android operating system for mobile devices and tablets. It is a Java Virtual Machine running a basic trace-based JIT compiler, unlike web browser JavaScript engines that usually run a combination of both method and trace-based JIT types. We developed a method-based JIT compiler based on the Low Level Virtual Machine framework that delivers performance improvement comparable to that of an Ahead-Of-Time compiler. We compared our method-based JIT against Dalvik's own trace-based JIT using common benchmarks available in the Android Market. Our results show that our method-based JIT is better than a basic trace-based JIT, and that, by sharing profiling and compilation information among each other, a smart combination of both JIT techniques can achieve a great performance gain.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130473879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
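One piece of the profiling information such a hybrid can share is per-method hotness. The toy C sketch below shows an invocation counter that promotes a method to the method-based JIT once it crosses a threshold; the threshold, types, and compile_method hook are hypothetical, not Dalvik internals.

```c
#define HOT_THRESHOLD 100  /* illustrative promotion threshold */

typedef struct {
    const char *name;
    int invocations;
    int compiled;   /* set once native code exists for the whole method */
} method_t;

/* Stub standing in for the method-based JIT back end. */
static void compile_method(method_t *m) { (void)m; /* emit native code */ }

/* Called by the interpreter on every method entry. */
void on_method_entry(method_t *m) {
    if (!m->compiled && ++m->invocations >= HOT_THRESHOLD) {
        compile_method(m);
        m->compiled = 1;
    }
}
```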
Revisiting level-0 caches in embedded processors
Nam Duong, Taesu Kim, Dali Zhao, A. Veidenbaum
{"title":"Revisiting level-0 caches in embedded processors","authors":"Nam Duong, Taesu Kim, Dali Zhao, A. Veidenbaum","doi":"10.1145/2380403.2380435","DOIUrl":"https://doi.org/10.1145/2380403.2380435","url":null,"abstract":"Level-0 (L0) caches have been proposed in the past as an inexpensive way to improve performance and reduce energy consumption in resource-constrained embedded processors. This paper proposes new L0 data cache organizations using the assumption that an L0 hit/miss determination can be completed prior to the L1 access. This is a realistic assumption for very small L0 caches that can nevertheless deliver significant miss rate and/or energy reduction. The key issue for such caches is how and when to move data between the L0 and L1 caches. The first new cache, a flow cache, targets a conflict miss reduction in a direct-mapped L1 cache. It offers a simpler hardware design and uses on average 10% less dynamic energy than the victim cache with nearly identical performance. The second new cache, a hit cache, reduces the dynamic energy consumption in a set-associative L1 cache by 30% without impacting performance. A variant of this policy reduces the dynamic energy consumption by up to 50%, with 5% performance degradation.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129661485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
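A small C sketch of the structural assumption the paper builds on: a tiny direct-mapped L0 whose hit/miss outcome is available before L1 is accessed. The sizes and fill policy here are illustrative; the paper's flow-cache and hit-cache policies govern when a missing line is actually brought into L0.

```c
#include <stdint.h>
#include <string.h>

#define L0_LINES   8    /* tiny: hit/miss resolves before the L1 access */
#define LINE_BYTES 32

typedef struct {
    uint32_t tag[L0_LINES];
    uint8_t  valid[L0_LINES];
    uint8_t  data[L0_LINES][LINE_BYTES];
} l0_cache_t;

/* Returns 1 and copies the line on an L0 hit; on a miss the caller
 * proceeds to L1 and the active policy decides whether to fill L0. */
int l0_lookup(l0_cache_t *c, uint32_t addr, uint8_t *line_out) {
    uint32_t idx = (addr / LINE_BYTES) % L0_LINES;
    uint32_t tag = addr / (LINE_BYTES * L0_LINES);
    if (c->valid[idx] && c->tag[idx] == tag) {
        memcpy(line_out, c->data[idx], LINE_BYTES);
        return 1;
    }
    return 0;
}
```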
A cost-effective tag design for memory data authentication in embedded systems
Mei Hong, Hui Guo, X. Hu
{"title":"A cost-effective tag design for memory data authentication in embedded systems","authors":"Mei Hong, Hui Guo, X. Hu","doi":"10.1145/2380403.2380414","DOIUrl":"https://doi.org/10.1145/2380403.2380414","url":null,"abstract":"This paper presents a tag design approach for memory data integrity protection. The approach is area, power and memory efficient, suitable to embedded systems that often suffer from stringent resource restriction. Experiments have been performed to compare the proposed approach with the state-of-the-art designs, which demonstrate that the approach can produce a memory data protection design with a low resource cost - achieving overhead savings of about 39% on chip area, 45% on power consumption, 65% on performance, and 12% on memory cost while maintaining the same or higher security level.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130966351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
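The general flow of tag-based memory authentication -- compute a keyed tag when a block is written to external memory, recompute and compare on read -- can be sketched as below. The mixing function is a toy (FNV-style, not cryptographically secure) and the 16-bit truncation is an assumption; the paper's actual tag construction is what delivers its cost savings.

```c
#include <stdint.h>
#include <stddef.h>

#define TAG_BITS 16   /* truncated tag: memory cost vs. security trade-off */

/* Toy keyed mix standing in for a real MAC (do not use for security). */
static uint16_t toy_tag(const uint8_t *block, size_t len, uint64_t key) {
    uint64_t h = key;
    for (size_t i = 0; i < len; i++)
        h = (h ^ block[i]) * 0x100000001b3ULL;  /* FNV-1a style step */
    return (uint16_t)(h & ((1u << TAG_BITS) - 1));
}

/* On a read from external memory: recompute and compare the stored tag. */
int verify_block(const uint8_t *block, size_t len,
                 uint64_t key, uint16_t stored_tag) {
    return toy_tag(block, len, key) == stored_tag;
}
```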
Analytical approaches for performance evaluation of networks-on-chip
A. E. Kiasari, A. Jantsch, M. Bekooij, A. Burns, Zhonghai Lu
{"title":"Analytical approaches for performance evaluation of networks-on-chip","authors":"A. E. Kiasari, A. Jantsch, M. Bekooij, A. Burns, Zhonghai Lu","doi":"10.1145/2380403.2380442","DOIUrl":"https://doi.org/10.1145/2380403.2380442","url":null,"abstract":"This tutorial reviews four popular mathematical formalisms -- dataflow analysis, schedulability analysis, network calculus, and queueing theory -- and how they have been applied to the analysis of Network-on-Chip (NoC) performance. We review the basic concepts and results of each formalism and provide examples of how they have been used in on-chip communication performance analysis. The tutorial also discusses the respective strengths and weaknesses of each formalism, their suitability for a specific purpose, and the attempts that have been made to bridge these analytical approaches. Finally, we conclude the tutorial by discussing open research issues.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115140473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
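As a taste of the queueing-theory formalism the tutorial covers, the sketch below evaluates the textbook M/M/1 mean-latency formula W = 1/(mu - lambda) for a single router port with Poisson arrivals; the model choice and numbers are our illustration, not drawn from the tutorial.

```c
#include <stdio.h>

/* M/M/1 mean time in system: W = 1 / (mu - lambda), valid for lambda < mu. */
double mm1_latency(double lambda, double mu) {
    return (lambda < mu) ? 1.0 / (mu - lambda) : -1.0;  /* -1: unstable */
}

int main(void) {
    /* e.g., 0.6 packets/cycle arriving at a port serving 1 packet/cycle */
    printf("mean latency: %.2f cycles\n", mm1_latency(0.6, 1.0));
    return 0;
}
```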
Energy efficient special instruction support in an embedded processor with compact ISA
Dongrui She, Yifan He, H. Corporaal
{"title":"Energy efficient special instruction support in an embedded processor with compact isa","authors":"Dongrui She, Yifan He, H. Corporaal","doi":"10.1145/2380403.2380430","DOIUrl":"https://doi.org/10.1145/2380403.2380430","url":null,"abstract":"The use of special instructions that execute complex operation patterns is a common approach in application specific processor design to improve performance and efficiency. However, in an embedded generic processor with compact instruction set architecture (ISA), such instructions may lead to large overhead as: i) more bits are needed to encode the extra opcodes and operands, resulting in wider instructions; ii) more register file (RF) ports are required to provide the extra operands to the function units. Such overhead may increase energy consumption considerably.\u0000 In this paper, we propose to support flexible operation pair patterns in a processor with a compact 24-bit RISC-like ISA using: i) a partially reconfigurable decoder that exploits the locality of patterns to reduce the requirement for opcode space; ii) a software controlled bypass network to reduce the requirement for operand encoding and RF ports. We also propose an energy-aware compiler backend design for the proposed architecture that performs pattern selection and bypass-aware scheduling to generate energy efficient codes. Though proposed design imposes extra constraints on the operation patterns, the experimental results show that the average dynamic instruction count is reduced by over 25%, which is only about 2% less than the architecture without such constraints. Due to the low overhead, the total energy of the proposed architecture reduces by an average of 15.8% compared to the RISC baseline, while the one without constraints achieves almost no energy improvement.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134521359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
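A compact C sketch of the pattern-selection step such a compiler backend performs: fuse a multiply whose result feeds the immediately following add into a single multiply-accumulate pair. The IR shape and opcode names are hypothetical, and a real selector must also verify that the product has no other uses and that the decoder is configured for the pattern.

```c
typedef enum { OP_MUL, OP_ADD, OP_MAC, OP_NOP } opcode_t;

typedef struct {
    opcode_t op;
    int dst, src1, src2, src3;  /* src3 is used only by OP_MAC */
} insn_t;

/* Fuse "t = a*b; d = t + c" into "d = a*b + c"; returns pairs fused. */
int fuse_mul_add(insn_t *code, int n) {
    int fused = 0;
    for (int i = 0; i + 1 < n; i++) {
        insn_t *mul = &code[i], *add = &code[i + 1];
        if (mul->op == OP_MUL && add->op == OP_ADD &&
            add->src1 == mul->dst) {
            mul->op   = OP_MAC;
            mul->src3 = add->src2;  /* third operand rides the bypass */
            mul->dst  = add->dst;
            add->op   = OP_NOP;     /* add is subsumed by the pair */
            fused++;
        }
    }
    return fused;
}
```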
Integrating software caches with scratch pad memory
Prasenjit Chakraborty, P. Panda
{"title":"Integrating software caches with scratch pad memory","authors":"Prasenjit Chakraborty, P. Panda","doi":"10.1145/2380403.2380440","DOIUrl":"https://doi.org/10.1145/2380403.2380440","url":null,"abstract":"Software cache refers to cache functionality emulated in software on a compiler-controlled Scratch Pad Memory (SPM). Such structures are useful when standard SPM allocation strategies cannot be used due to hard-to-analyze memory reference patterns in the source code. SPM data allocation strategies generally rely on compile-time inference of spatial and temporal reuse, with the general flow being the copying of a block/tile of array data into the SPM, followed by its processing, and finally, copying back. However, when array index functions are complicated due to conditionals, complex expressions, and dependence on run-time data, the SPM compiler has to rely on expensive DMA for individual words, leading to poor performance. Software caches (SWC) can play a crucial role in improving performance under such circumstances -- their access times are longer than those for direct SPM access, but they retain the advantages (present in hardware caches) of exploiting spatial and temporal locality discovered at run-time. We present the first automated compiler data allocation strategy that considers the presence of a software cache in SPM space, and makes decisions on which arrays should be accessed through it, at which times. Arrays could be accessed differently in different parts of a program, and our algorithm analyzes such uses and considers the possibility of selectively accessing an array through the SWC only when it is efficient, based on a cost model of the overheads involved in SPM/SWC transitions. We implemented our technique in an LLVM based framework and experimented with several applications on a Cell based machine. Our technique results in up to 82% overall performance improvement over a conventional SPM mapping algorithm and up to 27% over a typical SWC-enhanced implementation.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114554396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
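A minimal C sketch of the software-cache mechanism the paper builds on: a direct-mapped line store occupying SPM space, with memcpy standing in for the DMA transfers a real SPM system would issue. The allocation decision the paper contributes -- which arrays go through this structure, and when -- sits above this lookup; sizes, write-back handling, and names are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define SWC_LINES      64
#define SWC_LINE_BYTES 128

static struct {
    uintptr_t tag[SWC_LINES];
    uint8_t   valid[SWC_LINES];
    uint8_t   data[SWC_LINES][SWC_LINE_BYTES];  /* lives in SPM space */
} swc;

/* Return a pointer into the SPM copy of the line holding main-memory
 * address p, fetching the line (DMA on real hardware) on a miss.
 * Write-back of dirty lines is omitted for brevity. */
void *swc_access(const void *p) {
    uintptr_t a    = (uintptr_t)p;
    uintptr_t line = a / SWC_LINE_BYTES;
    unsigned  idx  = line % SWC_LINES;
    if (!swc.valid[idx] || swc.tag[idx] != line) {
        uintptr_t base = a - (a % SWC_LINE_BYTES);
        memcpy(swc.data[idx], (const void *)base, SWC_LINE_BYTES);
        swc.tag[idx]   = line;
        swc.valid[idx] = 1;
    }
    return &swc.data[idx][a % SWC_LINE_BYTES];
}
```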