International Conference on Compilers, Architecture, and Synthesis for Embedded Systems: latest papers, page 2

Energy efficient hybrid display and predictive models for embedded and mobile systems

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380429

Y. Wen, Ziyi Liu, W. Shi, Yifei Jiang, A. Cheng, Khoa N. Le

Citations: 5

When less is more (LIMO): controlled parallelism for improved efficiency

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380431

Gaurav Chadha, S. Mahlke, S. Narayanasamy

Abstract: While developing shared-memory programs, programmers often contend with the problem of how many threads to create for best efficiency. Creating as many threads as the number of available processor cores, or more, may not be the most efficient configuration. Too many threads can result in excessive contention for shared resources, wasting energy, which is of primary concern for embedded devices. Furthermore, thermal and power constraints prevent us from operating all the processor cores at the highest possible frequency, favoring fewer threads. The best number of threads to run depends on the application, user input and hardware resources available. It can also change at runtime, making it infeasible for the programmer to determine this number.

To address this problem, we propose LIMO, a runtime system that dynamically manages the number of running threads of an application to maximize performance and energy efficiency. LIMO monitors threads' progress along with the usage of shared hardware resources to determine the best number of threads to run and the voltage and frequency level. With dynamic adaptation, LIMO provides an average of 21% performance improvement and a 2x improvement in energy efficiency on a 32-core system over the default configuration of 32 threads for a set of concurrent applications from the PARSEC suite, the Apache web server, and the Sphinx speech recognition system.

Citations: 30
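
The LIMO abstract above argues that the best thread count can be lower than the core count and has to be found at run time. A minimal sketch of that idea follows, assuming a simple hill-climbing search over a measured throughput value. This is not the LIMO algorithm itself (the abstract does not specify it, and LIMO also selects voltage and frequency); `tune_thread_count` and `synthetic_throughput` are hypothetical names, and the throughput model is invented purely for illustration.

```python
from typing import Callable


def tune_thread_count(measure: Callable[[int], float],
                      max_threads: int,
                      start: int = 1) -> int:
    """Hill-climb over thread counts and return a local optimum of measure(n).

    `measure` stands in for a run-time progress metric (hypothetical); a real
    system would sample it while the application is running.
    """
    best_n = start
    best_score = measure(start)
    while True:
        # Probe the neighbouring thread counts that are within bounds.
        candidates = [n for n in (best_n - 1, best_n + 1) if 1 <= n <= max_threads]
        if not candidates:
            return best_n
        score, n = max((measure(c), c) for c in candidates)
        if score <= best_score:
            # Neither neighbour improves on the current setting: stop here.
            return best_n
        best_score, best_n = score, n


def synthetic_throughput(n: int) -> float:
    """Assumed model: linear speedup minus a quadratic contention penalty."""
    return n - 0.02 * n * n


if __name__ == "__main__":
    print(tune_thread_count(synthetic_throughput, max_threads=32))
```

Under this assumed model the search settles below the 32 available cores, which is the qualitative behaviour the abstract describes; a hill climber only finds a local optimum, so a real controller would need to re-probe as the workload changes.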

Scenario-based design flow for mapping streaming applications onto on-chip many-core systems

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380422

Lars Schor, Iuliana Bacivarov, Devendra Rai, Hoeseok Yang, Shin-Haeng Kang, L. Thiele

Abstract: The next generation of embedded software has high performance requirements and is increasingly dynamic. Multiple applications typically share the system, running in parallel in different combinations and starting and stopping their individual execution at different moments in time. The different combinations of applications form system execution scenarios. In this paper, we present the distributed application layer, a scenario-based design flow for mapping a set of applications onto heterogeneous on-chip many-core systems. Applications are specified as Kahn process networks and the execution scenarios are combined into a finite state machine. Transitions between scenarios are triggered by behavioral events generated by either running applications or the run-time system. A set of optimal mappings is precalculated during design-time analysis. Later, at run-time, hierarchically organized controllers monitor behavioral events and apply the precalculated mappings when starting new applications. To handle architectural failures, spare cores are allocated at design-time. At run-time, the controllers have the ability to move all processes assigned to a faulty physical core to a spare core. Finally, we apply the proposed design flow to design and optimize picture-in-picture software.

Citations: 113
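
The run-time behaviour described in the abstract above (a finite state machine over scenarios, a table of precalculated mappings, and spare cores for failures) can be sketched roughly as follows. This is an illustrative, flat, single-controller model, not the paper's hierarchical controllers; the scenario names, event names, core numbers, and the `ScenarioController` class are all hypothetical.

```python
from typing import Dict, List, Tuple

Mapping = Dict[str, int]  # process name -> physical core id

# Hypothetical scenario state machine: (scenario, event) -> next scenario.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "start_video"): "video",
    ("video", "start_pip"): "video+pip",
    ("video+pip", "stop_pip"): "video",
    ("video", "stop_video"): "idle",
}

# Hypothetical mappings, assumed to have been computed at design time.
MAPPINGS: Dict[str, Mapping] = {
    "idle": {},
    "video": {"decode": 0, "display": 1},
    "video+pip": {"decode": 0, "display": 1, "pip_decode": 2, "pip_scale": 3},
}


class ScenarioController:
    def __init__(self, transitions: Dict[Tuple[str, str], str],
                 mappings: Dict[str, Mapping],
                 spare_cores: List[int], initial: str) -> None:
        self.transitions = transitions
        self.mappings = mappings
        self.spares = list(spare_cores)
        self.remap: Dict[int, int] = {}  # faulty core -> replacement core
        self.scenario = initial

    def _resolve(self, core: int) -> int:
        # Follow the remap chain in case a spare core has itself failed.
        while core in self.remap:
            core = self.remap[core]
        return core

    def current_mapping(self) -> Mapping:
        table = self.mappings[self.scenario]
        return {proc: self._resolve(core) for proc, core in table.items()}

    def on_event(self, event: str) -> Mapping:
        # Unknown events leave the scenario unchanged.
        self.scenario = self.transitions.get((self.scenario, event), self.scenario)
        return self.current_mapping()

    def on_core_failure(self, core: int) -> Mapping:
        if not self.spares:
            raise RuntimeError("no spare core left")
        self.remap[core] = self.spares.pop(0)
        return self.current_mapping()
```

Because the mappings are looked up rather than computed, a scenario change costs one dictionary access at run time, which is the point of precalculating them at design time.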

A low-overhead interconnect architecture for virtual reconfigurable fabrics

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380427

Aaron Landy, G. Stitt

Abstract: Field-programmable gate arrays (FPGAs) have been widely shown to have significant performance and power advantages compared to microprocessors and graphics-processing units (GPUs), but remain a niche technology due in part to productivity challenges. Although such challenges have numerous causes, previous work has shown two significant contributing factors: 1) prohibitive place-and-route times preventing mainstream design methodologies, and 2) limited application portability preventing design reuse. Virtual reconfigurable architectures, referred to as intermediate fabrics (IFs), were recently introduced as a potential solution to these problems, providing 100x-1000x place-and-route speedup, while also enabling application portability across potentially any physical FPGA. However, one significant limitation of existing intermediate fabrics is area overhead incurred from virtualized interconnect resources. In this paper, we perform design-space exploration of virtual interconnect architectures and introduce an optimized virtual interconnect that reduces area overhead by 48% to 54% compared to previous work, while also improving clock frequencies by 24% with a modest routability overhead of 16%.

Citations: 17

Lazy cache invalidation for self-modifying codes

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380433

Anthony Gutierrez, Joseph Pusdesris, R. Dreslinski, T. Mudge

Abstract: Just-in-time compilation with dynamic code optimization is often used to help improve the performance of applications that utilize high-level languages and virtual run-time environments, such as those found in smartphones. Just-in-time compilation introduces additional overhead into the instruction fetch stage of a processor that is particularly problematic for user applications: instruction cache invalidation due to the use of self-modifying code. This software-assisted cache coherence serializes cache line invalidations, or causes a costly invalidation of the entire instruction cache, and prevents useful instructions from being fetched for the period during which the stale instructions are being invalidated. This overhead is not acceptable for user applications, which are expected to respond quickly.

In this work we introduce a new technique that can, using a single instruction, invalidate cache lines in page-sized chunks as opposed to invalidating only a single line at a time. Lazy cache invalidation reduces the amount of time spent stalling due to instruction cache invalidation by removing stale instructions on demand as they are accessed, as opposed to all at once. The key observation behind lazy cache invalidation is that stale instructions do not necessarily need to be removed from the instruction cache; as long as it is guaranteed that attempts to fetch stale instructions will not hit in the instruction cache, the program will behave as the developer had intended.

Citations: 1
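
The key observation in the abstract above is that a stale line may stay in the cache as long as a fetch can never hit on it. One way to get that guarantee in a software model is a per-page version counter: invalidating a page is a single counter increment, and a line counts as a hit only if it was filled under the page's current version. The sketch below assumes this version-tag scheme purely for illustration; the abstract does not say the hardware works this way, and `LazyICache`, `PAGE_SIZE`, and `LINE_SIZE` are hypothetical.

```python
from typing import Dict, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed cache line size in bytes


class LazyICache:
    """Toy instruction cache with page-granular, lazily applied invalidation."""

    def __init__(self) -> None:
        self.page_version: Dict[int, int] = {}       # page number -> version
        self.lines: Dict[int, Tuple[int, str]] = {}  # line addr -> (version, data)

    def invalidate_page(self, addr: int) -> None:
        # One O(1) operation covers every line in the page; nothing is evicted.
        page = addr // PAGE_SIZE
        self.page_version[page] = self.page_version.get(page, 0) + 1

    def fetch(self, addr: int, memory: Dict[int, str]) -> Tuple[str, bool]:
        """Return (data, hit). Stale lines miss and are refilled on demand."""
        line = addr - addr % LINE_SIZE
        version = self.page_version.get(addr // PAGE_SIZE, 0)
        entry = self.lines.get(line)
        if entry is not None and entry[0] == version:
            return entry[1], True
        data = memory[line]                 # refill from backing memory
        self.lines[line] = (version, data)  # tag with the current page version
        return data, False
```

In this model a stale line is replaced only when it is next fetched, so the cost of invalidation is spread over later accesses instead of being paid all at once.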

Static secure page allocation for light-weight dynamic information flow tracking

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380415

Juan Carlos Martínez Santos, Yunsi Fei, Z. Shi

Citations: 3

From sequential programming to flexible parallel execution

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380417

A. Raman, Jae W. Lee, David I. August

Citations: 2

Power agnostic technique for efficient temperature estimation of multicore embedded systems

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380421

Devendra Rai, Hoeseok Yang, Iuliana Bacivarov, L. Thiele

Abstract: Temperature plays an increasingly important role in the overall performance and reliability of a computing system. Multi- and many-core systems provide an opportunity to manage the overall temperature profile by cleverly designing the application-to-core mapping and the associated scheduling policies. An uncontrolled temperature profile may lead to an unplanned performance loss, since the system activates protective mechanisms such as voltage and/or frequency scaling to cool itself. Similarly, deep thermal cycles with high frequency lead to severe deterioration in the overall reliability of the system. Design space exploration tools are often used to optimize binding and scheduling choices based on a given set of constraints and objectives, thus motivating the need for fast and accurate temperature estimation techniques. We argue that the currently available techniques are not an ideal fit for design space exploration tools, and suggest a system-level technique based on application fingerprinting. It does not need any information about the processor floorplan, the physical and thermal structure, or power consumption. Instead, its temperature estimation is based on a set of application-specific calibration runs and associated temperature measurements using available built-in sensors. We show that a given application possesses a unique thermal signature on the system it executes on, which provides a computationally fast method to calculate accurate temperature traces. Extensive experimental studies show that our technique can estimate temperature on all cores of a system to within 5 °C, and is three orders of magnitude faster than state-of-the-art numerical simulators like HotSpot.

Citations: 21

LLBT: an LLVM-based static binary translator

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380419

Bor-Yeh Shen, Jiunn-Yeu Chen, W. Hsu, Wuu Yang

Abstract: Lack of applications has always been a serious concern when designing machines with a new but incompatible ISA. To address this concern, binary translation is one common technique for migrating applications from a legacy ISA to new ones. In the past, dynamic binary translation (DBT) has been more widely adopted for migrating applications since it avoids some challenging problems for binary translation, such as code discovery for variable-length ISAs and code location issues in handling indirect branches. Static binary translation (SBT) is usually regarded as a less general solution and has not been actively researched. However, SBT has the advantage of performing more aggressive optimizations, which could yield more compact code and greater code quality. In general, SBT-translated applications are likely to consume less memory, fewer processor cycles and less power, and can be started more quickly. All the above advantages are more critical for embedded systems than for general systems. Therefore, we believe that even though SBT is not as general as DBT, it has a unique role to play in migrating applications in embedded systems.

In this paper, we designed and implemented a new portable SBT tool, called LLBT, which translates a source binary into LLVM IR and then retargets the LLVM IR to various ISAs by using the LLVM compiler infrastructure. Using the LLVM compiler infrastructure, LLBT successfully leverages two important functionalities from LLVM: the comprehensive optimizations and the retargetability. For example, most DBTs map guest architecture states into the host registers to minimize accessing guest architecture states with memory operations, but must deal with guest architecture state saving/reloading at trace/block entry/exit points. LLBT can treat the complete application binary as a single function and uses the global register allocation optimization in LLVM to consistently map guest architecture states in host registers so as to avoid the costly state saving and reloading at trace/block exits.

In this paper, we have shown that our ARM-based LLBT can effectively migrate the EEMBC benchmark suite from ARMv5 to Intel IA32, Intel x64, MIPS, and other ARMs such as ARMv7. On Intel i7 based host systems, the LLBT-generated code can run 3 to 64 times faster than emulation with QEMU, which uses the DBT technique.

Citations: 43

Architectural synthesis of flow-based microfluidic large-scale integration biochips

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2012-10-07 DOI: 10.1145/2380403.2380437

W. H. Minhass, P. Pop, J. Madsen, Felician Stefan Blaga

Abstract: Microfluidic biochips are replacing the conventional biochemical analyzers and are able to integrate the necessary functions for biochemical analysis on-chip. In this paper we are interested in flow-based biochips, in which the flow of liquid is manipulated using integrated microvalves. By combining several microvalves, more complex units, such as micropumps, switches, mixers, and multiplexers, can be built. The manufacturing technology, soft lithography, used for the flow-based biochips is advancing faster than Moore's law, resulting in increased architectural complexity. However, the designers are still using full-custom and bottom-up, manual techniques in order to design and implement these chips. As the chips become larger and the applications become more complex, the manual methodologies will not scale, becoming highly inadequate. Therefore, for the first time to our knowledge, we propose a top-down architectural synthesis methodology for the flow-based biochips. Starting from a given biochemical application and a microfluidic component library, we are interested in synthesizing a biochip architecture, i.e., performing component allocation from the library based on the biochemical application, generating the biochip schematic (netlist) and then performing physical synthesis (deciding the placement of the microfluidic components on the chip and performing routing of the microfluidic channels), such that the application completion time is minimized. We evaluate our proposed approach by synthesizing architectures for real-life applications as well as synthetic benchmarks.

Citations: 75