International Conference on Compilers, Architecture, and Synthesis for Embedded Systems: Latest Papers, Page 8

Exploiting residue number system for power-efficient digital signal processing in embedded processors

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629401

Rooju Chokshi, Krzysztof S. Berezowski, Aviral Shrivastava, S. Piestrak

Abstract: 2's complement number system imposes a fundamental limitation on the power and performance of arithmetic circuits, due to the fundamental need of cross-datapath carry propagation. Residue Number System (RNS) breaks free of these bonds by decomposing a number into parts and performing arithmetic operations in parallel, significantly reducing the breadth of carry propagation. Consequently, RNS arithmetic has been proposed as a solution to improve the power-efficiency of arithmetic hardware. However, limitations of the expressiveness of RNS in terms of arithmetic operations together with overheads related to interaction with 2's complement arithmetic make programmable processor design that takes advantage of these benefits challenging. In this paper we meet this challenge by multi-tier synergistic co-design of architecture, micro-architecture, hardware components, as well as compilation techniques. Our experiments not only demonstrate simultaneous improvement of up to 30% in performance and 57% reduction in functional unit power consumption, but also that most of these benefits can be exploited with automatically generated code.
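The decomposition described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' design: the moduli set (7, 8, 9) and all function names are assumptions chosen for illustration, and the sketch models only the arithmetic, not the hardware. Each residue channel is computed independently, so no carry crosses between channels; converting back to an ordinary integer uses the Chinese Remainder Theorem, which is one source of the conversion overhead the abstract mentions.

```python
from math import prod

# Pairwise coprime moduli (illustrative assumption, not taken from the paper).
# The representable range is 0 .. prod(MODULI) - 1 = 0 .. 503.
MODULI = (7, 8, 9)


def to_rns(x, moduli=MODULI):
    """Forward conversion: integer -> tuple of residues."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; each channel is independent of the others."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; again no cross-channel carries."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reverse conversion via the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


a, b = to_rns(20), to_rns(13)
print(from_rns(rns_add(a, b)))  # 33
print(from_rns(rns_mul(a, b)))  # 260
```

Results are only correct while they stay inside the dynamic range (here below 504). Operations such as comparison and division have no simple channel-wise form, which is one example of the limited expressiveness the abstract refers to.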

Citations: 38

Fine-grain performance scaling of soft vector processors

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629411

Peter Yiannacouras, J. Steffan, Jonathan Rose

Abstract: Embedded systems are often implemented on FPGA devices and 25% of the time include a soft processor--a processor built using the FPGA reprogrammable fabric. Because of their prevalence and flexibility, soft processors are compelling targets for customization--although current soft processors provide few architectural variations. Recent work has proposed augmenting soft processors with customizable vector processing support, enabling designers to easily scale performance by exploiting the data parallelism available in an application. However this approach provides only coarse-grain scaling, by successively doubling the number of vector datapaths for less than double the performance. In this work we further augment soft vector processors with more fine-grain architectural modifications: we add support for (i) vector chaining and (ii) heterogeneous vector lanes, allowing the soft vector processor to be customized to not only the data-level parallelism available in an application, but to the functional unit demand. We evaluate the area and wall clock performance with full hardware implementations on state-of-the-art FPGAs and find that chaining can provide between 15-45% average performance for less area than doubling the lanes, and that heterogeneous lanes can save 6-13% area with little or no performance loss in some cases. Finally, we implement 1200 soft vector processor variants and find that the peak performance per area compared to our base vector processor can be increased by an average of 13% and up to 34% when choosing the best variant per application.

Citations: 39

OPAIMS: open architecture precision agriculture information monitoring system

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629428

Yuexuan Wang, Yongcai Wang, Xiao Qi, Liwen Xu

Citations: 10

A buffer replacement algorithm exploiting multi-chip parallelism in solid state disks

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629416

Jinho Seol, Hyotaek Shim, Jaegeuk Kim, S. Maeng

Citations: 31

Fast enumeration of maximal valid subgraphs for custom-instruction identification

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629402

Tao Li, Zhigang Sun, Jigang Wu, Xicheng Lu

Citations: 31

Progressive spill code placement

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629408

D. Ebner, Bernhard Scholz, A. Krall

Citations: 8

Sustaining Moore's law in embedded computing through probabilistic and approximate design: retrospects and prospects

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629397

K. Palem, Lakshmi N. Chakrapani, Z. Kedem, L. Avinash, Kirthi Krishna Muntimadugu

Abstract: The central theme of our work is the probabilistic and approximate design of embedded computing systems. This novel approach consists of two distinguishing aspects: (i) the design and implementation of embedded systems, using components which are susceptible to perturbations from various sources and (ii) a design methodology which consists of an exploration of a design space which characterizes the trade-off between quality of output and cost, to implement high performance and low energy embedded systems. In contrast with other work, our design methodology does not attempt to correct the errors introduced by components which are susceptible to perturbations, instead we design "good enough" systems. Our work has the potential to address challenges and impediments to Moore's law arising from material properties and manufacturing difficulties, which dictate that we shift from the current-day deterministic design paradigm to statistical and probabilistic designs of the future. In this paper, we provide a broad overview of our work on probabilistic and approximate design, present novel results in approximate arithmetic and its impact on digital signal processing algorithms, and sketch future directions for research.

Citations: 77

Instruction cache locking inside a binary rewriter

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629422

K. Anand, R. Barua

Abstract: Cache memories in embedded systems play an important role in reducing the execution time of the applications. Various kinds of extensions have been added to cache hardware to enable software involvement in replacement decisions, thus improving the run-time over a purely hardware-managed cache. Novel embedded systems, like Intel's Xscale and ARM Cortex processors provide the facility of locking one or more lines in cache - this feature is called cache locking. This paper presents the first method in the literature for instruction-cache locking that is able to reduce the average-case run-time of the program. We devise a cost-benefit model to discover the memory addresses which should be locked in the cache. We implement our scheme inside a binary rewriter, thus widening the applicability of our scheme to binaries compiled using any compiler. Results obtained on a suite of MiBench and MediaBench benchmarks show up to 25% improvement in the instruction-cache miss rate on average and up to 13.5% improvement in the execution time on average for applications having instruction accesses as a bottleneck, depending on the cache configuration. The improvement in execution time is as high as 23.5% for some benchmarks.

Citations: 26

Mapping stream programs onto heterogeneous multiprocessor systems

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629406

P. Carpenter, Alex Ramírez, E. Ayguadé

Citations: 35

Complete nanowire crossbar framework optimized for the multi-spacer patterning technique

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI: 10.1145/1629395.1629398

M. B. Jamaa, G. Cerofolini, Y. Leblebici, G. Micheli

Citations: 6