International Conference on Compilers, Architecture, and Synthesis for Embedded Systems: Latest Articles

Exploiting residue number system for power-efficient digital signal processing in embedded processors
Rooju Chokshi, Krzysztof S. Berezowski, Aviral Shrivastava, S. Piestrak
Abstract: The 2's complement number system imposes a fundamental limitation on the power and performance of arithmetic circuits because of the need for cross-datapath carry propagation. The Residue Number System (RNS) breaks free of these bonds by decomposing a number into parts and performing arithmetic operations on them in parallel, significantly reducing the breadth of carry propagation. Consequently, RNS arithmetic has been proposed as a way to improve the power efficiency of arithmetic hardware. However, the limited expressiveness of RNS in terms of arithmetic operations, together with the overheads of interacting with 2's complement arithmetic, makes it challenging to design a programmable processor that takes advantage of these benefits. In this paper we meet this challenge through multi-tier synergistic co-design of architecture, micro-architecture, hardware components, and compilation techniques. Our experiments demonstrate not only a simultaneous improvement of up to 30% in performance and a 57% reduction in functional unit power consumption, but also that most of these benefits can be exploited with automatically generated code.
DOI: https://doi.org/10.1145/1629395.1629401
Published: 2009-10-11
Citations: 38
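To make the decomposition described in the abstract above concrete, here is a minimal Python sketch of residue arithmetic. It is an illustration only, not the hardware design from the paper: the moduli (7, 8, 9) and all function names are assumptions chosen for this example. Each residue channel is computed independently, so no carry crosses between channels; conversion back to a conventional integer uses the Chinese Remainder Theorem.

```python
from math import prod

# Hypothetical moduli set: pairwise coprime, dynamic range 7 * 8 * 9 = 504.
MODULI = (7, 8, 9)

def to_rns(x, moduli=MODULI):
    """Decompose an integer into one residue per modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Add channel by channel; no carry crosses between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel, again fully independently."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % big_m
```

Results are only meaningful while they stay below the dynamic range (504 here); for example, `from_rns(rns_mul(to_rns(25), to_rns(17)))` recovers 425. The conversion step is the kind of interaction overhead with conventional arithmetic that the abstract mentions.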
Fine-grain performance scaling of soft vector processors
Peter Yiannacouras, J. Steffan, Jonathan Rose
Abstract: Embedded systems are often implemented on FPGA devices and 25% of the time include a soft processor, that is, a processor built using the FPGA reprogrammable fabric. Because of their prevalence and flexibility, soft processors are compelling targets for customization, although current soft processors provide few architectural variations. Recent work has proposed augmenting soft processors with customizable vector processing support, enabling designers to easily scale performance by exploiting the data parallelism available in an application. However, this approach provides only coarse-grain scaling, by successively doubling the number of vector datapaths for less than double the performance. In this work we further augment soft vector processors with more fine-grain architectural modifications: we add support for (i) vector chaining and (ii) heterogeneous vector lanes, allowing the soft vector processor to be customized not only to the data-level parallelism available in an application, but also to its functional unit demand. We evaluate the area and wall-clock performance with full hardware implementations on state-of-the-art FPGAs and find that chaining can provide a 15-45% average performance improvement for less area than doubling the lanes, and that heterogeneous lanes can save 6-13% area with little or no performance loss in some cases. Finally, we implement 1200 soft vector processor variants and find that the peak performance per area compared to our base vector processor can be increased by an average of 13% and up to 34% when choosing the best variant per application.
DOI: https://doi.org/10.1145/1629395.1629411
Published: 2009-10-11
Citations: 39
OPAIMS: open architecture precision agriculture information monitoring system
Yuexuan Wang, Yongcai Wang, Xiao Qi, Liwen Xu
Abstract: To realize precision agriculture information monitoring over long periods of time and over large areas, we propose OPAIMS, an open-architecture precision agriculture information monitoring system. OPAIMS consists of a two-tiered sensor network and an information service platform. The sensor network contains a large number of energy-limited low-tier nodes (LNs) that capture and report information about their designated vicinity, and some powerful GPRS gateways in the high tier that organize the LNs into clusters and report the aggregated information to the Internet. The information service platform logs information from the sensor network and provides value-added services to users. In this paper, we focus on the design methodologies of OPAIMS, including the system architecture, the standard interfaces, and the multi-hop joint scheduling of LNs. These designs make OPAIMS not only scalable and long-lived, but also universal across various kinds of sensors and hardware. Users can easily establish a precision agriculture monitoring system based on the proposed OPAIMS.
DOI: https://doi.org/10.1145/1629395.1629428
Published: 2009-10-11
Citations: 10
A buffer replacement algorithm exploiting multi-chip parallelism in solid state disks
Jinho Seol, Hyotaek Shim, Jaegeuk Kim, S. Maeng
Abstract: Solid State Disks (SSDs) outperform magnetic disks thanks to the favorable features of NAND flash memory. Furthermore, with improvements in flash memory density and the adoption of multi-chip architectures, SSDs are rapidly replacing magnetic disks. Most previous studies have aimed to enhance SSD performance, but they assume that the operation unit of the host interface is the same as the operation unit of NAND flash memory, so that partially filled pages need not be considered. In this paper, we analyze the overhead caused by partially filled pages and propose a buffer replacement algorithm that exploits multi-chip parallelism to enhance write performance. Our simulation results show that the proposed algorithm improves write performance by up to 30% over existing approaches.
DOI: https://doi.org/10.1145/1629395.1629416
Published: 2009-10-11
Citations: 31
Fast enumeration of maximal valid subgraphs for custom-instruction identification
Tao Li, Zhigang Sun, Jigang Wu, Xicheng Lu
Abstract: Extensible processors are becoming increasingly popular because they allow custom instructions to be incorporated to meet design constraints. However, identifying custom instructions under architectural input/output port constraints is a time-consuming process, particularly for large applications. To rapidly identify the most profitable custom instructions with large numbers of inputs and outputs, this paper proposes a novel identification algorithm for enumerating maximal convex subgraphs that contain no invalid node (i.e., maximal valid subgraphs). The proposed enumeration strategy is based on divide-and-conquer in a top-down manner, rather than the bottom-up manner used in the state of the art. The division operation considers only the invalid inner nodes of the given data-flow graph (DFG), rather than all invalid nodes, and thus accelerates the enumeration of maximal valid subgraphs. Experimental results show that the improvement over the latest work is more than 90% for 60% of the DFG instances of the acknowledged benchmarks.
DOI: https://doi.org/10.1145/1629395.1629402
Published: 2009-10-11
Citations: 31
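The abstract above relies on the notion of a convex subgraph of a data-flow graph: a node set is convex when no path between two of its nodes passes through a node outside the set. The Python sketch below checks only that property. It is not the enumeration algorithm from the paper, and the graph representation (a dict of successor lists) and the name `is_convex` are assumptions made for this illustration.

```python
def is_convex(succ, subset):
    """Return True if `subset` is a convex node set of the DAG `succ`.

    `succ` maps each node to an iterable of its successors (hypothetical
    representation). The set is non-convex exactly when some path leaves
    it and later re-enters it.
    """
    subset = set(subset)
    # Start from every outside node that is a direct successor of the set.
    stack = [v for u in subset for v in succ.get(u, ()) if v not in subset]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for nxt in succ.get(node, ()):
            if nxt in subset:
                return False  # an outside path re-enters the set
            stack.append(nxt)
    return True


# Small diamond-shaped DAG: a -> b -> d and a -> c -> d.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(is_convex(dag, {"a", "b"}))       # no path leaves and re-enters
print(is_convex(dag, {"a", "b", "d"}))  # a -> c -> d passes through c
```

In the diamond example, `{"a", "b", "d"}` is rejected because the path through `c` leaves the set and returns to it, which is the situation that would make a candidate custom instruction unschedulable as a single unit.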
Progressive spill code placement
D. Ebner, Bernhard Scholz, A. Krall
Abstract: Register allocation has gained renewed attention in the recent past. Several authors propose separating the problem into decoupled sub-tasks including spilling, allocation, assignment, and coalescing. This approach is largely motivated by recent advances in SSA-based register allocation, which suggest that such a decomposition does not significantly degrade the overall allocation quality. The algorithmic challenges of intra-procedural spilling have been neglected so far, and very crude heuristics have been employed. In this work, (1) we introduce the constrained min-cut (CMC) problem for solving the spilling problem, (2) we provide an integer linear program formulation for computing an optimal solution of CMC, and (3) we devise a progressive Lagrangian solver that is viable for production compilers. Our experiments with Spec2k and MiBench show that optimal solutions are feasible, even for very large programs, and that heuristics leave significant potential behind for small register files.
DOI: https://doi.org/10.1145/1629395.1629408
Published: 2009-10-11
Citations: 8
Sustaining Moore's law in embedded computing through probabilistic and approximate design: retrospects and prospects
K. Palem, Lakshmi N. Chakrapani, Z. Kedem, L. Avinash, Kirthi Krishna Muntimadugu
Abstract: The central theme of our work is the probabilistic and approximate design of embedded computing systems. This novel approach has two distinguishing aspects: (i) the design and implementation of embedded systems using components that are susceptible to perturbations from various sources, and (ii) a design methodology that explores a design space characterizing the trade-off between quality of output and cost, in order to implement high-performance and low-energy embedded systems. In contrast with other work, our design methodology does not attempt to correct the errors introduced by components that are susceptible to perturbations; instead, we design "good enough" systems. Our work has the potential to address challenges and impediments to Moore's law arising from material properties and manufacturing difficulties, which dictate a shift from the current-day deterministic design paradigm to the statistical and probabilistic designs of the future. In this paper, we provide a broad overview of our work on probabilistic and approximate design, present novel results in approximate arithmetic and its impact on digital signal processing algorithms, and sketch future directions for research.
DOI: https://doi.org/10.1145/1629395.1629397
Published: 2009-10-11
Citations: 77
Instruction cache locking inside a binary rewriter
K. Anand, R. Barua
Abstract: Cache memories in embedded systems play an important role in reducing the execution time of applications. Various kinds of extensions have been added to cache hardware to enable software involvement in replacement decisions, thus improving the run-time over a purely hardware-managed cache. Novel embedded systems, such as Intel's XScale and ARM Cortex processors, provide the facility of locking one or more lines in the cache; this feature is called cache locking. This paper presents the first method in the literature for instruction-cache locking that is able to reduce the average-case run-time of the program. We devise a cost-benefit model to discover the memory addresses that should be locked in the cache. We implement our scheme inside a binary rewriter, thus widening its applicability to binaries compiled using any compiler. Results obtained on a suite of MiBench and MediaBench benchmarks show up to 25% improvement in the instruction-cache miss rate on average and up to 13.5% improvement in the execution time on average for applications having instruction accesses as a bottleneck, depending on the cache configuration. The improvement in execution time is as high as 23.5% for some benchmarks.
DOI: https://doi.org/10.1145/1629395.1629422
Published: 2009-10-11
Citations: 26
Mapping stream programs onto heterogeneous multiprocessor systems
P. Carpenter, Alex Ramírez, E. Ayguadé
Abstract: This paper presents a partitioning and allocation algorithm for an iterative stream compiler, targeting heterogeneous multiprocessors with constrained distributed memory and any communications topology. We introduce a novel definition of connectedness that enables the algorithm to model the capabilities of the compiler. The algorithm uses convexity and connectedness constraints to produce partitions that are easier to compile and require short pipelines. Software pipelining is an effective transformation, but it increases memory footprint and latency, and has a startup overhead. Our algorithm takes account of these downstream costs. We show results for the StreamIt 2.1.1 benchmarks for an SMP, a 2×2 mesh, an SMP plus accelerator, and an IBM QS20 blade, which has two Cell processors. Our results show that the average performance is within 5% of the unrestricted optimum found using a brute-force search, while seldom requiring software pipelining. The heuristic is robust, and fast enough to be inside the feedback loop of an iterative compiler.
DOI: https://doi.org/10.1145/1629395.1629406
Published: 2009-10-11
Citations: 35
Complete nanowire crossbar framework optimized for the multi-spacer patterning technique
M. B. Jamaa, G. Cerofolini, Y. Leblebici, G. Micheli
Abstract: Nanowire crossbar circuits are an emerging architectural paradigm that promises higher integration density and improved fault tolerance thanks to its reconfigurability. In this paper, we propose for the first time the use of the multi-spacer patterning technique to fabricate nanowire crossbars with a high cross-point density of up to 10^10 cm^-2. We propose a novel decoder fabrication method that can be included in a process dedicated to the multi-spacer patterning technique. We address the technological problems of variability and fabrication complexity at the design level by optimizing the encoding scheme. We show an overall reduction of the variability by 18% and a cancellation of the fabrication complexity overhead.
DOI: https://doi.org/10.1145/1629395.1629398
Published: 2009-10-11
Citations: 6