2008 International Conference on Application-Specific Systems, Architectures and Processors: Latest Articles

Reducing power consumption of embedded processors through register file partitioning and compiler support
Xuan Guan, Yunsi Fei
{"title":"Reducing power consumption of embedded processors through register file partitioning and compiler support","authors":"Xuan Guan, Yunsi Fei","doi":"10.1109/ASAP.2008.4580190","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580190","url":null,"abstract":"As embedded processors being widely used in specific application domains, such as communications, multimedia, and networking, the register file has contributed a substantial budget in embedded processor energy consumption due to its long working time for the data intensive computations and the large switching capacitance. It is found that 25% of registers can account for 83% of register file accessing time during many embedded application execution. This fact motivates us to reduce the register file power consumption by partitioning the registers to different regions according to their usage pattern. The most frequently used registers are put in the hot part, and the cold part of register file is rarely accessed. We employ the register file bitline splitting and the drowsy register cell techniques in our design to reduce the overall accessing power of the register file. We propose a novel approach to partition the register file in a way so that the largest power saving can be achieved. We formulate the register file partitioning process into a graph partitioning problem, and apply an effective algorithm to obtain the optimal result. 
We evaluate our algorithm on MiBench applications, and an average saving of 43.6% in the register file access power consumption over the original non-partitioned register file is achieved for the SimpleScalar PISA system.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124730372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Reconfigurable Viterbi decoder on mesh connected multiprocessor architecture
Ritesh Rajore, Ganesh Garga, H. Jamadagni, S. Nandy
{"title":"Reconfigurable Viterbi decoder on mesh connected multiprocessor architecture","authors":"Ritesh Rajore, Ganesh Garga, H. Jamadagni, S. Nandy","doi":"10.1109/ASAP.2008.4580153","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580153","url":null,"abstract":"In modern wireline and wireless communication systems, Viterbi decoder is one of the most compute intensive and essential elements. Each standard requires a different configuration of Viterbi decoder. Hence there is a need to design a flexible reconfigurable Viterbi decoder to support different configurations on a single platform. In this paper we present a reconfigurable Viterbi decoder which can be reconfigured for standards such as WCDMA, CDMA2000, IEEE 802.11, DAB, DVB, and GSM. Different parameters like code rate, constraint length, polynomials and truncation length can be configured to map any of the above mentioned standards. Our design provides higher throughput and scalable power consumption in various configuration of the reconfigurable Viterbi decoder. The power and throughput can also be optimized for different standards.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132612027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
An MPSoC architecture for the Multiple Target Tracking application in driver assistant system
Jehangir Khan, S. Niar, A. Rivenq, Y. Elhillali, J. Dekeyser
{"title":"An MPSoC architecture for the Multiple Target Tracking application in driver assistant system","authors":"Jehangir Khan, S. Niar, A. Rivenq, Y. Elhillali, J. Dekeyser","doi":"10.1109/ASAP.2008.4580166","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580166","url":null,"abstract":"This article discusses the design of an application specific MPSoC architecture dedicated to multiple target tracking (MTT). This application has its utility in driver assistant systems, more precisely in collision avoidance and warning systems. An automotive-radar is used as the front end sensor in our application. The article examines the tradeoffs that must be taken into consideration in the realization of the entire MTT application in an embedded system. In our implementation of MTT, several independent parallel tasks have been identified and mapped onto a multiprocessor architecture to ensure the deadlines imposed by the application. Our study demonstrates that the joint utilization of reconfigurable circuits (namely FPGA) and MPSoC, facilitates the development of a flexible and efficient MTT system.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125684347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
Throughput-scalable hybrid-pipeline architecture for multilevel lifting 2-D DWT of JPEG 2000 coder
B. K. Mohanty, P. Meher
{"title":"Throughput-scalable hybrid-pipeline architecture for multilevel lifting 2-D DWT of JPEG 2000 coder","authors":"B. K. Mohanty, P. Meher","doi":"10.1109/ASAP.2008.4580196","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580196","url":null,"abstract":"In this paper, we propose a pipelined-architecture for high-throughput computation of multilevel lifting 2D discrete wavelet transform (DWT). The multilevel DWT computation is shared by the proposed devices based on pyramid algorithm (PA) and recursive pyramid algorithm (RPA), where the PA-based devices compute the lower order subbands and the higher order subbands are computed by an RPA-based device. The hardware- and time-complexities of the proposed structure are compared with those of the existing recursive architectures for performance evaluation. Compared with the best of the existing recursive architectures, the proposed one has nearly 16 times less average computation time (ACT) for the 2D DWT of input size 512 x 512 for S=32, where S is half of the input rate of the structure. Moreover, it involves less number of multipliers and adders than the others when normalized for unit throughput rate. The proposed design offers nearly 100% utilization efficiency for S=32, and 94% efficiency for S=8. 
The latency of the structure is very small (which is of the order of a few cycles), and involves a small on-chip storage and less number of data/pipeline registers.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123744542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Managing multi-core soft-error reliability through utility-driven cross domain optimization
Wangyuan Zhang, Tao Li
{"title":"Managing multi-core soft-error reliability through utility-driven cross domain optimization","authors":"Wangyuan Zhang, Tao Li","doi":"10.1109/ASAP.2008.4580167","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580167","url":null,"abstract":"As semiconductor processing technology continues to scale down, managing reliability becomes an increasingly difficult challenge in high-performance microprocessor design. Transient faults, also known as soft errors, corrupt program data at the circuit level and cause incorrect program execution and system crashes. Future processors will consist of billions of transistors organized as multicore microarchitectures. Packaging multiple cores (and hence more transistors) onto the same die exposes more devices to soft error strikes. This paper explores utility-function-driven (benefit driven) cross domain optimization for both performance and reliability. We propose the use of utility-based resource management for individual cores while applying utility-based shared cache partitioning across multiple cores. Moreover, we coordinate the optimization of multiple resources based on their cross domain utility information to achieve attractive performance and reliability tradeoffs. 
Extensive experimental results show that, on average, our utility-driven cross domain optimization reduces the soft error rate of the most vulnerable core in a chip multiprocessor (CMP) by up to 35% and improves the CMP's overall reliability by 22% with less than 3% performance degradation across 15 investigated workloads.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"262 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122694701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Efficient systolization of cyclic convolution for systolic implementation of sinusoidal transforms
P. Meher
{"title":"Efficient systolization of cyclic convolution for systolic implementation of sinusoidal transforms","authors":"P. Meher","doi":"10.1109/ASAP.2008.4580161","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580161","url":null,"abstract":"This paper presents an algorithm to convert composite-length cyclic convolution into a block cyclic convolution sum of small matrix-vector products, even if the co-factors of convolution-length are not mutually prime. It is shown that by using optimal short-length convolution algorithms, the block-convolution could be computed from a few short-length cyclic and cyclic-like convolutions, when one of the co-factors belongs to {2, 3, 4, 6, 8}. A generalized systolic array is derived for cyclic-like convolution, and used that for the computation of long-length convolutions. The proposed structure for convolution-length N= 2L involves nearly the same hardware and half the time-complexity as the direct implementation; and the structure for N= 4L involves approximately 12.5% more hardware and one-fourth the time-complexity of the latter. 
The structures for N=2L and N=4L, respectively, have the same and approximately 12.5% less area-time complexity as the corresponding existing prime-factor systolic structures, but unlike the latter type, do not involve complex input/output mapping; and could be used even if the co-factors of convolution-length are not relatively prime.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126085009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
FPGA based singular value decomposition for image processing applications
Masih Rahmaty, Mohammad S. Sadri, Mehdi Ataei Naeini
{"title":"FPGA based singular value decomposition for image processing applications","authors":"Masih Rahmaty, Mohammad S. Sadri, Mehdi Ataei Naeini","doi":"10.1109/ASAP.2008.4580176","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580176","url":null,"abstract":"During last decades, singular value decomposition has been widely used in different fields of engineering and science. This makes SVD calculation algorithms and its feasible implementations, an attractive area of research. FPGA implementation of SVD is addressed in some past publications, however, appearance of new primary elements such as dedicated hardware multipliers, block memories and CPU cores inside new FPGA products, such as Xilinx Virtex-4, made it possible to use them in more complicated computation tasks.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125226071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
A multi-FPGA application-specific architecture for accelerating a floating point Fourier Integral Operator
Jason Lee, Lesley Shannon, M. Yedlin, G. Margrave
{"title":"A multi-FPGA application-specific architecture for accelerating a floating point Fourier Integral Operator","authors":"Jason Lee, Lesley Shannon, M. Yedlin, G. Margrave","doi":"10.1109/ASAP.2008.4580178","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580178","url":null,"abstract":"Many complex systems require the use of floating point arithmetic that is exceedingly time consuming to perform on personal computers. However, floating point operators are also hardware resource intensive and require longer latencies than fixed point operators to complete. Due to the reduced logic density of FPGAs relative to ASICs, it is often only possible to accelerate a portion of a floating point application in hardware. This paper presents an application-specific architecture for the hardware acceleration of a complete Fourier Integral Operator (FIO) kernel used in seismic imaging on a multi-FPGA platform. The design utilizes several floating point computing elements (CEs) to calculate the FIO kernel in parallel stages on multiple FPGAs. A detailed study of floating point CEs, including a Fast Fourier Transform (FFT) CE, and a complete FIO prototype implementation on the BEE2 platform is described. 
The prototype implementation has a 12.4x increase in throughput over an optimized software implementation, and a predicted 15.8x increase in throughput on the BEE3 platform.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126315274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Fast custom instruction identification by convex subgraph enumeration
K. Atasu, O. Mencer, W. Luk, C. Özturan, Günhan Dündar
{"title":"Fast custom instruction identification by convex subgraph enumeration","authors":"K. Atasu, O. Mencer, W. Luk, C. Özturan, Günhan Dündar","doi":"10.1109/ASAP.2008.4580145","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580145","url":null,"abstract":"Automatic generation of custom instruction processors from high-level application descriptions enables fast design space exploration, while offering very favorable performance and silicon area combinations. This work introduces a novel method for adapting the instruction set to match an application captured in a high-level language. A simplified model is used to find the optimal instructions via enumeration of maximal convex subgraphs of application data flow graphs (DFGs). Our experiments involving a set of multimedia and cryptography benchmarks show that an order of magnitude performance improvement can be achieved using only a limited amount of hardware resources. In most cases, our algorithm takes less than a second to execute.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"181 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123134417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
Security processor with quantum key distribution
T. Lorünser, E. Querasser, T. Matyus, M. Peev, J. Wolkerstorfer, M. Hutter, Alexander Szekely, I. Wimberger, Christian Pfaffel-Janser, A. Neppach
{"title":"Security processor with quantum key distribution","authors":"T. Lorünser, E. Querasser, T. Matyus, M. Peev, J. Wolkerstorfer, M. Hutter, Alexander Szekely, I. Wimberger, Christian Pfaffel-Janser, A. Neppach","doi":"10.1109/ASAP.2008.4580151","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580151","url":null,"abstract":"We present a fully operable security gateway prototype, integrating quantum key distribution and realised as a system-on-chip. It is implemented on a field-programmable gate array and provides a virtual private network with low latency and gigabit throughput. The seamless hard- and software integration of a quantum key distribution layer enables high key-update rates for the encryption modules. Hence, the amount of data encrypted with one session key can be significantly decreased. We realise a highly modular architecture and make extensive use of software/hardware partitioning. This work is the first approach towards application of a new key distribution technology in dedicated security processors. In particular, it elaborates requirements for the integration of quantum key distribution on a chip level.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130462425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6