ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors: Latest Papers

Dependability analysis of a countermeasure against fault attacks by means of laser shots onto a SRAM-based FPGA
G. Canivet, P. Maistri, R. Leveugle, F. Valette, J. Clédière, M. Renaudin
{"title":"Dependability analysis of a countermeasure against fault attacks by means of laser shots onto a SRAM-based FPGA","authors":"G. Canivet, P. Maistri, R. Leveugle, F. Valette, J. Clédière, M. Renaudin","doi":"10.1109/ASAP.2010.5540765","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540765","url":null,"abstract":"Laser-based fault injections are currently the most efficient technique that can be used to attack a secure system, since they have very high timing and location precision. Several papers have shown that a secret key may be recovered from ASICs and countermeasures have been proposed. But little research has been addressed at the specific case of secure protected implementations in SRAM-based FPGAs. This paper presents the results of laser-based fault injections on an architecture computing the AES encryption algorithm, protected by an error detection scheme, and implemented on a Virtex device. The results are compared to previous emulated fault injection campaigns and prove the criticality of remnant errors in the configuration of a FPGA used for secure applications. An improved countermeasure is also proposed and validated with a new experimental campaign.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116308005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Hardware-assisted middleware: Acceleration of garbage collection operations
Jie Tang, Shaoshan Liu, Zhimin Gu, Xiao-Feng Li, J. Gaudiot
{"title":"Hardware-assisted middleware: Acceleration of garbage collection operations","authors":"Jie Tang, Shaoshan Liu, Zhimin Gu, Xiao-Feng Li, J. Gaudiot","doi":"10.1109/ASAP.2010.5541011","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5541011","url":null,"abstract":"Although the virtualization technology brings many benefits to cloud computing environments, as the virtual machines provide more features, the middleware layer has become bloated, introducing a high overhead. Our ultimate goal is to provide hardware-assisted solutions to improve the middleware performance in cloud computing environments. As a starting point, in this paper, we design, implement, and evaluate specialized hardware instructions to accelerate GC operations. We select GC because it is a common component in virtual machine designs and it incurs high performance and energy consumption overheads. We performed a profiling study on various GC algorithms to identify the GC performance hotspots, which contribute to more than 50% of the total GC execution time. By moving these hotspot functions into hardware, we managed to achieve an order of magnitude speedup.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132035040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Area optimized H.264 Intra prediction architecture for 1080p HD resolution
Jimit Shah, K. S. Raghunandan, Kuruvilla Varghese
{"title":"Area optimized H.264 Intra prediction architecture for 1080p HD resolution","authors":"Jimit Shah, K. S. Raghunandan, Kuruvilla Varghese","doi":"10.1109/ASAP.2010.5540989","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540989","url":null,"abstract":"High performance video standards use prediction techniques to achieve high picture quality at low bit rates. The type of prediction decides the bit rates and the image quality. Intra Prediction achieves high video quality with significant reduction in bit rate. This paper presents an area optimized architecture for Intra prediction, for H.264 decoding at HDTV resolution with a target of achieving 60 fps. The architecture was validated on a Virtex-5 FPGA-based platform. The architecture achieves a frame rate of 64 fps. The architecture is based on a multi-level memory hierarchy to reduce latency and ensure optimum resource utilization. It removes redundancy by reusing the same functional blocks across different modes. The proposed architecture uses only 13% of the total LUTs available on the Xilinx FPGA XC5VLX50T.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114922349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A high efficient memory architecture for H.264/AVC motion compensation
Chunshu Li, Kai Huang, Xiaolang Yan, Jiong Feng, De Ma, Haitong Ge
{"title":"A high efficient memory architecture for H.264/AVC motion compensation","authors":"Chunshu Li, Kai Huang, Xiaolang Yan, Jiong Feng, De Ma, Haitong Ge","doi":"10.1109/ASAP.2010.5540963","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540963","url":null,"abstract":"In an H.264/AVC decoding system, the motion compensation operation accounts for about 80% of the total memory access and becomes the system bottleneck. In this paper, a highly efficient memory architecture for H.264/AVC motion compensation is proposed to greatly reduce external memory access bandwidth. A four-level hierarchical memory organization scheme is utilized to explore the reusability of neighboring blocks at an acceptable area cost. To improve the system processing throughput, five optimization techniques are adopted in the motion compensation operation, which enable the video decoder to achieve real-time decoding of an HD 1080p video stream when operating at 110 MHz. Compared with existing works, the proposed architecture is able to reduce the memory bandwidth requirement of the motion compensation process by 83.7% and performs better in real-time applications.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115031298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Combined scheduling and instruction selection for processors with reconfigurable cell fabric
Antoine Floch, C. Wolinski, K. Kuchcinski
{"title":"Combined scheduling and instruction selection for processors with reconfigurable cell fabric","authors":"Antoine Floch, C. Wolinski, K. Kuchcinski","doi":"10.1109/ASAP.2010.5540997","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540997","url":null,"abstract":"This paper presents a new method, based on constraint programming, for modeling and solving scheduling and instruction selection for processors extended with a functionally reconfigurable cell fabric. Our method models parallel reconfigurable architectures, the selection of application specific computational patterns and application scheduling. It takes also into account architectural constraints. The method provides efficient design space exploration that selects existing processor instructions and new instructions implementing computational patterns on a reconfigurable cell fabric. All instructions are scheduled enabling parallel instruction execution. Our method can be used directly for VLIW architectures by relaxing constraints concerning cell-processor data transfers. MediaBench and MiBench benchmarks have been used for evaluation and we obtained optimal results in many cases.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124797086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs
Keisuke Dohi, K. Benkrid, Cheng Ling, T. Hamada, Yuichiro Shibata
{"title":"Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs","authors":"Keisuke Dohi, K. Benkrid, Cheng Ling, T. Hamada, Yuichiro Shibata","doi":"10.1109/ASAP.2010.5540796","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540796","url":null,"abstract":"This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SW) algorithm on graphic processing units (GPUs) with NVIDIA corporation's Compute Unified Device Architecture (CUDA). Central to this is a divide and conquer approach which divides the computation of a whole pairwise sequence alignment matrix into multiple sub-matrices (or parallelograms) each running efficiently on the available hardware resources of the GPU in hand, with temporary intermediate data stored in global memory. Moreover, we use thread warps and padding techniques in order to decrease the cost of thread synchronization, as well as loop unrolling in order to reduce the cost of conditional branches. While intermediate data is stored in global memory for large queries, the most inner loop in our implementation will only access shared memory and registers. As a result of these optimizations, our implementation of the SW algorithm achieves a throughput ranging between 9.09 GCUPS (Giga Cell Update per Second) and 12.71 GCUPS on a single-GPU version, and a throughput between 29.46 GCUPS and 43.05 GCUPS on a quad-GPU platform. Compared with the best GPU implementation of the SW algorithm reported to date, our implementation achieves up to 46% improvement in speed. The source code of our implementation is available in the public domain for Bioinformaticians to benefit from its performance.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125061474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Function flattening for lease-based, information-leak-free systems
Xun Li, Mohit Tiwari, T. Sherwood, F. Chong
{"title":"Function flattening for lease-based, information-leak-free systems","authors":"Xun Li, Mohit Tiwari, T. Sherwood, F. Chong","doi":"10.1109/ASAP.2010.5540946","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540946","url":null,"abstract":"Recent research has proposed security-critical real-time embedded systems with provably-strong information containment through the use of hardware-enforced execution leases. Execution leases bound the time and address space used by functions to prevent information leakage between functions. Nested functions, however, require a relatively expensive hardware stack of execution leases. We introduce techniques to flatten nested functions and reduce overhead of the hardware stack. We note that while function flattening is impractical for conventional systems, avoiding information leakage results in constraints on program control that also make flattening possible in this setting. Through a combination of code hoisting and function splitting, we find that leases for nested functions can be substantially flattened in several practical examples. We note that some nested loop and function structures can lead to exponential growth in code size due to flattening, but that our techniques give system designers the ability to trade code size with hardware cost.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"124 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120918562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Using shared library interposing for transparent application acceleration in systems with heterogeneous hardware accelerators
Tobias Beisel, Manuel Niekamp, Christian Plessl
{"title":"Using shared library interposing for transparent application acceleration in systems with heterogeneous hardware accelerators","authors":"Tobias Beisel, Manuel Niekamp, Christian Plessl","doi":"10.1109/ASAP.2010.5540798","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540798","url":null,"abstract":"Today's computer systems increasingly comprise heterogeneous computing elements like multi-core processors, graphics processing units, and specialized co-processors, which allow parallel processing. Programming applications to utilize such systems is a complex process and needs good knowledge about the hardware architecture. Automatic and transparent use of these resources is a major concern of domain specific software developers and users. We present a new approach of using shared library interposing to replace libraries in binary applications with highly optimized accelerated versions. A plugin-based framework was developed, which allows interposing shared library calls, delegating them to accelerator specific libraries and adapting them to the library specific interface. Accelerator specific plugins can be added with a high degree of automatism. First steps were taken to develop a fast and intelligent selection component, choosing the best possible accelerator for a shared library call. It was shown that such a framework may be efficiently used to apply shared library interposing to transparently speed up existing applications. The BLAS library for linear algebra was used as an example to develop plugins for an acceleratable library. Runtimes of BLAS functions were measured on different architectures and expose significant differences depending on the used implementation and hardware, showing the potentially high speedups of the approach.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116227752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
An optimized NoC architecture for accelerating TSP kernels in breakpoint median problem
Turbo Majumder, Souradip Sarkar, P. Pande, A. Kalyanaraman
{"title":"An optimized NoC architecture for accelerating TSP kernels in breakpoint median problem","authors":"Turbo Majumder, Souradip Sarkar, P. Pande, A. Kalyanaraman","doi":"10.1109/ASAP.2010.5540797","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540797","url":null,"abstract":"Traveling Salesman Problem (TSP) is a classical NP-complete problem in graph theory. It aims at finding a least-cost Hamiltonian cycle that traverses all vertices of an input edge-weighted graph. One application of TSP is in breakpoint median-based Maximum Parsimony phylogenetic tree reconstruction, wherein a bounded edge-weight model is used. Exponential algorithms that apply efficient heuristics, such as branch-and-bound, to dynamically prune the search space are used. We adopted this approach in an NoC-based implementation for solving TSP targeted towards phylogenetics taking advantage of the fine-grained parallelism and efficient communication network. The largest fraction of the solution time for TSP is accounted for by a particular lower bound calculation operation that uses the graph's adjacency matrix. In this paper, we present the design and implementation of the processing elements with a highly optimized lower bound computation kernel and evaluate its performance. Additionally, we explore two major NoC architectures, mesh and quad-tree, and show that the latter is more suitable for this application domain.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133070629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A Bayesian network-based framework with Constraint Satisfaction Problem (CSP) formulations for FPGA system design
A. Azman, A. Bigdeli, Yasir Mohd-Mustafah, M. Biglari-Abhari, B. Lovell
{"title":"A Bayesian network-based framework with Constraint Satisfaction Problem (CSP) formulations for FPGA system design","authors":"A. Azman, A. Bigdeli, Yasir Mohd-Mustafah, M. Biglari-Abhari, B. Lovell","doi":"10.1109/ASAP.2010.5540784","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540784","url":null,"abstract":"In recent years, there has been a growing interest in IP-reuse for SoCs in order to bridge the gap between the silicon capacity and the design productivity. This research work investigates how our proposed methodology can be used to partition and schedule a JPEG encoder IP core onto an FPGA. We will also describe novel Constraint Satisfaction Problem (CSP) formulations that are used in the proposed framework. At the same time, we will also demonstrate the effectiveness of CSP in the Bayesian Network-based framework.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116575053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0