2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Papers

Towards secure cryptographic software implementation against side-channel power analysis attacks
Pei Luo, Liwei Zhang, Yunsi Fei, A. Ding
DOI: https://doi.org/10.1109/ASAP.2015.7245722 | Pages 144-148 | Published 2015-07-27
Abstract: Side-channel attacks have been a real threat against many embedded cryptographic systems. A commonly used algorithmic countermeasure, random masking, incurs large execution delay and resource overhead. Another countermeasure, operation shuffling or permutation, can mitigate side-channel leakage effectively with minimal overhead. In this paper, we aim to implement operation shuffling in cryptographic algorithms automatically, to resist side-channel power analysis attacks. We design a tool that detects independence among statements at the source-code level and devise an algorithm for automatic operation shuffling. We test our algorithm on the new SHA-3 standard, Keccak. Results show that the tool effectively implements operation shuffling and significantly reduces side-channel leakage, and can therefore guide automatic implementation of cryptographic software that is secure against differential power analysis attacks.
Citations: 12
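To make the idea of operation shuffling concrete, here is a minimal sketch, assuming a set of statements already known to be mutually independent (modelled here as hypothetical per-lane XORs over 25 words, the lane count of a Keccak-f[1600] state). The names `shuffled_indices` and `xor_lanes_shuffled` are illustrative, not the paper's tool. Each run executes the independent updates in a fresh random order while the result stays the same. Python itself gives no side-channel guarantees; a real countermeasure would need the same pattern in low-level, constant-time code.

```python
import secrets


def shuffled_indices(n: int) -> list[int]:
    """Return a random permutation of range(n) (Fisher-Yates, cryptographic RNG)."""
    order = list(range(n))
    for i in range(n - 1, 0, -1):
        j = secrets.randbelow(i + 1)  # uniform index in [0, i]
        order[i], order[j] = order[j], order[i]
    return order


def xor_lanes_shuffled(a: list[int], b: list[int]) -> list[int]:
    """Compute out[i] = a[i] ^ b[i] in a random order.

    The lane updates do not depend on one another, so any execution
    order yields the same result; only the timing of each operation,
    and hence its position in a power trace, changes between runs.
    """
    if len(a) != len(b):
        raise ValueError("lane arrays must have equal length")
    out = [0] * len(a)
    for i in shuffled_indices(len(a)):
        out[i] = a[i] ^ b[i]
    return out
```

For example, `xor_lanes_shuffled(list(range(25)), [0xFF] * 25)` always returns the same 25 values, whatever order the lanes were visited in.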
Custom FPGA-based soft-processors for sparse graph acceleration
Nachiket Kapre
DOI: https://doi.org/10.1109/ASAP.2015.7245698 | Pages 9-16 | Published 2015-07-27
Abstract: FPGA-based soft processors customized for operations on sparse graphs can deliver significant performance improvements over conventional organizations (ARMv7 CPUs) for bulk-synchronous sparse graph algorithms. We develop a stripped-down soft-processor ISA to implement specific repetitive operations on graph nodes and edges that are commonly observed in sparse graph computations. In the processing core, we provide hardware support for rapidly fetching and processing the state of local graph nodes and edges through spatial address generators and zero-overhead loop iterators. We interconnect a 2D array of these lightweight processors with a packet-switched network-on-chip to enable fine-grained operand routing along the graph edges, and provide custom send/receive instructions in the soft processor. We develop the processor RTL using Vivado High-Level Synthesis and also provide an assembler and compilation flow to configure the processor instruction and data memories. Across a range of matrix datasets, our single-processor design outperforms a MicroBlaze (100 MHz on Zedboard) and a Nios II/f (100 MHz on DE2-115) by 6×, and our 100-processor design on the Xilinx ZC706 board outperforms the dual-core ARMv7 CPU of the Zynq SoC by as much as 10×.
Citations: 32
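The repetitive per-node and per-edge operations mentioned above can be pictured with a software reference kernel. The sketch below is an assumption for illustration only: it uses sparse matrix-vector multiplication over a compressed sparse row (CSR) layout, a common bulk-synchronous step over matrix datasets, and is not taken from the paper. The outer loop walks graph nodes (rows) and the inner loop walks each node's edges (nonzeros); these are the kinds of loops that address generators and zero-overhead loop iterators would handle in hardware.

```python
def spmv_csr(row_ptr: list[int], col_idx: list[int],
             vals: list[float], x: list[float]) -> list[float]:
    """One bulk-synchronous step y = A @ x with A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the edges (nonzeros) of node i;
    col_idx gives each edge's source node and vals its weight.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):                                 # per-node loop
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):    # per-edge loop
            acc += vals[k] * x[col_idx[k]]             # gather along the edge
        y[i] = acc
    return y
```

For the matrix `[[1, 0, 2], [0, 3, 0], [4, 0, 5]]`, the CSR arrays are `row_ptr = [0, 2, 3, 5]`, `col_idx = [0, 2, 1, 0, 2]` and `vals = [1, 2, 3, 4, 5]`.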
Hardware acceleration of Private Information Retrieval protocols using GPUs
Mihai Maruseac, Gabriel Ghinita, Ming Ouyang, R. Rughinis
DOI: https://doi.org/10.1109/ASAP.2015.7245719 | Pages 120-127 | Published 2015-07-27
Abstract: Private Information Retrieval (PIR) protocols allow users to search for data items stored at an untrusted server without disclosing the search attributes to the server. Several computational PIR protocols provide cryptographic-strength guarantees for user privacy, building upon well-known hard mathematical problems such as the factorisation of large integers. Unfortunately, the computationally intensive nature of these solutions results in significant performance overhead, preventing their adoption in practice. In this paper, we employ graphics processing units (GPUs) to speed up the cryptographic operations required by PIR. We identify the challenges that arise when using GPUs for PIR and propose solutions to address them. To the best of our knowledge, this is the first work to use GPUs for efficient private information retrieval, and an important first step towards GPU-based acceleration of a broader range of secure data operations. Our experimental evaluation shows that GPUs improve performance by more than an order of magnitude.
Citations: 1
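For readers unfamiliar with computational PIR, the toy sketch below shows one classic factoring-related construction, the Kushilevitz-Ostrovsky scheme based on quadratic residuosity. This listing does not say which protocol the paper accelerates, so treat the scheme choice, the function names and the tiny primes as illustrative assumptions. The server's work is one modular multiplication per database bit, which is the kind of bulk arithmetic a GPU can parallelise.

```python
import math
import secrets


def is_qr_mod_prime(a: int, p: int) -> bool:
    """Euler's criterion: a unit a is a quadratic residue mod odd prime p."""
    return pow(a, (p - 1) // 2, p) == 1


def random_unit(n: int) -> int:
    """Random element of [2, n-1] coprime to n."""
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            return r


def make_query(p: int, q: int, num_cols: int, target_col: int):
    """Client: a non-residue (mod p and mod q) in the target column, residues elsewhere."""
    n = p * q
    query = []
    for j in range(num_cols):
        if j == target_col:
            while True:
                y = random_unit(n)
                if not is_qr_mod_prime(y % p, p) and not is_qr_mod_prime(y % q, q):
                    break
        else:
            y = pow(random_unit(n), 2, n)  # random quadratic residue mod n
        query.append(y)
    return n, query


def server_answer(db: list[list[int]], n: int, query: list[int]) -> list[int]:
    """Server: per row, multiply y_j^2 for bit 0 and y_j for bit 1 (mod n)."""
    answers = []
    for row in db:
        z = 1
        for bit, y in zip(row, query):
            z = z * (y if bit else y * y) % n
        answers.append(z)
    return answers


def decode(answers: list[int], p: int, target_row: int) -> int:
    """Client: z is a residue mod p exactly when the target bit is 0."""
    return 0 if is_qr_mod_prime(answers[target_row] % p, p) else 1
```

Without knowing `p` and `q`, the server cannot tell which query element is the non-residue. Real deployments use moduli of cryptographic size rather than the small primes used to exercise this sketch.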
Atomic stream computation unit based on micro-thread level parallelism
Nasim Farahini, A. Hemani
DOI: https://doi.org/10.1109/ASAP.2015.7245700 | Pages 25-29 | Published 2015-07-27
Abstract: The increasing demand for higher image resolution and communication bandwidth requires streaming applications to handle ever-larger datasets. Further, with technology scaling, the cost of moving data is decreasing more slowly than the cost of computing. These trends motivate the proposed micro-architectural reorganization of stream processors, which divides stream computation into functional computation, address-constraint computation and address generation, and deploys independent, distributed micro-threads to implement them. This scheme is an alternative to parallelizing them at the instruction level. The proposed scheme has two benefits: more efficient sequencer logic, and energy savings in address generation and transport. Quantified over a set of streaming applications, these benefits amount to average improvements of 39% in the silicon efficiency of the sequencer logic and 23% in total computational efficiency.
Citations: 2
Application-set driven exploration for custom processor architectures
M. A. Arslan, F. Gruian, K. Kuchcinski
DOI: https://doi.org/10.1109/ASAP.2015.7245710 | Pages 70-71 | Published 2015-07-27
Abstract: Custom architectures are often adopted as more efficient alternatives to general-purpose processors in terms of performance and power. However, designing such architectures requires expertise in both hardware and the application domain. In this paper, we propose a method for speeding up design space exploration. Our method, based on Pareto points, identifies sets of solutions, expressed as numbers of scalar units and of vector units of a given length, that fulfil the throughput constraints of each application in a given set. Architectures can then be selected by combining these solutions and used as starting points for a more thorough, model-based evaluation.
Citations: 2
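As a rough illustration of the Pareto-point step, the sketch below enumerates (scalar units, vector units) configurations, keeps those meeting a throughput requirement, and discards any configuration that another feasible one beats on both resources. The `Config` type, the helper names and the linear throughput model in the usage example (each vector unit of length 4 counted as 4 scalar units) are hypothetical; the paper's own throughput estimation is not described here.

```python
from typing import Callable, NamedTuple


class Config(NamedTuple):
    scalar_units: int
    vector_units: int


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it uses no more of either unit type and differs from b."""
    return (a.scalar_units <= b.scalar_units
            and a.vector_units <= b.vector_units
            and a != b)


def feasible_configs(throughput: Callable[[Config], float], required: float,
                     max_scalar: int, max_vector: int) -> list[Config]:
    """All configurations within the bounds that meet the throughput constraint."""
    return [Config(s, v)
            for s in range(max_scalar + 1)
            for v in range(max_vector + 1)
            if throughput(Config(s, v)) >= required]


def pareto_front(feasible: list[Config]) -> list[Config]:
    """Keep only configurations not dominated by another feasible one."""
    return [c for c in feasible if not any(dominates(o, c) for o in feasible)]
```

With the toy model `lambda c: c.scalar_units + 4 * c.vector_units`, a requirement of 8 and bounds of 8 scalar and 2 vector units, the front is `[Config(0, 2), Config(4, 1), Config(8, 0)]`. One such front per application would then be the input to the combination step the abstract describes.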
Balance power leakage to fight against side-channel analysis at gate level in FPGAs
Xin Fang, Pei Luo, Yunsi Fei, M. Leeser
DOI: https://doi.org/10.1109/ASAP.2015.7245724 | Pages 154-155 | Published 2015-07-27
Abstract: Side-channel attacks have been a serious threat to the security of embedded cryptographic systems, and various countermeasures have been devised to mitigate the leakage. Power-balancing technologies such as wave dynamic differential logic (WDDL) aim to balance power consumption by introducing differential logic. However, different routing lengths lead to different wire capacitances, which weakens the power-balancing countermeasure. In this paper, we further balance the power of differential signals by manipulating lower-level primitives and placement constraints on a Field Programmable Gate Array (FPGA). We choose the Advanced Encryption Standard (AES) as the encryption algorithm and apply the Hamming weight model to quantify the leakage of different implementations. Results show that our method not only mitigates side-channel leakage efficiently but also saves FPGA logic-block resources and dynamic power.
Citations: 5
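The Hamming weight model mentioned above predicts that power consumption correlates with the number of set bits in an intermediate value. The sketch below shows how that model is commonly used in a correlation attack: for every key-byte guess, it predicts leakage and ranks guesses by Pearson correlation with the observed traces. The target intermediate (plaintext XOR key, the first AES AddRoundKey output) and the noise-free simulated traces are simplifying assumptions. Practical attacks usually target the S-box output, because here a guess with all bits flipped yields a correlation of exactly -1.

```python
import math


def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_key_guesses(plaintexts: list[int], leakage: list[float]) -> list[int]:
    """Rank all 256 key-byte guesses by correlation between HW(p ^ k) and leakage."""
    scores = {}
    for k in range(256):
        hypothesis = [hamming_weight(p ^ k) for p in plaintexts]
        scores[k] = pearson(hypothesis, leakage)
    return sorted(scores, key=scores.get, reverse=True)
```

With simulated leakage `[hamming_weight(p ^ 0x3C) for p in range(256)]`, the top-ranked guess is `0x3C`. A well-balanced implementation should flatten this correlation peak.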
Multi-task support for security-enabled embedded processors
Tedy Thomas, Arman Pouraghily, Kekai Hu, R. Tessier, T. Wolf
DOI: https://doi.org/10.1109/ASAP.2015.7245721 | Pages 136-143 | Published 2015-07-27
Abstract: Embedded systems require low-overhead security approaches to protect them from attacks. In this paper, we propose a hardware-based approach that secures the operation of an embedded processor instruction by instruction, detecting deviations from expected program behavior within the execution of an instruction. These security-enabled embedded processors provide effective defenses against common attacks such as stack smashing. Previous work in this area has focused on monitoring a single task on a CPU, whereas here we present a novel hardware monitoring system that can monitor multiple active tasks on an operating-system-based platform. The hardware monitor tracks the context switches that occur in the operating system and ensures that monitoring is performed continuously, thus maintaining system security. We present the design of our system and results obtained from a prototype implementation on an Altera DE4 FPGA board. We demonstrate in hardware that applications can be monitored at the instruction level without execution slowdown, and that our system defeats stack-smashing attacks.
Citations: 7
Mixed-length SIMD code generation for VLIW architectures with multiple native vector-widths
Erkan Diken, M. O'Riordan, Roel Jordans, L. Józwiak, H. Corporaal, D. Moloney
DOI: https://doi.org/10.1109/ASAP.2015.7245732 | Pages 181-188 | Published 2015-07-27
Abstract: The degree of data-level parallelism (DLP) in applications is not fixed; it varies with their computational characteristics. In contrast, most processors today include single-width SIMD (vector) hardware to exploit DLP. However, single-width SIMD architectures may not be optimal for applications with varying DLP and may cause performance and energy inefficiency. We propose using VLIW processors with multiple native vector widths to better serve applications with changing DLP. SHAVE is an example of such a VLIW processor and provides hardware support for native 32-bit and 128-bit wide vector operations. This paper develops and implements mixed-length SIMD code generation support for the SHAVE processor. More specifically, we target generating 32-bit and 128/64-bit SIMD code for the native 32-bit and 128-bit wide vector units of SHAVE. In this way, we improve the performance of compiler-generated SIMD code by reducing the number of overhead operations and increasing SIMD hardware utilization. Experimental results demonstrate that our methodology, implemented in the compiler, improves the performance of synthetic benchmarks by up to 47%.
Citations: 5
Accelerating bootstrapping in FHEW using GPUs
M. Lee, Yongje Lee, J. Cheon, Y. Paek
DOI: https://doi.org/10.1109/ASAP.2015.7245720 | Pages 128-135 | Published 2015-07-27
Abstract: GPUs are no longer limited to graphics workloads; a wide variety of applications exploit their flexibility to accelerate computation. One of the most prominent emerging applications is fully homomorphic encryption (FHE), which enables arbitrary computations on encrypted data. Despite much research effort, FHE cannot yet be considered practical because of the enormous amount of computation it requires, especially in the bootstrapping procedure. In this paper, as a case study of an FHE scheme, we use GPUs to accelerate the recently proposed fast bootstrapping method of the FHEW scheme. We explored the reference code and profiled it to identify candidates for acceleration. Based on the profiling results, combined with a more flexible trade-off method, we optimized the FHEW bootstrapping algorithm using GPUs and the CUDA programming model. Empirical results show that, after optimization, an FHEW ciphertext can be bootstrapped in less than 0.11 seconds.
Citations: 10
Comparative analysis of OpenCL vs. HDL with image-processing kernels on Stratix-V FPGA
K. Hill, S. Craciun, A. George, H. Lam
DOI: https://doi.org/10.1109/ASAP.2015.7245733 | Pages 189-193 | Published 2015-07-27
Abstract: Application development with hardware description languages (HDLs) such as VHDL or Verilog involves numerous productivity challenges, limiting the potential impact of reconfigurable computing (RC) with FPGAs in high-performance computing. Major challenges with HDL design include steep learning curves, large and complex code, long compilation times, and a lack of development standards across platforms. A relative newcomer to RC, the Open Computing Language (OpenCL) reduces these productivity hurdles by providing a platform-independent, C-based programming language. In this study, we compare the performance and productivity of three image-processing kernels (Canny edge detector, Sobel filter, and SURF feature extractor) developed using Altera's SDK for OpenCL and traditional VHDL. Our results show that the VHDL designs used resources more efficiently (59% to 70% less logic); however, the OpenCL and VHDL designs achieved similar maximum clock frequencies (255 MHz < fmax < 325 MHz). Furthermore, we observed a 6× increase in productivity when using the OpenCL development tools, and were able to port the same OpenCL designs without change to three different RC platforms, with similar frequency and resource utilization.
Citations: 48
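To show what one of the compared kernels computes, here is a plain software reference for the Sobel filter. It is a sketch, not either the OpenCL or the VHDL design from the study. It assumes an 8-bit grayscale image given as a list of rows, leaves the one-pixel border at zero, and uses the |Gx| + |Gy| magnitude approximation clamped to 255, a choice common in hardware but assumed here.

```python
# 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel(img: list[list[int]]) -> list[list[int]]:
    """Return the gradient-magnitude image; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            # Convolve the 3x3 neighbourhood centred on (r, c).
            for i in range(3):
                for j in range(3):
                    px = img[r + i - 1][c + j - 1]
                    gx += GX[i][j] * px
                    gy += GY[i][j] * px
            out[r][c] = min(255, abs(gx) + abs(gy))  # |Gx|+|Gy|, clamped to 8 bits
    return out
```

On an image whose rows are all `[0, 0, 255, 255]`, both interior pixels next to the vertical edge saturate at 255, while a uniform image produces all zeros.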