2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Publications

KV-FTL: A novel key-value based FTL scheme for large scale SSDs

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-12-01 DOI: 10.1109/HPCC-SmartCity-DSS.2017.14

Juan Li, Zhengguo Chen, Zhiguang Chen, Nong Xiao, Fang Liu, Wei Chen

Abstract: Both traditional coarse-grained and fine-grained Flash Translation Layer (FTL) schemes are unsuitable for ultra-large SSDs. They produce too many mapping entries to be kept entirely in embedded DRAM, and they can suffer severely from low spatial and temporal locality. In this paper, we propose a novel KV-FTL for ultra-large SSDs, which maps most logical addresses to physical addresses via a simple hash function and handles hash collisions and out-of-place data updates in the traditional manner, i.e., with a mapping table. KV-FTL accelerates address translation by avoiding loading the mapping table from flash memory into DRAM, which improves performance, and it reduces the write traffic incurred by the mapping table, which extends the lifespan of SSDs. Experimental results show that KV-FTL extends SSD lifespan by up to 18.7% (13.6% on average); with optimization, it improves read performance by 18.4% to 50.7% (39% on average); and under extremely intensive request loads, it improves access performance by 47% on average.
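
The hash-first translation idea described in this abstract can be illustrated with a small sketch. This is only a toy model under stated assumptions, not the authors' design: the class name HashFirstFTL, the modulo hash, the linear free-page search, and the stored_lpn list (standing in for a logical page number kept in each page's spare area) are all hypothetical, and garbage collection is not modeled.

```python
FREE, VALID, INVALID = "free", "valid", "invalid"


class HashFirstFTL:
    """Toy hash-first address translation with a fallback mapping table."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.state = [FREE] * num_pages
        # Assumption: each physical page records which logical page it holds.
        self.stored_lpn = [None] * num_pages
        # Fallback table, used only for collisions and out-of-place updates.
        self.table = {}

    def home(self, lpn):
        # Assumed hash function: simple modulo over the physical pages.
        return lpn % self.num_pages

    def lookup(self, lpn):
        # Explicit table entries take priority over the hashed location.
        if lpn in self.table:
            return self.table[lpn]
        ppn = self.home(lpn)
        if self.state[ppn] == VALID and self.stored_lpn[ppn] == lpn:
            return ppn
        return None

    def _allocate(self):
        # Naive allocator: first free page (no garbage collection here).
        for ppn, page_state in enumerate(self.state):
            if page_state == FREE:
                return ppn
        raise RuntimeError("no free page; garbage collection is not modeled")

    def write(self, lpn):
        # Flash pages cannot be overwritten in place: invalidate the old copy.
        old = self.lookup(lpn)
        if old is not None:
            self.state[old] = INVALID
            self.table.pop(lpn, None)
        ppn = self.home(lpn)
        if self.state[ppn] != FREE:
            # Collision or out-of-place update: fall back to the table.
            ppn = self._allocate()
            self.table[lpn] = ppn
        self.state[ppn] = VALID
        self.stored_lpn[ppn] = lpn
        return ppn


ftl = HashFirstFTL(8)
first = ftl.write(3)       # lands on its hashed page, no table entry
collided = ftl.write(11)   # 11 % 8 == 3 is taken, so the table is used
updated = ftl.write(3)     # update goes out of place, so the table is used
```

In this toy model the table grows only when a write cannot use its hashed page, which is the property the abstract credits for fewer mapping-table loads and less mapping-related write traffic.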

Citations: 2

DoSGuard: Protecting pipelined MPSoCs against hardware Trojan based DoS attacks

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-10 DOI: 10.1109/ASAP.2017.7995258

Amin Malekpour, R. Ragel, A. Ignjatović, S. Parameswaran

Abstract: Billions of transistors on a chip and the power wall have led embedded systems to be designed with Multiprocessor System-on-Chip (MPSoC) architectures. One use of MPSoCs is the Pipelined MPSoC (PMPSoC). As many reliable and safety-critical systems are deployed with MPSoCs, denying their service would have adverse effects. One such possibility is the insertion of a hardware Trojan that performs Denial of Service (DoS) attacks. DoSGuard presents a novel PMPSoC architecture that continues its execution in the presence of DoS Trojans in Third Party Intellectual Property (3PIP) cores. DoSGuard deploys two methods: one can detect the presence of Trojans and recover, and the other can also identify the 3PIPs under attack using buffer delays. While the state of the art incurs 3× area and power overheads, DoSGuard consumes 1.5M+3 area and leakage power (M is the number of cores in the base system) and a small dynamic power overhead (the power consumption of the monitoring system). On a cycle-accurate commercial multiprocessor simulator, DoSGuard takes 531 clock cycles to detect a DoS attack. With DoSGuard, the throughput reduction due to a DoS attack varies with the application and the monitoring interval but is negligible (< 10^-3 %) for real-world scenarios, where millions of iterations take place.

Citations: 10

Hierarchical Dataflow Model for efficient programming of clustered manycore processors

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-10 DOI: 10.1109/ASAP.2017.7995270

J. Hascoet, K. Desnos, J. Nezan, B. Dinechin

Citations: 11

An efficient embedded multi-ported memory architecture for next-generation FPGAs

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-10 DOI: 10.1109/ASAP.2017.7995263

S. N. Shahrouzi, D. Perera

Abstract: In recent years, there has been a dramatic increase in the use of FPGAs to enhance the speed-performance of many real-time compute- and data-intensive applications on embedded platforms. FPGA-based designs leverage parallelism in computations to achieve high speed-performance. Parallel computations require multi-ported memories that provide any number of ports for simultaneous read/write (R/W) operations. Although several multi-ported memories have been proposed in the literature, these designs become complex because of the extra logic and routing needed to provide an arbitrary number of R/W ports. In this research work, we introduce a novel and efficient multi-ported memory architecture that uses simple dual-port BRAMs to provide an arbitrary number of R/W ports. Apart from the BRAMs, our proposed multi-ported memory design consists only of the Decision Making Modules and a counter, which simplifies the design process. The R/W operations within our architecture are also straightforward. Experiments are performed to evaluate the feasibility and efficiency of our multi-ported memory architecture. We also compare our architecture with the most recently proposed multi-ported memory designs from the existing literature, implemented using LVT and XOR techniques. FPGA manufacturers could employ our multi-ported memory architecture to accelerate real-time compute- and data-intensive applications with their next-generation FPGAs. Because its design complexity is lower than that of existing designs, our simplified memory architecture would enable seamless integration into existing FPGA-based CAD tools with minimal design cost.

Citations: 15

Hardware support for embedded operating system security

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-10 DOI: 10.1109/ASAP.2017.7995260

Arman Pouraghily, T. Wolf, R. Tessier

Citations: 5

Fast and efficient implementation of Convolutional Neural Networks on FPGA

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-10 DOI: 10.1109/ASAP.2017.7995253

Abhinav Podili, Chi Zhang, V. Prasanna

Abstract: State-of-the-art CNN models for image recognition use deep networks with small filters instead of shallow networks with large filters, because the former require fewer weights. In light of this trend, we present a fast and efficient FPGA-based convolution engine to accelerate CNN models with small filters. The convolution engine implements the Winograd minimal filtering algorithm to reduce the number of multiplications by 38% to 55% for state-of-the-art CNNs. We exploit the parallelism of the Winograd convolution engine to scale the overall performance. We show that our overall design sustains the peak throughput of the convolution engines. We propose a novel data layout that halves the required memory bandwidth of our design. One noteworthy feature of our Winograd convolution engine is that it hides the computation latency of the pooling layer. As a case study, we implement the VGG16 CNN model and compare it with previous approaches. Compared with the state-of-the-art reduced-precision VGG16 implementation, our implementation achieves a 1.2× improvement in throughput while using 3× fewer multipliers and 2× less on-chip memory, without impacting the classification accuracy. The improvements in throughput per multiplier and throughput per unit of on-chip memory are 3.7× and 2.47× respectively, compared with the state-of-the-art design.
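
The way Winograd minimal filtering trades multiplications for additions can be seen in its smallest one-dimensional case, F(2,3), which produces two outputs of a 3-tap filter from four inputs using 4 multiplications instead of 6. The sketch below is the textbook 1D form only; the abstract does not state which tile size the FPGA engine uses, and the function names are illustrative assumptions.

```python
def direct_conv(d, g):
    """Two outputs of a 3-tap filter over four inputs: 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return (y0, y1)


def winograd_f23(d, g):
    """Same two outputs with 4 data multiplications (Winograd F(2,3))."""
    # Filter-side transform: can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Four multiplications between transformed data and transformed filter.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: additions and subtractions only.
    return (m1 + m2 + m3, m2 - m3 - m4)


data = [1, 2, 3, 4]
taps = [2, 1, 3]
reference = direct_conv(data, taps)
fast = winograd_f23(data, taps)
```

Because the filter-side transform is shared across all input tiles, only the four products per tile count toward the runtime multiplier budget.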

Citations: 57

OpenCL-based design pattern for line rate packet processing

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-10 DOI: 10.1109/ASAP.2017.7995278

Jehandad Khan, P. Athanas, S. Booth, John Marshall

Citations: 1

High-performance FPGA implementation of equivariant adaptive separation via independence algorithm for Independent Component Analysis

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-06 DOI: 10.1109/ASAP.2017.7995255

M. Nazemi, Shahin Nazarian, Massoud Pedram

Abstract: Independent Component Analysis (ICA) is a dimensionality reduction technique that can boost the efficiency of machine learning models that deal with probability density functions, e.g., Bayesian neural networks. Algorithms that implement adaptive ICA converge more slowly than their nonadaptive counterparts; however, they are capable of tracking changes in the underlying distributions of input features. This intrinsically slow convergence of adaptive methods, combined with existing hardware implementations that operate at very low clock frequencies, necessitates fundamental improvements in both algorithm and hardware design. This paper presents an algorithm that allows efficient hardware implementation of ICA. Compared to previous work, our FPGA implementation of adaptive ICA improves clock frequency by at least one order of magnitude and throughput by at least two orders of magnitude. Our proposed algorithm is not limited to ICA and can be used in various machine learning problems that use stochastic gradient descent optimization.
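
For orientation, the standard equivariant adaptive separation via independence (EASI) update named in the title can be written as one stochastic-gradient-style step on the separating matrix W: with y = W x, W is replaced by W - mu * (y y^T - I + g(y) y^T - y g(y)^T) W. The sketch below is that textbook step only, not the hardware-oriented algorithm the paper proposes; the cubic nonlinearity, the step size, the square W, and the function name are assumptions chosen for illustration.

```python
def easi_step(W, x, mu=0.01, g=lambda v: v ** 3):
    """One textbook EASI update of a square separating matrix W."""
    n = len(W)
    # Current source estimate y = W x.
    y = [sum(W[i][k] * x[k] for k in range(n)) for i in range(n)]
    gy = [g(v) for v in y]
    # H = y y^T - I + g(y) y^T - y g(y)^T
    H = [
        [
            y[i] * y[j] - (1.0 if i == j else 0.0) + gy[i] * y[j] - y[i] * gy[j]
            for j in range(n)
        ]
        for i in range(n)
    ]
    # W_new = W - mu * H W
    return [
        [W[i][j] - mu * sum(H[i][k] * W[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
updated_W = easi_step(identity, [1.0, 0.0], mu=0.1)
```

Each sample triggers one such matrix update, which is why the per-sample latency of this step matters for throughput in an adaptive implementation.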

Citations: 6

Design and implementation of adaptive signal processing systems using Markov decision processes

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995275

Lin Li, A. Sapio, Jiahao Wu, Yanzhou Liu, Kyunghun Lee, M. Wolf, S. Bhattacharyya

Citations: 2

reMinMin: A novel static energy-centric list scheduling approach based on real measurements

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995272

Achim Lösch, M. Platzner

Abstract: Heterogeneous compute nodes in the form of CPUs with attached GPU and FPGA accelerators have gained strong interest in recent years. Applications differ in their execution characteristics and can therefore benefit from such heterogeneous resources in terms of performance or energy consumption. While performance optimization was the only goal for a long time, research now increasingly focuses on techniques to minimize energy consumption because of rising electricity costs. This paper presents reMinMin, a novel static list scheduling approach for optimizing the total energy consumption of a set of tasks executed on a heterogeneous compute node. reMinMin is based on a new energy model that differentiates between static and dynamic energy components and covers the effects of accelerator tasks on the host CPU. The required energy values are obtained by measurements on the real computing system. To evaluate reMinMin, we compare it with two reference implementations on three task sets with different degrees of heterogeneity. In our experiments, reMinMin is consistently better than a scheduler optimizing for dynamic energy only, which requires up to 19.43% more energy, and is very close to optimal schedules.
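
To make the scheduling idea concrete, here is a minimal sketch of a Min-Min style list scheduler whose cost is energy rather than completion time. It is not the reMinMin energy model, which the abstract does not specify: the cost used here (a task's dynamic energy plus node static power times any increase in makespan), the input layout, and all names and numbers are assumptions for illustration.

```python
def min_min_energy(tasks, static_power):
    """Greedy Min-Min style scheduling with an assumed energy cost.

    tasks: {task: {resource: (runtime_seconds, dynamic_energy_joules)}}
    static_power: assumed node-wide static power in watts.
    Returns (schedule, total_energy), schedule being (task, resource) pairs.
    """
    ready = {}          # resource -> time at which it becomes free
    makespan = 0.0
    total_energy = 0.0
    schedule = []
    remaining = set(tasks)
    while remaining:
        best = None
        # Minimum over all (task, resource) pairs, which equals picking each
        # task's cheapest resource and then the cheapest such task (Min-Min).
        for task in sorted(remaining):
            for resource, (runtime, dynamic) in sorted(tasks[task].items()):
                finish = ready.get(resource, 0.0) + runtime
                # Static energy is charged only for time that extends the makespan.
                cost = dynamic + static_power * max(0.0, finish - makespan)
                candidate = (cost, task, resource, finish)
                if best is None or candidate < best:
                    best = candidate
        cost, task, resource, finish = best
        ready[resource] = finish
        makespan = max(makespan, finish)
        total_energy += cost
        schedule.append((task, resource))
        remaining.remove(task)
    return schedule, total_energy


# Hypothetical task set: runtimes in seconds, dynamic energy in joules.
example_tasks = {
    "A": {"cpu": (4.0, 40.0), "gpu": (1.0, 30.0)},
    "B": {"cpu": (2.0, 10.0), "gpu": (2.0, 50.0)},
}
plan, energy = min_min_energy(example_tasks, static_power=10.0)
```

In this example task B is placed on the CPU first, and task A then runs on the GPU without extending the makespan, so it adds no further static energy under the assumed cost.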

Citations: 5