2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Papers

Implementing and Parallelizing Real-time Lane Detection on Heterogeneous Platforms
Xiebing Wang, C. Kiwus, Canhao Wu, Biao Hu, Kai Huang, A. Knoll
DOI: 10.1109/ASAP.2018.8445110 (https://doi.org/10.1109/ASAP.2018.8445110)
Publication date: 2018-07-01
Abstract: Lane detection is a cardinal functionality in state-of-the-art Advanced Driver Assistance Systems (ADAS). However, it is still not straightforward to meet the real-time performance demand of processing High Definition (HD) images with high robustness and scalability. To address this problem, we propose an improved lane detection algorithm based on top-view image transformation and two-stage RANdom SAmple Consensus (RANSAC) model fitting. By virtue of off-line affine homography matrix adaptation to bound an adaptive Region Of Interest (ROI) for the subsequent on-line Warp Perspective Mapping (WPM) transformation, the algorithm can analyze arbitrary on-road videos and generate an adaptive ROI without a priori knowledge of camera parameters. To ensure scalability, we present a comprehensive parallel design of the application in a heterogeneous system consisting of a multi-core CPU, a GPU and an FPGA. We show in detail how the potentially parallel task loads are implemented and optimized so that they can be mapped to the most suitable processor to achieve optimal performance. Experimental results reveal that our improved algorithm can robustly process video streams with higher accuracy. Moreover, the heterogeneous executions are capable of processing HD 1920 × 1080 images at 81.6 fps and 47.9 fps, respectively, on an AMD FirePro W7100 GPU and a Terasic Arria 10 FPGA.
Citations: 5
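
The abstract does not detail its two-stage RANSAC fitting. As a minimal sketch of the underlying idea only, assuming lane-marking pixels have already been extracted from the top-view (WPM) image, a single-stage RANSAC fit of one straight lane line could look like the following. The function name `fit_lane_line`, the straight-line model x = a·y + b, and all parameter values are illustrative assumptions, not the authors' design.

```python
import random


def fit_lane_line(points, iterations=200, threshold=1.0, seed=0):
    """Fit x = a * y + b to (x, y) pixel coordinates with basic RANSAC.

    Hypothetical helper (not from the paper). Lane markings in a top-view
    image are close to vertical, so x is modelled as a function of row y.
    Returns ((a, b), inliers), or (None, []) if no model was found.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # 1. Hypothesis: a line through two randomly chosen points.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if y1 == y2:
            continue  # degenerate pair: x cannot be written as a function of y
        a = (x2 - x1) / (y2 - y1)
        b = x1 - a * y1
        # 2. Consensus: points within `threshold` pixels of the line.
        inliers = [(x, y) for x, y in points if abs(x - (a * y + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    if best_model is None:
        return None, []
    # 3. Refinement: least-squares refit of x on y over the best inlier set.
    n = len(best_inliers)
    mean_x = sum(x for x, _ in best_inliers) / n
    mean_y = sum(y for _, y in best_inliers) / n
    s_yy = sum((y - mean_y) ** 2 for _, y in best_inliers)
    s_xy = sum((y - mean_y) * (x - mean_x) for x, y in best_inliers)
    a = s_xy / s_yy  # s_yy > 0: the two sampled points have distinct y
    b = mean_x - a * mean_y
    return (a, b), best_inliers


# Synthetic lane: x = 0.5 * y + 10 for 50 rows, plus three outliers.
points = [(0.5 * y + 10, y) for y in range(50)] + [(200, 5), (-50, 30), (120, 40)]
(a, b), inliers = fit_lane_line(points)
print(round(a, 3), round(b, 3), len(inliers))  # prints: 0.5 10.0 50
```

A second RANSAC stage, as in the paper, would presumably refine or constrain this first estimate; how that is done is not stated in the abstract.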
Adaptively Banded Smith-Waterman Algorithm for Long Reads and Its Hardware Accelerator
Yi-Lun Liao, Yu-Cheng Li, Nae-Chyun Chen, Yi-Chang Lu
DOI: 10.1109/ASAP.2018.8445105 (https://doi.org/10.1109/ASAP.2018.8445105)
Publication date: 2018-07-01
Abstract: In this paper, we propose a hardware-compatible Adaptively Banded Smith-Waterman algorithm (ABSW) to align long genomic sequences. By using the banded Smith-Waterman algorithm to align subsequences of fixed length, ABSW finds the alignment of a pair of arbitrarily long sequences with constant memory. In addition, a heuristic algorithm, dynamic overlapping, is proposed to overlap the bands of subsequences and thereby improve accuracy. To enable hardware acceleration of ABSW, we further propose a hardware architecture for banded Smith-Waterman with traceback. Experiments show that ABSW produces near-optimal alignment scores for sequences with error rates of up to 40%. Our hardware implementation of ABSW demonstrates a speedup of more than 200× over the software implementation.
Citations: 18
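
To illustrate what "banded" means in the abstract above, here is a minimal score-only sketch of plain fixed-band Smith-Waterman with a linear gap penalty. It does not reproduce ABSW's adaptive band placement, dynamic overlapping, or traceback; the function name and the scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions.

```python
def banded_sw_score(a, b, band=8, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score of a vs b, evaluating only cells with |i - j| <= band.

    Hypothetical helper: score only, linear gap penalty, no traceback.
    Cells outside the band are never updated and stay 0, which is harmless
    for local alignment because every cell is clamped at 0 anyway.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # full matrix kept only for readability
    best = 0
    for i in range(1, n + 1):
        # Only columns inside the diagonal band are evaluated.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # consume a[i-1] against a gap in b
            left = H[i][j - 1] + gap  # consume b[j-1] against a gap in a
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(banded_sw_score("ACGTACGT", "ACGTACGT"))      # 16: eight matches on the diagonal
print(banded_sw_score("AAAAGGGG", "GGGG", band=0))  # 0: the matching GGGG lies off the diagonal
print(banded_sw_score("AAAAGGGG", "GGGG", band=4))  # 8: a wider band reaches it
```

A memory-efficient banded implementation would store only the 2·band + 1 cells of each row rather than the full matrix; restricting work to the band is what keeps per-subsequence cost bounded.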
Fast Energy Estimation Through Partial Execution of HPC Applications
Juan Carlos Salinas-Hilburg, Marina Zapater, Jose M. Moya, J. Ayala
DOI: 10.1109/ASAP.2018.8445089 (https://doi.org/10.1109/ASAP.2018.8445089)
Publication date: 2018-07-01
Abstract: To optimize the energy use of servers in data centers, techniques such as power capping or power budgeting are usually deployed. These techniques rely on predicting the power and execution time of applications. These data are obtained via dynamic profiling, which requires a full execution of the application; this is not feasible for High Performance Computing (HPC) applications with long execution times. In this paper, we present a methodology to estimate the dynamic CPU and memory energy consumption of an application without executing it completely. Our methodology merges static code analysis information with dynamic profiling via partial execution of the application. We do so by leveraging the concept of an application signature, defined as a reduced version of the application in terms of execution time and power profile. We validate our methodology with a set of CPU-intensive and memory-intensive benchmarks and multi-threaded applications on a currently shipping enterprise server. Our energy estimation methodology shows an overall error below 8.0% when compared to the dynamic energy of the whole execution of the application. It also estimates the energy of multi-threaded applications with an RMSE of 12.7% when compared to the dynamic energy of the complete parallel execution.
Citations: 1
A Unified Backend for Targeting FPGAs from DSLs
Emanuele Del Sozzo, Riyadh Baghdadi, Saman P. Amarasinghe, M. Santambrogio
DOI: 10.1109/ASAP.2018.8445108 (https://doi.org/10.1109/ASAP.2018.8445108)
Publication date: 2018-07-01
Abstract: The major flaw of Field Programmable Gate Arrays (FPGAs) is their difficult programmability and steep learning curve. Even though High-Level Synthesis (HLS) tools may alleviate this task by providing directives to optimize the hardware design and by supporting languages like C/C++ and OpenCL, developing efficient designs for FPGAs is still a challenging and time-consuming task. In this context, Domain Specific Languages (DSLs) represent an emerging solution for generating efficient code that targets FPGAs. However, support for these languages on FPGAs is still limited, and only a few DSLs provide FPGA backends. This paper describes FROST, a unified backend for targeting FPGAs from DSLs. FROST takes as input an algorithm described in one of the supported DSLs and generates an optimized design suitable for HLS tools. To this end, FROST exposes a high-level scheduling co-language to drive many aspects of the optimization process, such as the resulting architecture and the level of parallelism. We evaluated FROST on a set of image processing kernels developed in Halide and TIRAMISU, and compared the results against a hand-tuned FPGA library. The experimental results demonstrate that FROST designs are able to match the performance of that library (exploiting the same level of parallelism), and surpass it by a factor of 10 when FROST is combined with the frontends' scheduling commands.
Citations: 14
A Real-Time Learning-Based Super-Resolution System Using Direct Simple Functions
Daolu Zha, Xi Jin, Rui Shang, Pengfei Yang
DOI: 10.1109/ASAP.2018.8445121 (https://doi.org/10.1109/ASAP.2018.8445121)
Publication date: 2018-07-01
Abstract: This paper proposes a real-time super-resolution (SR) system. The proposed system performs a fast SR algorithm that generates a high-resolution image from a low-resolution image using direct regression functions. The system, implemented on a Xilinx Virtex-7 field programmable gate array, achieves an output resolution of 3840 × 2160 (UHD) at 200 fps and a throughput of 2000 Mpixels/s. Experimental results show that the proposed system provides high image quality for real-time applications.
Citations: 0
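
As a quick arithmetic check on the super-resolution figures above, the output pixel rate implied by the stated resolution and frame rate is:

```latex
3840 \times 2160 \times 200\ \text{frames/s} = 1\,658\,880\,000\ \text{pixels/s} \approx 1659\ \text{Mpixels/s}
```

This is below the quoted 2000 Mpixels/s throughput. The abstract does not say how that throughput figure is measured (it may, for example, be a peak pipeline rate, but that is an assumption), so the two numbers should not be read as describing the same quantity.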
Real-Time High-Quality Stereo Matching System on a GPU
Qiong Chang, T. Maruyama
DOI: 10.1109/ASAP.2018.8445111 (https://doi.org/10.1109/ASAP.2018.8445111)
Publication date: 2018-07-01
Abstract: In this paper, we propose a low-error-rate, real-time stereo vision system on a GPU. Many stereo vision systems on GPUs have been proposed to date, and in those systems the error rate and the processing speed are in a trade-off relationship. We propose a real-time stereo vision system on a GPU for high-resolution images that also maintains a low error rate compared to other fast systems. In our approach, we have implemented cost aggregation (CA), cross-checking and a median filter on the GPU in order to realize real-time processing. Its processing speed is 40 fps for 1436 × 992-pixel images when the maximum disparity is 145, and its error rate is the lowest among the GPU systems that are faster than 30 fps.
Citations: 5
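
The stereo abstract above lists cross-checking and a median filter as post-processing steps. A minimal CPU sketch of both, in plain Python rather than on a GPU, is shown below. It assumes integer, non-negative disparity maps stored as lists of rows, the usual convention that left pixel x matches right pixel x - d, a tolerance of 1, a 3×3 window, and a median taken only over pixels that survived the check; all of these, and the helper names, are illustrative assumptions rather than the authors' settings.

```python
INVALID = -1  # hypothetical marker for pixels rejected by the consistency check


def cross_check(disp_left, disp_right, tol=1):
    """Left-right consistency check on integer disparity maps (lists of rows).

    Pixel x in the left image matches pixel x - d in the right image; the
    disparity d is kept only if the right map agrees within `tol`.
    """
    h, w = len(disp_left), len(disp_left[0])
    out = [[INVALID] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disp_left[y][x]
            xr = x - d
            if 0 <= xr < w and abs(d - disp_right[y][xr]) <= tol:
                out[y][x] = d
    return out


def median_fill(disp):
    """3x3 median over valid neighbours: fills rejected pixels and smooths valid ones.

    Uses the upper median when the number of valid values is even.
    """
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for y in range(h):
        for x in range(w):
            vals = sorted(
                disp[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if disp[yy][xx] != INVALID
            )
            out[y][x] = vals[len(vals) // 2] if vals else INVALID
    return out


left = [[1, 1, 1, 1]]
right = [[1, 3, 1, 1]]
checked = cross_check(left, right)
print(checked)               # [[-1, 1, -1, 1]]
print(median_fill(checked))  # [[1, 1, 1, 1]]
```

In the example, pixel 0 is rejected because its match falls outside the image, and pixel 2 because the right map disagrees (3 vs 1); the median step then fills both from their valid neighbours.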
A Soft Dual-Processor System with a Partially Run-Time Reconfigurable Shared 128-Bit SIMD Engine
J. Ordaz, Dirk Koch
DOI: 10.1109/ASAP.2018.8445115 (https://doi.org/10.1109/ASAP.2018.8445115)
Publication date: 2018-07-01
Abstract: In this work, we present a soft dual-processor system that, as a distinctive feature, seamlessly integrates a partially run-time reconfigurable 128-bit SIMD engine. Importantly, the SIMD engine is tightly coupled to both scalar CPUs and is shared between them in order to drastically improve overall area utilization. We show that the proposed SIMD engine increases performance per area and that it can be used to substantially accelerate time-consuming kernels for a set of media applications.
Citations: 6
Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control
Shengjia Shao, Jason Tsai, Michal Mysior, W. Luk, T. Chau, Alexander Warren, B. Jeppesen
DOI: 10.1109/ASAP.2018.8445099 (https://doi.org/10.1109/ASAP.2018.8445099)
Publication date: 2018-07-01
Abstract: Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with the environment by making sequential decisions. The agent receives a reward from the environment based on how good its decisions are, and tries to find an optimal decision-making policy that maximises its long-term cumulative reward. This paper presents a novel approach that has shown promise in applying accelerated simulation of RL policy training to automating the control of a real robot arm for specific applications. The approach has two steps. First, design space exploration techniques are developed to enhance the performance of an FPGA accelerator for RL policy training based on Trust Region Policy Optimisation (TRPO), which results in a 43% speed improvement over a previous FPGA implementation, while achieving a 4.65× speedup over deep learning libraries running on a GPU and a 19.29× speedup over a CPU. Second, the trained RL policy is transferred to a real robot arm. Our experiments show that the trained arm can successfully reach and pick up predefined objects, demonstrating the feasibility of our approach.
Citations: 19
Five-point algorithm: An efficient cloud-based FPGA implementation
Marco Rabozzi, Emanuele Del Sozzo, Lorenzo Di Tucci, M. Santambrogio
DOI: 10.1109/ASAP.2018.8445097 (https://doi.org/10.1109/ASAP.2018.8445097)
Publication date: 2018-07-01
Abstract: The 5-point relative pose problem is to identify the possible relative camera motions given five matching points from two calibrated views. Several algorithms for solving this problem have been presented in the literature, offering different trade-offs between computational complexity and accuracy of the results. Research in this field is driven mostly by the need for accurate solutions and for high performance to cope with real-time requirements. In this work, we propose an implementation that solves the 5-point relative pose problem accelerated on a Field Programmable Gate Array (FPGA). The proposed architecture implements Nister's classical algorithm as a deep pipeline deployed on an AWS F1 instance and outperforms software implementations by a factor ranging from 7.2× to 233×. Furthermore, it achieves a speedup of 64.2× compared to Nister's software implementation with comparable accuracy.
Citations: 0
A Reading Comprehension Style Question Answering Model Based On Attention Mechanism
Linlong Xiao, Nanzhi Wang, Guocai Yang
DOI: 10.1109/ASAP.2018.8445117 (https://doi.org/10.1109/ASAP.2018.8445117)
Publication date: 2018-07-01
Abstract: In recent years, research on reading-comprehension-style question answering has drawn intense attention in Natural Language Processing. However, obtaining high-level semantic vector representations of the question and the paragraph is still a key issue. Drawing inspiration from DrQA [1], a question answering system proposed by Facebook, this paper proposes an attention-based question answering model that adds the binary representation of the paragraph, the paragraph's attention to the question, and the question's attention to the paragraph. Meanwhile, a self-attention calculation method is proposed to enhance the semantic vector representation of the question. Besides, the model uses multi-layer bidirectional Long Short-Term Memory (BiLSTM) networks to calculate the high-level semantic vector representations of paragraphs and questions. Finally, bilinear functions are used to calculate the probability of the answer's position in the paragraph. The experimental results on the Stanford Question Answering Dataset (SQuAD) development set show that the F1 score is 80.1% and the EM score is 71.4%, which demonstrates that the proposed model performs better than DrQA, with improvements of 2% and 1.3%, respectively.
Citations: 11
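
The final step in the question answering abstract above, bilinear functions that score answer positions, can be sketched in the DrQA style: P_start(i) ∝ exp(p_i^T W_s q), and likewise for the end position with its own matrix W_e. The sketch below uses plain Python lists; the token vectors, dimensions, weight values, helper names and the span-length cap of 15 tokens are illustrative assumptions. It omits the BiLSTM encoders and attention features and is not the paper's trained model.

```python
import math


def bilinear_scores(paragraph, q, W):
    """score_i = p_i^T W q for every paragraph token vector p_i."""
    Wq = [sum(W[r][c] * q[c] for c in range(len(q))) for r in range(len(W))]
    return [sum(p[r] * Wq[r] for r in range(len(Wq))) for p in paragraph]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(p_start, p_end, max_len=15):
    """Span (i, j) with i <= j < i + max_len that maximises p_start[i] * p_end[j]."""
    best, best_score = None, -1.0
    for i in range(len(p_start)):
        for j in range(i, min(len(p_end), i + max_len)):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Toy example: 3 paragraph tokens with 2-dimensional encodings.
paragraph = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
q = [1.0, 1.0]
W_start = [[1.0, 0.0], [0.0, 0.0]]  # hypothetical weights: start score reads dimension 0
W_end = [[0.0, 0.0], [0.0, 1.0]]    # hypothetical weights: end score reads dimension 1
p_start = softmax(bilinear_scores(paragraph, q, W_start))
p_end = softmax(bilinear_scores(paragraph, q, W_end))
print(best_span(p_start, p_end))  # (1, 2)
```

Using separate matrices for the start and end positions lets the model learn different question-paragraph interactions for each boundary; the product rule then picks the most probable valid span.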