2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Papers, Page 2

Model checking cloud rendering system for the QoS evaluation

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995284

Haoyu Liu, Huahu Xu, Honghao Gao, Danqi Chu

Citations: 0

Massive spatial query on the Kepler architecture

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995267

Yili Gong, Jia Tang, Wenhai Li, Zihui Ye

Citations: 0

CGRA-ME: A unified framework for CGRA modelling and exploration

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995277

S. Chin, N. Sakamoto, A. Rui, Jim Zhao, Jin Hee Kim, Yuko Hara-Azumi, J. Anderson

Citations: 80

A fast and accurate logarithm accelerator for scientific applications

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995283

Jing Chen, Xue Liu

Citations: 1

Design and comparative evaluation of GPGPU- and FPGA-based MPSoC ECU architectures for secure, dependable, and real-time automotive CPS

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995256

B. Poudel, N. Giri, Arslan Munir

Citations: 16

High performance hardware architectures for Intra Block Copy and Palette Coding for HEVC screen content coding extension

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995274

Rishan Senanayake, Namitha Liyanage, Sasindu Wijeratne, Sachille Atapattu, Kasun Athukorala, P. Tharaka, G. Karunaratne, R. Senarath, Ishantha Perera, Ashen Ekanayake, A. Pasqual

Abstract: The screen content coding (SCC) extension to High Efficiency Video Coding (HEVC) offers substantial compression efficiency gains over the existing HEVC standard for computer-generated content. However, this gain in compression efficiency is achieved at the expense of further computational complexity, with several resource-hungry coding tools. Hence, extending HEVC hardware encoders with SCC can be challenging. This paper presents resource-efficient hardware designs for two key SCC tools, Intra Block Copy and Palette Coding. Moreover, a new hash search approach is proposed for Intra Block Copy, while a hardware-friendly palette indices coding scheme is suggested for Palette Coding. These designs are targeted to achieve the throughput necessary for a 1080p 30 frames/s encoder, and incur coding losses of 11.4% and 5.1%, respectively, in all-intra configurations. The designs are synthesized for a Virtex-7 VC707 evaluation platform.

Citations: 0

PFSI.sw: A programming framework for sea ice model algorithms based on Sunway many-core processor

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995268

Binyang Li, Bo Li, D. Qian

Abstract: The sea ice model is a typical high performance computing problem. CPU- and GPU-based parallel methods have been proposed to accelerate the simulation process, but it is still hard to meet the large-scale calculation demand due to the compute-intensive nature of the model. The Sunway TaihuLight supercomputer uses the SW26010 processor as its computing unit and achieves high performance for large-scale scientific computing. In this paper we present a programming framework (PFSI.sw) for sea ice model algorithms based on the Sunway many-core processor. Based on this framework, programmers can exploit the parallelism of existing sea ice model algorithms and achieve good performance. Several strategies are introduced in this framework; data dividing, data transfer, and load balancing are the main aspects we currently address. The framework has been implemented and tested with two sea ice model algorithms using a real-world dataset on Sunway many-core processors. The experiments demonstrate performance comparable to the traditional parallel implementation on the Sunway many-core processor, and our framework improves performance by up to 40%.

Citations: 6

Acceleration of Frequent Itemset Mining on FPGA using SDAccel and Vivado HLS

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995279

V. Dang, K. Skadron

Abstract: Frequent itemset mining (FIM) is a widely used data-mining technique for discovering sets of frequently occurring items in large databases. However, FIM is highly time-consuming when datasets grow in size. FPGAs have shown great promise for accelerating computationally intensive algorithms, but they are hard to use with traditional HDL-based design methods. The recent introduction of the Xilinx SDAccel development environment for the C/C++/OpenCL languages allows developers to exploit the FPGA's potential without long development periods and extensive hardware knowledge. This paper presents an optimized implementation of an FIM algorithm on FPGA using SDAccel and Vivado HLS. Performance and power consumption are measured with various datasets. When compared to state-of-the-art solutions, this implementation offers up to 3.2× speedup over a 6-core CPU and has better energy efficiency than a GPU. Our preliminary results on the new XCKU115 FPGA are even more promising: they demonstrate performance comparable to a state-of-the-art HDL FPGA implementation and better performance than the GPU.

Citations: 8
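
To make the problem in the abstract above concrete, here is a minimal sketch of what frequent itemset mining computes. It uses a plain level-wise (Apriori-style) search in Python, which is an assumption made for illustration: the abstract does not name the FIM algorithm the authors accelerate, and this software sketch says nothing about their SDAccel/Vivado HLS design. The function name `frequent_itemsets`, the `min_support` parameter, and the toy transactions are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset that occurs in at
    least `min_support` transactions (illustrative Apriori-style search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidates of size k are unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Support counting: the step that dominates run time on large data.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        result.update(current)
        k += 1
    return result


# Hypothetical toy database of four shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, support in sorted(frequent_itemsets(baskets, 2).items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The support-counting loop scans every transaction for every candidate, which is why the cost grows quickly with dataset size and why the workload is a natural target for hardware acceleration.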

Efficiency in ILP processing by using orthogonality

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995282

Marcel Brand, Frank Hannig, Alexandru Tanase, J. Teich

Abstract: For the next generations of Processor-Arrays-on-Chip (e.g., coarse-grained reconfigurable or programmable arrays), which include hundreds to thousands of processing elements, it is very important to keep the on-chip configuration/instruction memories as small as possible. Hence, compilers must take into account the scarceness of available instruction memory and create code that is as compact as possible [1]. However, Very Long Instruction Word (VLIW) processors have the well-known problem that compilers typically produce lengthy code. A lot of unnecessary code is produced due to unused Functional Units (FUs) or repeating operations for single FUs in instruction sequences. Techniques like software pipelining can be used to improve the utilization of the FUs, yet with the risk of code explosion [2] due to the overlapped scheduling of multiple loop iterations or other control flow statements. This is where our proposed Orthogonal Instruction Processing (OIP) architecture (see Fig. 1) shows benefits in reducing the code size of compute-intensive loop programs. The idea is, contrary to lightweight VLIW processors used in arrays like Tightly Coupled Processor Arrays (TCPAs) [4], to equip each FU with its own instruction memory, branch unit, and program counter, but still let the FUs share the register files as well as input and output signals. This enables a processor to orthogonally execute a loop program. Each FU can execute its own sub-program while exchanging data over the register files. The branch unit and its instruction format have to be slightly changed by introducing a counter to each instruction that determines how often the instruction is repeated until the specified branch is executed. This enables repeating instructions without repeating them in the code. Processors of this kind have to be carefully programmed, e.g., to not run into data dependency problems while optimizing throughput. For solving this resource-constrained modulo scheduling problem, we use techniques based on mixed integer linear programming [5], [3].

Citations: 0

Hardware design and analysis of efficient loop coarsening and border handling for image processing

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2017-07-01 DOI: 10.1109/ASAP.2017.7995273

M. A. Ozkan, Oliver Reiche, Frank Hannig, J. Teich

Abstract: Field Programmable Gate Arrays (FPGAs) excel at the implementation of local operators in terms of throughput per energy, since off-chip communication can be reduced with an application-specific on-chip memory configuration. Furthermore, data-level parallelism can efficiently be exploited through so-called loop coarsening, which processes multiple horizontal pixels simultaneously. Moreover, existing solutions for proper border handling in hardware show considerable resource overheads. In this paper, we first propose novel architectures for image border handling and loop coarsening, which can significantly reduce area. Second, we present a systematic analysis of these architectures, including the formulation of analytical models for their area usage. Based on these models, we provide an algorithm for suggesting the most efficient hardware architecture for a given specification. Finally, we evaluate several implementations of our proposed architectures obtained through Vivado High-Level Synthesis (HLS). The synthesis results show that the proposed coarsening architecture uses 32% fewer registers for a 5-by-5 convolution with a coarsening factor of 64 compared to previous works, whereas the proposed border handling architectures facilitate a decrease in Look-up Table (LUT) usage of 36%.

Citations: 7
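
The two ideas named in the abstract above, border handling and loop coarsening, can be illustrated with a small software sketch. This is only a functional model in Python under stated assumptions, not the authors' hardware architecture: the names `border_index` and `filter_row`, the `clamp` and `mirror` modes, and the one-dimensional row are hypothetical choices for illustration. In hardware the `factor` lanes of the inner loop would run in parallel; here they run sequentially and only show how the work is grouped.

```python
def border_index(i, n, mode="clamp"):
    """Map an index that may fall outside [0, n) back into range.
    Assumes the kernel radius is not larger than n."""
    if 0 <= i < n:
        return i
    if mode == "clamp":
        # Repeat the nearest edge pixel.
        return 0 if i < 0 else n - 1
    if mode == "mirror":
        # Reflect around the edge, including the edge pixel itself.
        return -i - 1 if i < 0 else 2 * n - i - 1
    raise ValueError(f"unknown border mode: {mode}")


def filter_row(row, kernel, factor=4, mode="clamp"):
    """Apply a 1-D local operator (correlation, no kernel flip) to one
    image row, producing `factor` neighbouring outputs per outer step."""
    n = len(row)
    radius = len(kernel) // 2
    out = [0] * n
    # Coarsened loop: each outer iteration handles a group of `factor`
    # horizontally adjacent pixels (the last group may be shorter).
    for base in range(0, n, factor):
        for lane in range(min(factor, n - base)):
            x = base + lane
            acc = 0
            for k, weight in enumerate(kernel):
                acc += weight * row[border_index(x + k - radius, n, mode)]
            out[x] = acc
    return out


row = [1, 2, 3, 4, 5]
box = [1, 1, 1, 1, 1]
print(filter_row(row, box, factor=4, mode="clamp"))
print(filter_row(row, box, factor=4, mode="mirror"))
```

The coarsening factor changes only how outputs are grouped, not their values, so `filter_row(row, box, factor=1)` and `filter_row(row, box, factor=4)` return the same list. The two border modes differ only for pixels within one kernel radius of the row ends.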