2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Papers, Page 3

Power and performance trade-offs for Space Time Adaptive Processing

2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245703

Nitin Gawande, J. Manzano, Antonino Tumeo, Nathan R. Tallent, D. Kerbyson, A. Hoisie

Citations: 4

A soft-core processor array for relational operators

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245699

R. Polig, Heiner Giefers, W. Stechele

Abstract: Despite the performance and power efficiency gains achieved by FPGAs for text analytics queries, analysis shows a low utilization of the custom hardware operator modules. Furthermore, the long synthesis times limit the accelerator's use in enterprise systems to static queries. To overcome these limitations we propose the use of an overlay architecture to share area resources among multiple operators and reduce compilation times. In this paper we present a novel soft-core architecture tailored to efficiently perform relational operations of text analytics queries on multiple virtual streams. It combines efficient streaming-based operation with the flexibility of an instruction-programmable core. It is used as a processing element in an array of cores to execute large query graphs and has access to shared co-processors to perform string- and context-based operations. We evaluate the core architecture in terms of area and performance compared to the custom hardware modules, and show how a minimum number of cores can be calculated to avoid stalling the document processing.

Pages: 17-24

Citations: 4
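
The abstract above mentions calculating a minimum number of cores so that document processing does not stall, but does not give the formula. The following is only a rough sketch of one way such a bound could be estimated, assuming the work of all operators can be spread evenly across cores. The function `min_cores` and its parameters are hypothetical and are not taken from the paper.

```python
def min_cores(op_cycles_per_doc, cycle_budget_per_doc):
    """Estimate how many cores are needed to keep up with the document stream.

    op_cycles_per_doc: cycles each operator in the query graph spends on one
        document (hypothetical measurements).
    cycle_budget_per_doc: cycles available per document at the target
        throughput (hypothetical).
    Assumption: operator work can be balanced perfectly across cores.
    """
    total_cycles = sum(op_cycles_per_doc)
    # Integer ceiling division; at least one core is always required.
    return max(1, -(-total_cycles // cycle_budget_per_doc))


# Three operators needing 1000 cycles in total, with 300 cycles per document.
print(min_cores([400, 250, 350], 300))
```

Because a real operator may not be divisible across cores, this figure is a lower bound rather than a placement result.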

An FPGA implementation of a Restricted Boltzmann Machine classifier using stochastic bit streams

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245709

Bingzhe Li, M. Najafi, D. Lilja

Citations: 27

Loop coarsening in C-based High-Level Synthesis

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245730

Moritz Schmid, Oliver Reiche, Frank Hannig, J. Teich

Abstract: Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP); the support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate Arrays (FPGAs), is in contrast very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines, consisting of point and local operators. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows multiple pixels to be processed in parallel, whereby only the kernel operator is replicated within a single accelerator. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework HIPAcc by loop coarsening and compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from the exact same code base. Moreover, we demonstrate the advantages of code generation for algorithm development by outlining how design space exploration enabled by HIPAcc can yield a more efficient implementation than hand-coded VHDL.

Pages: 166-173

Citations: 14
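
The difference between loop tiling and loop coarsening described in the abstract above can be illustrated in software. This is a minimal sketch, not HIPAcc output or HLS code: the point operator `brighten` and the functions `run_baseline`, `run_tiled`, and `run_coarsened` are hypothetical names chosen for illustration, and the parallelism is only modelled by how the loop is structured.

```python
def brighten(pixel, gain=2, limit=255):
    """Point operator: scale one pixel and clamp it to the valid range."""
    return min(pixel * gain, limit)


def run_baseline(row):
    """Reference loop: one pixel per iteration."""
    return [brighten(p) for p in row]


def run_tiled(row, tiles=2):
    """Loop tiling: split the data into regions, one per accelerator copy.

    Each region is processed by its own instance of the whole loop, which in
    hardware also needs glue logic to distribute the data.
    """
    size = max(1, -(-len(row) // tiles))  # ceiling division
    out = []
    for start in range(0, len(row), size):
        out.extend(run_baseline(row[start:start + size]))
    return out


def run_coarsened(row, factor=4):
    """Loop coarsening: each iteration handles `factor` adjacent pixels.

    Only the kernel call is replicated inside a single loop; in hardware the
    inner group would map to parallel kernel instances.
    """
    out = []
    for start in range(0, len(row), factor):
        group = row[start:start + factor]        # one wide word of pixels
        out.extend(brighten(p) for p in group)   # replicated kernel
    return out


row = list(range(0, 200, 7))
print(run_coarsened(row) == run_baseline(row) == run_tiled(row))
```

All three variants compute the same result; they differ only in which part of the design would be replicated.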

Dynamic pipeline-partitioned video decoding on symmetric stream multiprocessors

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245716

Ming-Ju Wu, Yan-Ting Chen, Chun-Jen Tsai

Citations: 0

Automatic frame rate-based DVFS of game

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245726

Zhinan Cheng, Xi Li, Beilei Sun, Ce Gao, Jiachen Song

Abstract: The rapid development of mobile games highlights the power consumption problem on mobile platforms. Most power saving techniques use a prediction-based dynamic voltage and frequency scaling (DVFS) scheme. However, the prediction can be inaccurate because of frequent user interactions during play. We have observed that frame rate is nearly linear in CPU frequency up to a bottleneck: once CPU frequency reaches this threshold, frame rate no longer increases as frequency increases. Moreover, previous research has shown that utilizing information about the game state can reduce the influence of a game's interactive characteristics on the DVFS policy. We explore a method to automatically detect the game state. We propose the Automatic Frame Rate-Based DVFS policy, which can learn the threshold of frame rate online and utilize the information of game state and frame rate to scale the frequency without prediction. Our evaluation shows that, compared with the prediction-based Android default Interactive DVFS policy, our policy saves more power in all the tested games. Up to 15.2% more power can be saved by the Automatic Frame Rate-Based DVFS policy.

Pages: 158-159

Citations: 6
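
The observation in the abstract above, that frame rate stops rising once CPU frequency passes a threshold, suggests a simple online search for that threshold. The sketch below is an assumption-laden illustration and not the authors' policy: the class `FrameRateGovernor`, the frequency table, and the `tolerance` parameter are all hypothetical, and game-state detection is left out entirely.

```python
class FrameRateGovernor:
    """Step the CPU frequency up until the frame rate stops improving."""

    def __init__(self, freqs_mhz, tolerance=1.0):
        self.freqs = sorted(freqs_mhz)   # hypothetical frequency table
        self.tolerance = tolerance       # fps gain below this counts as no gain
        self.index = 0                   # start at the lowest frequency
        self.last_fps = None
        self.threshold_fps = None        # learned plateau frame rate

    @property
    def freq(self):
        return self.freqs[self.index]

    def update(self, fps):
        """Take the fps measured at the current frequency, return the next one."""
        if self.threshold_fps is None:
            if self.last_fps is not None and fps - self.last_fps < self.tolerance:
                # The last step up bought nothing: plateau found, step back.
                self.threshold_fps = self.last_fps
                self.index -= 1
            elif self.index < len(self.freqs) - 1:
                self.last_fps = fps
                self.index += 1
            else:
                # Already at the top frequency and still improving.
                self.threshold_fps = fps
        return self.freq


# Toy workload: fps grows linearly with frequency and saturates at 45 fps.
gov = FrameRateGovernor([300, 600, 900, 1200, 1500])
for _ in range(6):
    gov.update(min(gov.freq // 20, 45))
print(gov.freq, gov.threshold_fps)
```

With this toy workload the governor settles on the lowest frequency that still reaches the plateau frame rate, which is where running faster would only cost power.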

Efficient implementation of structured long block-length LDPC codes

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245739

A. J. Wong, S. Hemati, W. Gross

Citations: 3

Dual-rail active protection system against side-channel analysis in FPGAs

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245707

W. He, Dirmanto Jap

Abstract: The security of cryptographic modules implemented in hardware has seen severe vulnerabilities against Side-Channel Attack (SCA), which is capable of retrieving hidden information by observing the pattern or quantity of unintentional information leakage. Dual-rail Precharge Logic (DPL) theoretically thwarts side-channel analyses by its low-level compensation manner, while the security reliability of DPLs can only be achieved at high resource expenses and degraded performance. In this paper, we present a dynamic protection system for selectively configuring the security-sensitive crypto modules to SCA-resistant dual-rail style when a real-time threat is detected. The threat-response mechanism helps to dynamically balance security and cost. The system is driven by a set of automated dual-rail conversion APIs for partially transforming the cryptographic module into its dual-rail format, particularly to a highly secure symmetric and interleaved placement. The elevated security grade from the safe to threat mode is validated by EM-based mutual information analysis using a fine-grained surface scan of a decapsulated Virtex-5 FPGA on a SASEBO GII board.

Pages: 64-65

Citations: 1
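
The compensation idea behind Dual-rail Precharge Logic mentioned in the abstract above can be shown with an idealized model: each bit is carried on a complementary pair of wires, and every evaluation starts from an all-zero precharged state, so the number of rising wires does not depend on the data. The helper names below are hypothetical, and the model ignores the routing and placement imbalances that real implementations have to deal with.

```python
from itertools import product


def dual_rail(bits):
    """Encode each bit b as the complementary wire pair (b, not b)."""
    return [rail for b in bits for rail in (b, 1 - b)]


def toggles_from_precharge(wires):
    """Count 0 -> 1 transitions when evaluating from the all-zero state."""
    return sum(wires)


# Single-rail activity depends on the data; dual-rail activity does not.
single = {toggles_from_precharge(list(bits)) for bits in product((0, 1), repeat=4)}
dual = {toggles_from_precharge(dual_rail(bits)) for bits in product((0, 1), repeat=4)}
print(sorted(single), sorted(dual))
```

In this model the single-rail toggle count ranges over the Hamming weights of the data, while the dual-rail count is always equal to the word width.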

Range reduction based on Pythagorean triples for trigonometric function evaluation

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245712

Hugues de Lassus Saint-Geniès, D. Defour, G. Revy

Citations: 5

Automatic design of domain-specific instructions for low-power processors

Pub Date : 2015-07-27 DOI: 10.1109/ASAP.2015.7245697

Cecilia González-Alvarez, Jennifer B. Sartor, C. Álvarez, Daniel Jiménez-González, L. Eeckhout

Abstract: This paper explores hardware specialization of low-power processors to improve performance and energy efficiency. Our main contribution is an automated framework that analyzes instruction sequences of applications within a domain at the loop body level and identifies exactly and partially matching sequences across applications that can become custom instructions. Our framework transforms sequences to a new code abstraction, a Merging Diagram, that improves similarity identification, clusters similar groups of potential custom instructions to effectively reduce the search space, and selects merged custom instructions to efficiently exploit the available customizable area. For a set of 11 media applications, our fast framework generates instructions that significantly improve the energy-delay product and speed-up, achieving more than double the savings as compared to a technique analyzing sequences within basic blocks. This paper shows that partially-matched custom instructions, which do not significantly increase design time, are crucial to achieving higher energy efficiency at limited hardware areas.

Pages: 1-8

Citations: 8