2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Papers

Noxim: An open, extensible and cycle-accurate network on chip simulator
V. Catania, Andrea Mineo, Salvatore Monteleone, M. Palesi, Davide Patti
{"title":"Noxim: An open, extensible and cycle-accurate network on chip simulator","authors":"V. Catania, Andrea Mineo, Salvatore Monteleone, M. Palesi, Davide Patti","doi":"10.1109/ASAP.2015.7245728","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245728","url":null,"abstract":"Emerging on-chip communication technologies like wireless Networks-on-Chip (WiNoCs) have been proposed as candidate solutions for addressing the scalability limitations of conventional multi-hop NoC architectures. In a WiNoC, a subset of network nodes is equipped with a wireless interface that allows them to communicate over long range in a single hop. This paper presents Noxim, an open, configurable, extendible, cycle-accurate NoC simulator developed in SystemC which allows the analysis of the performance and power figures of both conventional wired NoC and emerging WiNoC architectures.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"48 1","pages":"162-163"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82311856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 236
Programmable RNS lattice-based parallel cryptographic decryption
P. Martins, L. Sousa, J. Eynard, J. Bajard
{"title":"Programmable RNS lattice-based parallel cryptographic decryption","authors":"P. Martins, L. Sousa, J. Eynard, J. Bajard","doi":"10.1109/ASAP.2015.7245723","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245723","url":null,"abstract":"Should quantum computing become viable, current public-key cryptographic schemes will no longer be valid. Since cryptosystems take many years to mature, research on post-quantum cryptography is now more important than ever. Herein, we focus on lattice-based cryptography as an alternative post-quantum cryptosystem and on improving its efficiency. We put together several theoretical developments so as to produce an efficient implementation that solves the Closest Vector Problem (CVP) on Goldreich-Goldwasser-Halevi (GGH)-like cryptosystems based on the Residue Number System (RNS). We were able to produce speed-ups of up to 5.9 and 11.2 on the GTX 780 Ti and i7 4770K devices, respectively, when compared to a single-core optimized implementation. Finally, we show that the proposed implementation is a competitive alternative to the Rivest-Shamir-Adleman (RSA) cryptosystem.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"27 1","pages":"149-153"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90691957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
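The Residue Number System (RNS) named in the abstract above represents an integer by its residues modulo several pairwise-coprime moduli, so that multiplication can proceed independently (and in parallel) per modulus. The following is a minimal sketch of that idea only, not the paper's implementation; the moduli and the function names `to_rns`, `rns_mul`, and `from_rns` are assumptions chosen for illustration.

```python
from math import prod

# Hypothetical pairwise-coprime moduli, chosen only for illustration.
MODULI = (251, 253, 255, 256)


def to_rns(x, moduli=MODULI):
    """Represent the integer x by its residue in each channel."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; no carries cross between channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Recover the integer with the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M
```

The result is exact only while the true product stays below the product of the moduli; a real design would size the moduli set to the operand range.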
Stochastic circuit design and performance evaluation of vector quantization
Ran Wang, Jie Han, B. Cockburn, D. Elliott
{"title":"Stochastic circuit design and performance evaluation of vector quantization","authors":"Ran Wang, Jie Han, B. Cockburn, D. Elliott","doi":"10.1109/ASAP.2015.7245717","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245717","url":null,"abstract":"Vector quantization (VQ) is a general data compression technique that has a scalable implementation complexity and potentially a high compression ratio. In this paper, a novel implementation of VQ using stochastic circuits is proposed and its performance is evaluated. The stochastic and binary designs are compared for the same compression quality and the circuits are synthesized for an industrial 28-nm cell library. The effects of varying the sequence length of the stochastic design are studied with respect to the performance metric of throughput per area (TPA). When a shortened 512-bit encoding sequence is used to obtain a lower quality compression, the TPA is about 2.60 times that of the binary implementation with the same quality as that of the stochastic implementation measured by the L1 norm error (i.e., the first-order error). Thus, the stochastic implementation outperforms the conventional binary design in terms of TPA for a relatively low compression quality. By exploiting the progressive precision feature of a stochastic circuit, a readily scalable processing quality can be attained by simply halting the computation after different numbers of clock cycles.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"38 1","pages":"111-115"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74640030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
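The "progressive precision" property mentioned in the abstract above follows from how stochastic computing encodes values: a number in [0, 1] is the fraction of ones in a random bitstream, so a longer stream gives a finer estimate. The sketch below shows only this basic principle using unipolar encoding and multiplication by bitwise AND; it is not the paper's VQ circuit, and the helper names and the software random generator are assumptions.

```python
import random


def encode(p, length, rng):
    """Unipolar encoding: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(bits):
    """The encoded value is the fraction of ones in the stream."""
    return sum(bits) / len(bits)


def stochastic_multiply(a, b, length, seed=0):
    """Estimate a * b by AND-ing two independently generated bitstreams."""
    rng = random.Random(seed)
    stream_a = encode(a, length, rng)
    stream_b = encode(b, length, rng)
    return decode([x & y for x, y in zip(stream_a, stream_b)])
```

Calling `stochastic_multiply(0.5, 0.5, n)` with increasing `n` tends toward 0.25, which mirrors halting a stochastic circuit after more clock cycles to obtain higher quality.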
Large-scale packet classification on FPGA
Shijie Zhou, Yun Qu, V. Prasanna
{"title":"Large-scale packet classification on FPGA","authors":"Shijie Zhou, Yun Qu, V. Prasanna","doi":"10.1109/ASAP.2015.7245738","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245738","url":null,"abstract":"Packet classification is a key network function enabling a variety of network applications, such as network security, Quality of Service (QoS) routing, and other value-added services. Routers perform packet classification based on a predefined rule set. Packet classification faces two challenges: (1) the data rate of the network traffic keeps increasing, and (2) rule sets are becoming very large. In this paper, we propose an FPGA-based packet classification engine for large rule sets. We present a decomposition-based approach, where each field of the packet header is searched separately. Then we merge the partial search results from all the fields using a merging network. Experimental results show that our design can achieve a throughput of 147 Million Packets Per Second (MPPS), while supporting up to 256K rules on a state-of-the-art FPGA. Compared to prior work on FPGA or multi-core processors, our design demonstrates significant performance improvements.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"37 1","pages":"226-233"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81127622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
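The decomposition-based approach described in the abstract above searches each header field separately and then merges the partial results. A software sketch of that flow might look as follows; it is not the paper's FPGA design. The rule layout (a dict of inclusive ranges per field), the priority convention (lower index wins), and the function names are all assumptions made for illustration.

```python
def search_field(rules, field, value):
    """Return the indices of all rules whose range for `field` contains `value`."""
    return {
        i for i, rule in enumerate(rules)
        if rule[field][0] <= value <= rule[field][1]
    }


def classify(rules, packet):
    """Search every field independently, then merge by set intersection."""
    partial = [search_field(rules, f, v) for f, v in packet.items()]
    matches = set.intersection(*partial)
    # Assumed convention: the lowest rule index has the highest priority.
    return min(matches) if matches else None
```

In hardware, the per-field searches would run concurrently and the intersection would be computed by the merging network; here they run one after another.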
LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs
Yongchao Liu, B. Schmidt
{"title":"LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs","authors":"Yongchao Liu, B. Schmidt","doi":"10.1109/ASAP.2015.7245713","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245713","url":null,"abstract":"Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribution of matrix rows over warps/vectors. In LightSpMV, two dynamic row distribution approaches have been investigated at the vector and warp levels with atomic operations and warp shuffle functions as the fundamental building blocks. We have evaluated LightSpMV using various sparse matrices and further compared it to the CSR-based SpMV subprograms in the state-of-the-art CUSP and cuSPARSE libraries. Performance evaluation reveals that on the same Tesla K40c GPU, LightSpMV is superior to both CUSP and cuSPARSE, with a speedup of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE for single and double precision, respectively. LightSpMV is available at http://lightspmv.sourceforge.net.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"12 1","pages":"82-89"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82739057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41
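For readers unfamiliar with the CSR format that the abstract above builds on, the following sequential reference sketch shows the three CSR arrays and the per-row dot product that SpMV computes. It is a plain baseline, not LightSpMV itself, which distributes rows dynamically over GPU warps and vectors; the array names `row_ptr`, `col_idx`, and `values` are conventional but assumed here.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[10, 0, 0], [0, 20, 30], [0, 0, 0]] in CSR form.
row_ptr = [0, 1, 3, 3]
col_idx = [0, 1, 2]
values = [10.0, 20.0, 30.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))
```

Each iteration of the outer loop is independent of the others, which is what makes assigning rows to parallel workers possible; rows with very different nonzero counts are the reason load balancing matters.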
An IEEE 754 double-precision floating-point multiplier for denormalized and normalized floating-point numbers
S. Thompson, J. Stine
{"title":"An IEEE 754 double-precision floating-point multiplier for denormalized and normalized floating-point numbers","authors":"S. Thompson, J. Stine","doi":"10.1109/ASAP.2015.7245706","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245706","url":null,"abstract":"This paper discusses an optimized double-precision floating-point multiplier that can handle both denormalized and normalized IEEE 754 floating-point numbers. The optimizations are discussed and compared against similar implementations; however, the main objective is to remain compliant for denormalized IEEE 754 floating-point numbers while still maintaining high-performance operation for normalized numbers.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"254 1","pages":"62-63"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73194842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
An efficient architecture solution for low-power real-time background subtraction
H. Tabkhi, Majid Sabbagh, G. Schirner
{"title":"An efficient architecture solution for low-power real-time background subtraction","authors":"H. Tabkhi, Majid Sabbagh, G. Schirner","doi":"10.1109/ASAP.2015.7245737","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245737","url":null,"abstract":"Embedded vision is a rapidly growing market with a host of challenging algorithms. Among vision algorithms, Mixture of Gaussian (MoG) background subtraction is a frequently used kernel involving massive computation and communication. Tremendous challenges need to be resolved to meet MoG's high computation and communication demands with minimal power consumption, allowing its embedded deployment. This paper proposes a customized architecture for power-efficient realization of MoG background subtraction operating at Full-HD resolution. Our design process benefits from system-level design principles. An SLDL-captured specification (result of high-level explorations) serves as a specification for architecture realization and hand-crafted RTL design. To optimize the architecture, this paper employs a set of optimization techniques including parallelism extraction, algorithm tuning, operation width sizing and deep pipelining. The final MoG implementation consists of 77 pipeline stages operating at 148.5 MHz implemented on a Zynq-7000 SoC. Furthermore, our background subtraction solution is flexible, allowing end users to adjust algorithm parameters according to scene complexity. Our results demonstrate a very high efficiency for both indoor and outdoor scenes with 145 mW on-chip power consumption and more than 600× speedup over software execution on an ARM Cortex-A9 core.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"13 1","pages":"218-225"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88200441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
How can Garbage Collection be energy efficient by dynamic offloading?
Jie Tang, Chen Liu, J. Gaudiot
{"title":"How can Garbage Collection be energy efficient by dynamic offloading?","authors":"Jie Tang, Chen Liu, J. Gaudiot","doi":"10.1109/ASAP.2015.7245725","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245725","url":null,"abstract":"Garbage Collection (GC) is still a major issue in the JVM for both mobile and cluster computing. GC offloading has been proposed to improve the performance of GC by delegating part or all of its operations to dedicated GC hardware. However, traditional offloading offloads directly, without considering the phase changes in GC behavior, which can be classified into two groups: minor GC and major GC. Minor GC is fast and frequently invoked, while major GC is expensive in terms of time but seldom takes place. Direct offloading makes the GC workload hop frequently between the main processor and the GC hardware, which introduces noticeable overhead and offsets any possible benefits of workload offloading. To solve this issue, we propose to offload GC dynamically by a careful selection of profitable and harmful GC operations. We also present a case study on Apache Spark, a lightning-fast cluster computing platform. It shows that dynamic offloading can yield a performance improvement of nearly 42.6% with a concurrent 32.1% reduction in energy cost.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"11 1","pages":"156-157"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83548740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mixed-signal implementation of differential decoding using binary message passing algorithms
G. Cowan, Kevin Cushon, W. Gross
{"title":"Mixed-signal implementation of differential decoding using binary message passing algorithms","authors":"G. Cowan, Kevin Cushon, W. Gross","doi":"10.1109/ASAP.2015.7245718","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245718","url":null,"abstract":"This paper presents the mixed-signal circuit implementation of reduced complexity algorithms for decoding low-density parity check (LDPC) codes. Based on modified differential decoding using binary message passing (MDD-BMP), binary addition using discrete-time digital circuits is replaced by continuous-time analog-current summation. Potential degradation due to the mismatch between current sources, P/N strength mismatch and inverter-threshold mismatch is considered in behavioural simulation and shown to be tolerable. Area estimates suggest a reduction from 0.27 mm² to 0.11 mm² for the FG(273, 191) code. Finally, transistor level simulation of the FG(273, 191) code using TSMC 65 nm technology shows an efficiency of 0.56 pJ/bit.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"49 1","pages":"116-119"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79777986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
On-demand fault-tolerant loop processing on massively parallel processor arrays
Alexandru Tanase, Michael Witterauf, J. Teich, Frank Hannig, Vahid Lari
{"title":"On-demand fault-tolerant loop processing on massively parallel processor arrays","authors":"Alexandru Tanase, Michael Witterauf, J. Teich, Frank Hannig, Vahid Lari","doi":"10.1109/ASAP.2015.7245734","DOIUrl":"https://doi.org/10.1109/ASAP.2015.7245734","url":null,"abstract":"We present a compilation-based technique for providing on-demand structural redundancy for massively parallel processor arrays. Thereby, application programmers gain the capability to trade throughput for reliability according to application requirements. To protect parallel loop computations against errors, we propose to apply the well-known fault tolerance schemes dual modular redundancy (DMR) and triple modular redundancy (TMR) to a whole region of the processor array rather than individual processing elements. At the source code level, the compiler realizes these replication schemes with a program transformation that: (1) replicates a parallel loop program two or three times for DMR or TMR, respectively, and (2) introduces appropriate voting operations whose frequency and location may be chosen from three proposed variants. Which variant to choose depends, for example, on the error resilience needs of the application or the expected soft error rates. Finally, we explore the different tradeoffs of these variants in terms of performance overheads and error detection latency.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"70 1","pages":"194-201"},"PeriodicalIF":0.0,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90563008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
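The replicate-and-vote transformation described in the abstract above can be illustrated in software. The sketch below runs three replicas of a loop body and votes after every iteration; this is only one possible voting placement and is not claimed to match any of the paper's three variants. The names `majority_vote` and `run_tmr` are hypothetical.

```python
def majority_vote(a, b, c):
    """Return the value on which at least two of the three replicas agree."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise ValueError("no two replicas agree; the error cannot be masked")


def run_tmr(replicas, inputs):
    """Run three replicas of a loop body and vote after every iteration."""
    first, second, third = replicas
    return [majority_vote(first(x), second(x), third(x)) for x in inputs]
```

Voting less often, for example only once after the loop finishes, would reduce the overhead but delay error detection, which is the kind of tradeoff the abstract refers to.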