2017 IEEE High Performance Extreme Computing Conference (HPEC): latest papers

Computing structural controllability of linearly-coupled complex networks
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091064
R. Rajaei, A. Ramezani, B. Shafai
Abstract: This work addresses structural controllability, a generic structure-based property that determines the ability of a complex network to reach a desired configuration. Using a robust measure derived from robust control theory, the paper deals with the structural controllability of a type of weighted network of networks (NetoNets) with linear couplings between its constituent networks and clusters. Unlike structural controllability degrees rooted in graph theory, the paper takes advantage of uncertain systems to define the notion of structural controllability in a straightforward and less computationally complex way. The spectrum of required energy is also discussed. Finally, results for scale-free networks are given to support the proposed measure as an efficient and effective guarantee of full controllability of NetoNets exposed to cluster and network-dependency connections. The proposed measure is an optimal solution with respect to structural energy-related control of the NetoNet, where the upper bound of the required energy is shown to be an efficient measure of structural controllability for this class of NetoNet. Arbitrary connectivity of low-connected vertices to their higher-connected counterparts in clusters results in effective controllability. In line with seminal works on structural controllability of complex networks, which avoid highly connected nodes, the larger the cluster/network connectivity degree, the less full controllability of the NetoNet is guaranteed.
Citations: 0
Lossy compression on IoT big data by exploiting spatiotemporal correlation
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091030
Aekyeung Moon, Jaeyoung Kim, Jialing Zhang, S. Son
Abstract: As the volume of data generated by deployed IoT devices increases, storing and processing IoT big data becomes a huge challenge. While compression, especially lossy compression, can drastically reduce data volume, finding an optimal balance between volume reduction and information loss is not an easy task, given that the data collected by diverse sensors exhibit different characteristics. Motivated by this, we present a feasibility analysis of lossy compression on agricultural sensor data by comparing the fidelity of data reconstructed from various signal processing algorithms and temporal difference encoding. Specifically, we evaluated five real-world sensor data sets from weather stations, one of the major IoT applications. Our experimental results indicate that Discrete Cosine Transform (DCT) and Fast Walsh-Hadamard Transform (FWHT) generate higher compression ratios than the others. In terms of information loss, however, Lossy Delta Encoding (LDE) significantly outperforms the others. We also observe that, as the compression factor is increased, error rates for all compression algorithms also increase. However, the impact of the introduced error is much more severe in DCT and FWHT, while LDE was able to maintain a relatively lower error rate than the other methods.
Citations: 23
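The entry above names Lossy Delta Encoding (LDE) but does not spell out its details. The following is a minimal sketch of one common threshold-based form of lossy delta encoding, offered only as an illustration and not as the authors' algorithm. The function names `lde_encode` and `lde_decode` and the `tolerance` parameter are assumptions introduced here.

```python
def lde_encode(samples, tolerance):
    """Keep a sample only when it moved more than `tolerance` from the last kept value.

    Returns a list of (index, delta) pairs; the first pair stores the first
    sample as a delta from zero. Hypothetical helper, not from the paper.
    """
    if not samples:
        return []
    encoded = [(0, samples[0])]
    last = samples[0]
    for i, x in enumerate(samples[1:], start=1):
        if abs(x - last) > tolerance:
            encoded.append((i, x - last))  # store only the change
            last = x
    return encoded


def lde_decode(encoded, length):
    """Rebuild a series of `length` values by holding the last decoded value."""
    changes = dict(encoded)  # index -> delta
    out = []
    value = 0
    for i in range(length):
        value += changes.get(i, 0)
        out.append(value)
    return out


readings = [20.0, 20.1, 20.2, 21.0, 21.05, 19.0]
packed = lde_encode(readings, tolerance=0.5)
restored = lde_decode(packed, len(readings))
```

In this sketch every dropped sample lies within `tolerance` of the value the decoder holds at that position, so the reconstruction error is bounded by the tolerance (up to floating-point rounding), while the number of stored pairs shrinks as the tolerance grows.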
Static graph challenge on GPU
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091034
M. Bisson, M. Fatica
Abstract: This paper presents the details of a CUDA implementation of the Subgraph Isomorphism Graph Challenge, a new effort aimed at driving progress in the graph analytics field. The challenge consists of two graph analytics: triangle counting and k-truss. We present our CUDA implementation of the graph triangle counting operation and of the k-truss subgraph decomposition. Both implementations share the same codebase, taking advantage of a set intersection operation implemented via bitmaps. The analytics are implemented in four kernels optimized for different types of graphs. At runtime, lightweight heuristics are used to select the kernel to run based on the specific graph taken as input.
Citations: 32
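The entry above mentions triangle counting built on a bitmap-based set intersection. As a rough, CPU-only illustration of that idea (not the paper's CUDA kernels), the sketch below stores each vertex's neighbor set as a Python integer used as a bitmap and intersects two sets with a bitwise AND. The function name `count_triangles` and the edge-list input format are assumptions made for this example.

```python
def count_triangles(num_vertices, edges):
    """Count triangles in an undirected graph using bitmap intersections.

    `edges` is an iterable of (u, v) pairs with 0 <= u, v < num_vertices.
    Bit v of adj[u] is set when u and v are neighbors.
    """
    adj = [0] * num_vertices
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj[u] |= 1 << v
        adj[v] |= 1 << u

    total = 0
    for u in range(num_vertices):
        higher = adj[u] >> (u + 1)  # neighbors v with v > u, so each edge is visited once
        v = u + 1
        while higher:
            if higher & 1:
                common = adj[u] & adj[v]  # bitmap set intersection
                total += bin(common).count("1")
            higher >>= 1
            v += 1
    # Each triangle is seen once from each of its three edges.
    return total // 3


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

For the complete graph on four vertices (`k4`), `count_triangles(4, k4)` returns 4. A GPU version would distribute the per-edge intersections across threads; that mapping is not shown here.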
Quickly finding a truss in a haystack
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091038
Oded Green, James Fox, Euna Kim, F. Busato, N. Bombieri, Kartik Lakhotia, Shijie Zhou, Shreyas G. Singapura, Hanqing Zeng, R. Kannan, V. Prasanna, David A. Bader
Abstract: The k-truss of a graph is a subgraph such that each edge is tightly connected to the remaining elements in the k-truss. The k-truss of a graph can also represent an important community in the graph. Finding the k-truss of a graph can be done in a polynomial amount of time, in contrast to finding other subgraphs such as cliques. While there are numerous formulations and algorithms for finding the maximal k-truss of a graph, many of these tend to be computationally expensive and do not scale well. Many algorithms are iterative and use static graph triangle counting in each iteration. In this work we present a novel algorithm for finding both the k-truss of the graph (for a given k) and the maximal k-truss using a dynamic graph formulation. Our algorithm has two main benefits. 1) Unlike many algorithms that rerun the static graph triangle counting after the removal of non-conforming edges, we use a new dynamic graph formulation that only requires updating the edges affected by the removal. As our updates are local, we only do a fraction of the work compared to the other algorithms. 2) Our algorithm is extremely scalable and is able to concurrently detect deleted triangles, in contrast to past sequential approaches. While our algorithm is architecture independent, we show a CUDA-based implementation for NVIDIA GPUs. In numerous instances, our new algorithm is anywhere from 100X to 10000X faster than the Graph Challenge benchmark. Furthermore, our algorithm shows significant speedups, in some cases over 70X, over a recently developed sequential and highly optimized algorithm.
Citations: 24
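The entry above contrasts rerunning triangle counting after every edge removal with updating only the edges affected by a removal. The sketch below illustrates the second idea sequentially, using the standard definition in which every edge of a k-truss takes part in at least k - 2 triangles inside the subgraph. It is a plain Python illustration under that assumption, not the authors' dynamic-graph CUDA algorithm, and the name `k_truss` is hypothetical.

```python
from collections import deque


def k_truss(edges, k):
    """Return the edge set of the k-truss, using local support updates.

    `edges` holds (u, v) pairs of comparable vertex ids (e.g. integers).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def key(a, b):
        return (a, b) if a < b else (b, a)

    # support[e] = number of triangles that contain edge e
    support = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                support[(u, v)] = len(adj[u] & adj[v])

    queue = deque(e for e, s in support.items() if s < k - 2)
    queued = set(queue)
    while queue:
        u, v = queue.popleft()
        # Only edges that shared a triangle with (u, v) need an update.
        for w in adj[u] & adj[v]:
            for e in (key(u, w), key(v, w)):
                support[e] -= 1
                if support[e] < k - 2 and e not in queued:
                    queue.append(e)
                    queued.add(e)
        adj[u].discard(v)
        adj[v].discard(u)
        del support[(u, v)]
    return set(support)


# A 4-clique on vertices 0..3 plus one dangling edge (3, 4).
graph = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
```

With `graph` as above, `k_truss(graph, 4)` keeps the six clique edges and drops the dangling edge, while `k_truss(graph, 5)` returns an empty set.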
Efficient and accurate Word2Vec implementations in GPU and shared-memory multicore architectures
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091076
Trevor M. Simonton, G. Alaghband
Abstract: Word2Vec is a popular set of machine learning algorithms that use a neural network to generate dense vector representations of words. These vectors have proven to be useful in a variety of machine learning tasks. In this work, we propose new methods to increase the speed of the Word2Vec skip gram with hierarchical softmax architecture on multi-core shared memory CPU systems, and on modern NVIDIA GPUs with CUDA. We accomplish this on multi-core CPUs by batching training operations to increase thread locality and to reduce accesses to shared memory. We then propose new heterogeneous NVIDIA GPU CUDA implementations of both the skip gram hierarchical softmax and negative sampling techniques that utilize shared memory registers and in-warp shuffle operations for maximized performance. Our GPU skip gram with negative sampling approach produces a higher quality of word vectors than previous GPU implementations, and our flexible skip gram with hierarchical softmax implementation achieves a factor of 10 speedup over existing methods.
Citations: 8
OpenCL for HPC with FPGAs: Case study in molecular electrostatics
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091078
Chen Yang, Jiayi Sheng, Rushi Patel, A. Sanaullah, Vipin Sachdeva, M. Herbordt
Abstract: FPGAs have emerged as a cost-effective accelerator alternative in clouds and clusters. Programmability remains a challenge, however, with OpenCL being generally recognized as a likely part of the solution. In this work we seek to advance the use of OpenCL for HPC on FPGAs in two ways. The first is by examining a core HPC application, Molecular Dynamics. The second is by examining a fundamental design pattern that we believe has not yet been described for OpenCL: passing data from a set of producer datapaths to a set of consumer datapaths, in particular where the producers generate data non-uniformly. We evaluate several designs: single-level versions in Verilog and in OpenCL, a two-level Verilog version with an optimized arbiter, and several two-level OpenCL versions with different arbitration and hand-shaking mechanisms, including one with an embedded Verilog module. For the Verilog designs, we find that FPGAs retain their high efficiency, with a 50× to 80× performance benefit over a single core. We also find that OpenCL may be competitive with HDLs for the straightline versions of the code, but that for designs with more complex arbitration and hand-shaking, relative performance is substantially diminished.
Citations: 21
Evaluating critical bits in arithmetic operations due to timing violations
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091090
Sungseob Whang, Tymani Rachford, Dimitra Papagiannopoulou, T. Moreshet, R. I. Bahar
Abstract: Various error models are being used in the simulation of voltage-scaled arithmetic units to examine application-level tolerance of timing violations. The selection of an error model needs further consideration, as differences in error models drastically affect the performance of the application. Specifically, floating point arithmetic units (FPUs) have architectural characteristics that characterize their behavior. We examine the architecture of FPUs and design a new error model, which we call Critical Bit. We run selected benchmark applications with Critical Bit and other widely used error injection models to demonstrate the differences.
Citations: 6
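The entry above argues that the choice of error model matters for floating-point units. One way to see why is that, in an IEEE 754 double, the same single-bit error has very different effects depending on whether it lands in the sign bit (bit 63), the exponent (bits 52 to 62), or the mantissa (bits 0 to 51). The sketch below is a generic bit-flip injector written for illustration only; it is not the paper's Critical Bit model, and the helper name `flip_bit` is an assumption.

```python
import struct


def flip_bit(x, bit):
    """Return the double obtained by flipping one bit of x's IEEE 754 encoding.

    Bit 0 is the least significant mantissa bit and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


low_mantissa = flip_bit(1.0, 0)    # tiny relative change
low_exponent = flip_bit(1.0, 52)   # value is halved
high_exponent = flip_bit(1.0, 62)  # value becomes infinity
sign = flip_bit(1.0, 63)           # value changes sign
```

Starting from 1.0, a flip in the lowest mantissa bit changes the value by 2^-52, whereas a flip in the highest exponent bit turns it into infinity, which suggests why an error model that treats all bit positions alike can give misleading application-level results.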
Hybrid flash arrays for HPC storage systems: An alternative to burst buffers
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091092
T. Petersen, John Bent
Abstract: Cloud and high-performance computing storage systems comprise thousands of physical storage devices and use software that organizes them into multiple data tiers based on access frequency. The characteristics of these devices lend themselves well to these tiers, as devices have differing ratios of performance to capacity. Because of this, these systems have, for the past several years, incorporated a mix of flash devices and mechanical spinning hard disk drives. Although a single media type would be ideal, the economic reality is that a hybrid system must use flash for performance and disk for capacity. Within the high-performance computing community, flash has been used to create a new tier called burst buffers, which are typically software managed, user visible, wed to a particular file system, and buffer all IO traffic before subsequent migration to disk. In this paper, we propose an alternative architecture that is hardware managed, user transparent, file system agnostic, and that only buffers small IO while allowing large sequential IO to access the disks directly. Our evaluation of this alternative architecture finds that it achieves comparable results to the reported burst buffer numbers and improves on systems comprised solely of disks by several orders of magnitude for a fraction of the cost.
Citations: 9
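The entry above describes an architecture that buffers only small IO in flash and lets large sequential IO go straight to disk. The toy routing function below sketches that policy in software purely for illustration; the paper's design is hardware managed, and the 64 KiB cutoff, the `route_io` name, and the tier labels are all assumptions made here, not values from the paper.

```python
SMALL_IO_THRESHOLD = 64 * 1024  # hypothetical cutoff in bytes, not from the paper


def route_io(size_bytes, threshold=SMALL_IO_THRESHOLD):
    """Pick a tier for one request: small IO is buffered in flash, large IO bypasses it."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "flash" if size_bytes < threshold else "disk"


requests = [4096, 8192, 1024 * 1024, 16 * 1024 * 1024]
placement = [route_io(size) for size in requests]
```

Here the two small requests land on flash and the two large ones go to disk, which mirrors the stated goal of spending flash capacity only on the traffic that disks handle poorly.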
GPU accelerated gigabit level BCH and LDPC concatenated coding system
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091021
Selcuk Keskin, T. Koçak
Abstract: Increasing data traffic and multimedia services in recent years have paved the way for the development of optical transmission methods to be used in high-bandwidth communications systems. To meet the very high throughput requirements, dedicated application-specific integrated circuit and field-programmable gate array solutions for low-density parity-check (LDPC) decoding have been proposed in recent years. Conversely, software solutions are less expensive, scalable, and flexible, and have a shorter development cycle. A natural solution to lower the error floor is to concatenate the LDPC code with an algebraic outer code to clean up the residual errors. In this paper, we present the design and parallel software implementation of a major computation algorithm for LDPC decoding on general-purpose graphics processing units as the inner code, and a BCH decoding algorithm as the outer code, to achieve excellent error-correcting performance. The experimental results show that the proposed GPU-based concatenated decoder achieves a maximum decoding throughput of 1.82 Gbps at 10 iterations with a low bit-error rate (BER).
Citations: 5
High-performance low-energy implementation of cryptographic algorithms on a programmable SoC for IoT devices
2017 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091062
Boyou Zhou, Manuel Egele, A. Joshi
Abstract: Due to the severe power and timing constraints of the "things" in the Internet of Things (IoT), cryptography is expensive for these devices. Custom hardware provides a viable solution. However, implementations of cryptographic algorithms in the devices need to be upgraded frequently compared to the longevity of these "things". Therefore, there is a critical need for reconfigurable, low-power, and high-performance cryptography implementations for IoT devices. In this paper, we propose to use an FPGA as the reconfigurable substrate for cryptographic operations. We demonstrate our proposed approach on a Zedboard, which has two ARM cores and a Zynq FPGA. The implemented cryptographic algorithms include symmetric cryptography, asymmetric cryptography, and secure hash functions. We also integrate our cryptographic engines with the OpenSSL library to inherit the library's support for block cipher modes. Our approach shows that the FPGA-based reconfigurable cryptographic components consume between 1.8× and 4033× less energy and run between 1.6× and 2983× faster than the software implementation. At the same time, the FPGA implementation of cryptographic operations is more flexible than custom hardware implementations of cryptographic components.
Citations: 14