Latest Publications: 2019 IEEE High Performance Extreme Computing Conference (HPEC)

Embedded Processor-In-Memory Architecture for Accelerating Arithmetic Operations
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916496
Richard Muri, P. Fortier
Abstract: A processor-in-memory (PIM) computer architecture is any design that performs some subset of logical operations in the same location as memory. The traditional model of computing involves a processor loading data from memory to perform operations, with a bus connecting the processor and memory. While this technique works well in many situations, a growing gap between memory performance and processor performance has led some researchers to develop alternative architectures. This paper details the implementation of a PIM architecture in a soft core microcontroller used to accelerate applications limited by register file size. Using an Artix-7 FPGA, an ATmega103 microcontroller soft core is modified to include a PIM core as an accelerator. The sample application of AES encryption provides a comparison between the baseline processor and the PIM enhanced machine. AES encryption using the modified microcontroller requires 38% fewer clock cycles without relying on application specific improvements, at the expense of increased program memory size and FPGA fabric utilization.
Citations: 3
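The argument for PIM is that round trips over the processor-memory bus dominate the cost of simple arithmetic on large operands. The toy model below illustrates that trade-off only; the cycle costs are hypothetical illustration values, not figures from the paper or the ATmega103 design.

```python
# Toy cost model contrasting a conventional load/compute/store loop with a
# processor-in-memory (PIM) style operation that avoids bus transfers.
# The cycle costs are hypothetical, chosen only to illustrate the idea.

BUS_TRANSFER_CYCLES = 2   # hypothetical cost to move one word over the bus
ALU_OP_CYCLES = 1         # hypothetical cost of one arithmetic operation

def conventional_cycles(num_words: int) -> int:
    """Load each operand word over the bus, operate, and store the result back."""
    loads = num_words * BUS_TRANSFER_CYCLES
    ops = num_words * ALU_OP_CYCLES
    stores = num_words * BUS_TRANSFER_CYCLES
    return loads + ops + stores

def pim_cycles(num_words: int) -> int:
    """Operate directly where the data lives; no bus round trips."""
    return num_words * ALU_OP_CYCLES

if __name__ == "__main__":
    n = 256  # e.g., state for many AES blocks held in memory
    print("conventional:", conventional_cycles(n), "cycles")
    print("PIM-style:   ", pim_cycles(n), "cycles")
```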
Improving Scheduling for Irregular Applications with Logarithmic Radix Binning
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916333
James Fox, Alok Tripathy, Oded Green
Abstract: Effective scheduling and load balancing of applications on massively multi-threading systems remains challenging despite decades of research, especially for irregular and data dependent problems where the execution control path is unknown until run-time. One of the most widely used load-balancing schemes used for data dependent problems is a parallel prefix sum (PPS) array over the expected amount of work per task, followed by a partitioning of tasks to threads. While sufficient for many systems, it is not ideal for massively multithreaded systems with SIMD/SIMT execution, such as GPUs. More fine-grained load-balancing is needed to effectively utilize SIMD/SIMT units. In this paper we introduce Logarithmic Radix Binning (LRB) as a more suitable alternative to parallel prefix summation for load-balancing on such systems. We show that LRB has better scalability than PPS for high thread counts on Intel's Knight's Landing processor and comparable scalability on NVIDIA Volta GPUs. On the application side, we show how LRB improves the performance of PageRank up to 1.75X using the branch-avoiding model. We also show how to better load-balance segmented sort and improve performance on the GPU.
Citations: 4
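The core idea of logarithmic radix binning is to group tasks whose work estimates fall within a factor of two of each other, so each bin can be scheduled with a uniform granularity (for example, one warp per task within a bin). A minimal sketch of the binning step follows; it illustrates the concept on the CPU and is not the authors' GPU implementation.

```python
import math
from collections import defaultdict

def logarithmic_radix_binning(work_per_task):
    """Group task indices into bins keyed by ceil(log2(work)).

    Tasks whose work differs by at most 2x land in the same bin, so each bin
    can be dispatched with a single, uniform scheduling granularity.
    """
    bins = defaultdict(list)
    for task_id, work in enumerate(work_per_task):
        key = 0 if work <= 1 else math.ceil(math.log2(work))
        bins[key].append(task_id)
    return dict(bins)

if __name__ == "__main__":
    # e.g., per-vertex degrees of a skewed graph
    degrees = [1, 2, 3, 7, 8, 120, 130, 4000]
    for k, tasks in sorted(logarithmic_radix_binning(degrees).items()):
        print(f"bin 2^{k}: tasks {tasks}")
```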
Cyber Baselining: Statistical properties of cyber time series and the search for stability
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916350
A. Schulz, Ethan Aubin, P. Trepagnier, A. Wollaber
Abstract: Many predictive cyber analytics assume, implicitly or explicitly, that the underlying statistical processes they treat have simple properties. Often statistics predicated on Wiener processes are used, but even if not, assumptions on statistical stationarity, ergodicity, and memorylessness are often present. We present here empirical observations of several common network time series, and demonstrate that these assumptions are false; the series are non-stationary, non-ergodic, and possess complicated correlation structures. We compute several statistical tests, borrowed from other disciplines, for the evaluation of network time series. We discuss the implications of these results on the larger goal of constructing a meaningful cyber baseline of a network or host, intended to establish the bounds of "normal" behavior. For many common network observables used in defensive cyber operations, it may prove to be unrealistic to establish such a baseline, or detect significant deviations from it.
Citations: 0
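One standard test for the stationarity assumption the authors question is the Augmented Dickey-Fuller unit-root test. The sketch below applies it to a synthetic random walk standing in for a real observable such as bytes-per-minute on a link; the paper's specific test battery and data are not reproduced here.

```python
# Minimal sketch: test a (synthetic) network time series for stationarity
# with the Augmented Dickey-Fuller test from statsmodels.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
# A random walk is a classic non-stationary process.
series = np.cumsum(rng.normal(size=2000))

stat, pvalue, *_ = adfuller(series)
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
if pvalue < 0.05:
    print("Reject the unit-root null: the series looks stationary.")
else:
    print("Cannot reject the unit-root null: treat the series as non-stationary.")
```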
An Efficient and Composable Parallel Task Programming Library
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916447
Chun-Xun Lin, Tsung-Wei Huang, Guannan Guo, Martin D. F. Wong
Abstract: Composability is a key component to improve programmers' productivity in writing fast market-expanding applications such as parallel machine learning algorithms and big data analytics. These applications exhibit both regular and irregular compute patterns, and are often combined with other functions or libraries to compose a larger program. However, composable parallel processing has taken a back seat in many existing parallel programming libraries, making it difficult to achieve modularity in large-scale parallel programs. In this paper, we introduce a new parallel task programming library using composable tasking graphs. Our library efficiently supports task parallelism together with an intuitive task graph construction and flexible execution API set to enable reusable and composable task dependency graphs. Developers can quickly compose a large parallel program from small and modular parallel building blocks, and easily deploy the program on a multicore machine. We have evaluated our library on real-world applications. Experimental results showed our library can achieve comparable performance to Intel Threading Building Blocks with less coding effort.
Citations: 7
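To make "composable task graphs" concrete, the following is a small, illustrative Python sketch of the general pattern: a dependency graph whose entirety can be embedded as a single task inside a larger graph. It is not the authors' library or its API; the class and method names are invented for illustration.

```python
# Illustrative sketch of a composable task-dependency graph (hypothetical API).
from concurrent.futures import ThreadPoolExecutor

class TaskGraph:
    def __init__(self):
        self._tasks = {}   # name -> callable
        self._deps = {}    # name -> set of prerequisite names

    def add(self, name, fn, after=()):
        self._tasks[name] = fn
        self._deps[name] = set(after)
        return name

    def compose(self, name, subgraph, after=()):
        """Embed another TaskGraph as a single task of this graph."""
        return self.add(name, subgraph.run, after)

    def run(self, max_workers=4):
        done, pending = set(), dict(self._deps)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            while pending:
                ready = [n for n, deps in pending.items() if deps <= done]
                if not ready:
                    raise RuntimeError("dependency cycle detected")
                # Run each wave of ready tasks in parallel, then advance.
                futures = {n: pool.submit(self._tasks[n]) for n in ready}
                for n, f in futures.items():
                    f.result()
                    done.add(n)
                    del pending[n]

if __name__ == "__main__":
    inner = TaskGraph()
    inner.add("load", lambda: print("load data"))
    inner.add("clean", lambda: print("clean data"), after=["load"])

    outer = TaskGraph()
    outer.compose("preprocess", inner)          # reuse the whole inner graph
    outer.add("train", lambda: print("train model"), after=["preprocess"])
    outer.run()
```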
Applying Neuromorphic Computing to Compressive Sensing
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916531
R. Scrofano, Douglas Enright, G. Valley
Abstract: As the computing community moves toward processing at the edge, there is a need for computing systems that are both high performance and power efficient. Neuromorphic computing systems have the potential to fill this need.
Citations: 0
Accelerating Sparse Deep Neural Networks on FPGAs
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916419
Sitao Huang, Carl Pearson, R. Nagi, Jinjun Xiong, Deming Chen, Wen-mei W. Hwu
Abstract: Deep neural networks (DNNs) have been widely adopted in many domains, including computer vision, natural language processing, and medical care. Recent research reveals that sparsity in DNN parameters can be exploited to reduce inference computational complexity and improve network quality. However, sparsity also introduces irregularity and extra complexity in data processing, which make the accelerator design challenging. This work presents the design and implementation of a highly flexible sparse DNN inference accelerator on FPGA. Our proposed inference engine can be easily configured to be used in both mobile computing and high-performance computing scenarios. Evaluation shows our proposed inference engine effectively accelerates sparse DNNs and outperforms CPU solution by up to 4.7× in terms of energy efficiency.
Citations: 17
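The arithmetic kernel such an accelerator speeds up is a sparse-weight layer: a sparse-matrix/dense-vector product followed by an activation. The sketch below shows that computation in software with a CSR-stored weight matrix; it illustrates the workload only and is not the paper's FPGA design.

```python
# Minimal sketch of a sparse DNN layer: y = ReLU(W @ x + b), W stored in CSR.
import numpy as np
from scipy.sparse import random as sparse_random

def sparse_layer(weights_csr, bias, x):
    """Apply one sparse fully connected layer with ReLU activation."""
    return np.maximum(weights_csr @ x + bias, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 90% of the weights pruned to zero (10% density).
    W = sparse_random(256, 512, density=0.10, format="csr", random_state=0)
    b = rng.normal(size=256)
    x = rng.normal(size=512)
    y = sparse_layer(W, b, x)
    print("output shape:", y.shape, "nonzero activations:", int((y > 0).sum()))
```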
QxSQA: GPGPU-Accelerated Simulated Quantum Annealer within a Non-Linear Optimization and Boltzmann Sampling Framework
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916450
Dan Padilha, Serge Weinstock, Mark Hodson
Abstract: We introduce QxSQA, a GPGPU-Accelerated Simulated Quantum Annealer based on Path-Integral Monte Carlo (PIMC). QxSQA is tuned for finding low-energy solutions to integer, non-linear optimization problems of up to 2^14 (16,384) binary variables with quadratic interactions on a single GPU instance. Experimental results demonstrate QxSQA can solve Maximum Clique test problems of 8,100 binary variables with planted solutions in under one minute, with linear scaling against key optimization parameters on other large-scale problems. Through the PIMC formulation, QxSQA also functions as an accurate sampler of Boltzmann distributions for machine learning applications. Experimental characterization of Boltzmann sampling results for a reinforcement learning problem showed good convergence performance at useful scales. Our implementation integrates as a solver within our QxBranch developer platform, positioning developers to efficiently develop applications using QxSQA, and then test the same application code on a quantum annealer or universal quantum computer hardware platform such as those from D-Wave Systems, IBM, or Rigetti Computing.
Citations: 2
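For readers unfamiliar with the problem class, the sketch below anneals an Ising energy with single-spin Metropolis updates. It is only the classical skeleton: QxSQA's path-integral Monte Carlo additionally introduces Trotter replicas coupled through a transverse field, which is omitted here, and the temperature schedule shown is an arbitrary example.

```python
# Heavily simplified sketch: classical simulated annealing of an Ising energy.
import numpy as np

def anneal_ising(J, h, sweeps=2000, t_hot=5.0, t_cold=0.05, seed=0):
    """Minimize E(s) = s^T J s + h^T s over s in {-1, +1}^n."""
    rng = np.random.default_rng(seed)
    n = len(h)
    s = rng.choice([-1, 1], size=n)
    for T in np.geomspace(t_hot, t_cold, sweeps):
        for i in rng.permutation(n):
            # Energy change from flipping spin i (J symmetric, zero diagonal).
            delta = -2 * s[i] * (2 * J[i] @ s + h[i])
            if delta <= 0 or rng.random() < np.exp(-delta / T):
                s[i] = -s[i]
    return s, s @ J @ s + h @ s

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 32
    J = rng.normal(scale=0.5, size=(n, n))
    J = np.triu(J, 1); J = J + J.T          # symmetric couplings, zero diagonal
    h = rng.normal(size=n)
    spins, energy = anneal_ising(J, h)
    print("final energy:", round(float(energy), 3))
```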
Hardware IP Classification through Weighted Characteristics
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916225
Brendan McGeehan, Flora Smith, Thao Le, Hunter Nauman, Jia Di
Abstract: Today's business model for hardware designs frequently incorporates third-party Intellectual Property (IP) mainly due to economic motivations. However, allowing third-party involvement also increases the possibility of malicious attacks, such as hardware Trojan insertion, which is a particularly dangerous security threat because functional testing can often leave the Trojan undetected. This research provides an improvement on a Trojan detection method and tool known as Structural Checking which analyzes Register-Transfer Level (RTL) soft IPs. Given an unknown IP, the tool will break down the design and label ports and signals with assets. Analyzing the asset patterns reveals how the IP is structured and provides information about its overall functionality. The tool incorporates a library of known designs referred to as the Golden Reference Library (GRL). All entries in the library, grouped into known-clean and known-infested, are analyzed in the same manner. A weighted percent match for each library entry against the unknown IP is calculated. A report is generated detailing all mismatched locations where users need to take a closer look. Due to the structural variability of soft IP designs, it is vital to provide the best possible weighting to best match the unknown IP to the most similar library entry. This paper provides a statistical approach to finding the best weights to optimize the tool's matching algorithm.
Citations: 1
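The weighted percent match itself is a simple computation: each asset category contributes its weight to the score when the unknown IP and a Golden Reference Library entry agree. The sketch below illustrates that calculation; the asset names and weights are hypothetical and are not the ones used by Structural Checking.

```python
# Illustrative weighted percent-match between an unknown IP's asset pattern
# and one Golden Reference Library (GRL) entry. Asset names/weights are made up.

def weighted_match(unknown_assets, reference_assets, weights):
    """Return the weighted fraction of asset categories that agree."""
    total = sum(weights.values())
    matched = sum(
        w for asset, w in weights.items()
        if unknown_assets.get(asset) == reference_assets.get(asset)
    )
    return matched / total

if __name__ == "__main__":
    weights = {"clock": 1.0, "reset": 1.0, "data_in": 2.5, "data_out": 2.5, "control": 1.5}
    unknown = {"clock": "clk", "reset": "rst_n", "data_in": "bus32", "data_out": "bus32", "control": "fsm"}
    grl_entry = {"clock": "clk", "reset": "rst_n", "data_in": "bus32", "data_out": "bus16", "control": "fsm"}
    score = weighted_match(unknown, grl_entry, weights)
    print(f"weighted percent match: {100 * score:.1f}%")
```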
H-INDEX: Hash-Indexing for Parallel Triangle Counting on GPUs
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916492
Santosh Pandey, X. Li, A. Buluç, Jiejun Xu, Hang Liu
Abstract: Triangle counting is a graph algorithm that calculates the number of triangles involving each vertex in a graph. Briefly, a triangle encompasses three vertices from a graph, where every vertex possesses at least one incident edge to the other two vertices from the triangle. Consequently, list intersection, which identifies the incident edges, becomes the core algorithm for triangle counting. Meanwhile, attracted by the enormous parallel computing potential of Graphics Processing Units (GPUs), numerous efforts have been devoted to deploying triangle counting algorithms on GPUs. While state-of-the-art intersection algorithms, such as merge-path and binary-search, perform well on traditional multi-core CPU systems, deploying them on massively parallel GPUs turns out to be challenging. In particular, the merge-path based approach experiences the hardship of evenly distributing the workload across vast GPU threads and irregular memory accesses. The binary-search based approach often suffers from the potential problem of high time complexity. Furthermore, both approaches require sorted neighbor lists from the input graphs, which involves nontrivial preprocessing overhead. To this end, we introduce H-INDEX, a hash-indexing assisted triangle counting algorithm that overcomes all the aforementioned shortcomings. Notably, H-INDEX achieves a 141.399 billion TEPS computing rate on a Protein K-mer V2a graph with 64 GPUs. To the best of our knowledge, this is the first work that advances triangle counting beyond the 100 billion TEPS rate.
Citations: 28
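The sketch below shows the hash-indexed list-intersection idea on the CPU: for each edge, common neighbors are counted by probing a hash index built over an adjacency list, so neighbor lists do not need to be sorted. The paper's GPU kernel follows the same principle with hash tables built in fast on-chip memory; that parallel machinery is not reproduced here.

```python
# Minimal CPU sketch of hash-indexed list intersection for triangle counting.
from collections import defaultdict

def triangle_count(edges):
    adj = defaultdict(set)      # each set acts as a per-vertex hash index
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    triangles = 0
    for u, v in edges:
        # Iterate the smaller neighbor list and probe the larger one's hash index.
        small, large = (adj[u], adj[v]) if len(adj[u]) < len(adj[v]) else (adj[v], adj[u])
        triangles += sum(1 for w in small if w in large)
    # Each triangle is counted once per of its three (undirected) edges.
    return triangles // 3

if __name__ == "__main__":
    # A 4-clique contains exactly 4 triangles.
    clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print("triangles:", triangle_count(clique))
```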
Performance of Training Sparse Deep Neural Networks on GPUs
2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date: 2019-09-01 DOI: 10.1109/HPEC.2019.8916506
Jianzong Wang, Zhangcheng Huang, Lingwei Kong, Jing Xiao, Pengyu Wang, Lu Zhang, Chao Li
Abstract: Deep neural networks have revolutionized the field of machine learning by dramatically improving the state-of-the-art in various domains. The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to store and train them quickly. Over the past few decades, researchers have explored the prospect of sparsifying DNNs before, during, and after training by pruning edges from the underlying topology. After this operation, the generated neural network is known as a sparse neural network. More recent work has demonstrated the remarkable result that certain sparse DNNs can train to the same precision as dense DNNs at lower runtime and storage cost. Existing methods ease the situation in which high demand for computational resources severely hinders the deployment of large-scale DNNs on resource-constrained devices, so that DNNs can be trained at a faster speed and lower cost. In this work, we propose a Fine-tune Structured Sparsity Learning (FSSL) method to regularize the structures of DNNs and accelerate their training. FSSL can: (1) learn a compact structure from a large sparse DNN to reduce computation cost; and (2) obtain a hardware-friendly structure to accelerate DNN evaluation efficiently. Experimental results on training time and compression rate show superior performance and efficiency compared to the MATLAB example code. These speedups are about twice those of non-structured sparsity.
Citations: 10
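Structured sparsity learning is commonly realized with a group-lasso penalty that pushes whole weight groups (for example, output channels) to zero so hardware can skip them. The sketch below adds such a penalty to a PyTorch training loss; it illustrates the general SSL idea under that assumption, not necessarily the exact FSSL procedure of this paper.

```python
# Sketch of a group-lasso (structured sparsity) penalty over conv output channels.
import torch
import torch.nn as nn

def group_lasso_penalty(conv: nn.Conv2d) -> torch.Tensor:
    """Sum of L2 norms over output-channel weight groups of a conv layer."""
    # weight shape: (out_channels, in_channels, kH, kW); one group per output channel.
    return conv.weight.flatten(1).norm(dim=1).sum()

if __name__ == "__main__":
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 8, 3, padding=1))
    x = torch.randn(4, 3, 32, 32)
    target = torch.randn(4, 8, 32, 32)
    task_loss = nn.functional.mse_loss(model(x), target)
    reg = sum(group_lasso_penalty(m) for m in model.modules() if isinstance(m, nn.Conv2d))
    loss = task_loss + 1e-3 * reg   # 1e-3 is an arbitrary example regularization weight
    loss.backward()
    print("total loss:", float(loss))
```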