{"title":"A Novel Design of Adaptive and Hierarchical Convolutional Neural Networks using Partial Reconfiguration on FPGA","authors":"Mohammad Farhadi, Mehdi Ghasemi, Yezhou Yang","doi":"10.1109/HPEC.2019.8916237","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916237","url":null,"abstract":"Nowadays most research in visual recognition using Convolutional Neural Networks (CNNs) follows the “deeper model with deeper confidence” belief to gain a higher recognition accuracy. At the same time, deeper model brings heavier computation. On the other hand, for a large chunk of recognition challenges, a system can classify images correctly using simple models or so-called shallow networks. Moreover, the implementation of CNNs faces with the size, weight, and energy constraints on the embedded devices. In this paper, we implement the adaptive switching between shallow and deep networks to reach the highest throughput on a resource-constrained MPSoC with CPU and FPGA. To this end, we develop and present a novel architecture for the CNNs where a gate makes the decision whether using the deeper model is beneficial or not. Due to resource limitation on FPGA, the idea of partial reconfiguration has been used to accommodate deep CNNs on the FPGA resources. We report experimental results on CIFAR-10, CIFAR-100, and SVHN datasets to validate our approach. Using confidence metric as the decision making factor, only 69.8%, 71.8%, and 43.8% of the computation in the deepest network is done for CIFAR10, CIFAR-100, and SVHN while it can maintain the desired accuracy with the throughput of around 400 images per second for SVHN dataset. https://github.com/mfarhadi/AHCNN.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121533225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IdPrism: Rapid Analysis of Forensic DNA Samples Using MPS SNP Profiles","authors":"D. Ricke, James Watkins, Philip Fremont-Smith, Adam Michaleas","doi":"10.1109/HPEC.2019.8916521","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916521","url":null,"abstract":"Massively parallel sequencing (MPS) of large single nucleotide polymorphism (SNP) panels enables identification, analysis of complex DNA mixture samples, and extended kinship predictions. Computational challenges related to SNP allele calling, probability of random man not excluded calculations, and both reference and complex mixture sample comparisons to tens of millions of reference profiles were encountered and resolved when scaling up from thousands to tens of thousands of SNP loci. A MPS SNP analysis pipeline is described for rapid analysis of forensic deoxyribonucleic acid (DNA) samples for thousands to tens of thousands of SNP loci against tens of millions of reference profiles. This pipeline is part of the MIT Lincoln Laboratory (MITLL) IdPrism advanced DNA forensic system.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127927347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Singularity for Machine Learning Applications - Analysis of Performance Impact","authors":"B. R. Jordan, David Barrett, David Burke, Patrick Jardin, Amelia Littrell, P. Monticciolo, Michael Newey, J. Piou, Kara Warner","doi":"10.1109/HPEC.2019.8916443","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916443","url":null,"abstract":"Software deployments in general, and deep learning applications in particular, suffer from difficulty in reproducible results. The use of containers to mitigate these issues is becoming a common practice. Singularity is a container technology which targets the unique issues present in High Performance Computing (HPC) Centers. This paper characterizes the impact of using Singularity for both Training and Inference on deep learning applications.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115552403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skip the Intersection: Quickly Counting Common Neighbors on Shared-Memory Systems","authors":"Xiaojing An, Kasimir Gabert, James Fox, Oded Green, David A. Bader","doi":"10.1109/HPEC.2019.8916307","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916307","url":null,"abstract":"Counting common neighbors between all vertex pairs in a graph is a fundamental operation, with uses in similarity measures, link prediction, graph compression, community detection, and more. Current shared-memory approaches either rely on set intersections or are not readily parallelizable. We introduce a new efficient and parallelizable algorithm to count common neighbors: starting at a wedge endpoint, we iterate through all wedges in the graph, and increment the common neighbor count for each endpoint pair. This exactly counts the common neighbors between all pairs without using set intersections, and as such attains an asymptotic improvement in runtime. Furthermore, our algorithm is simple to implement and only slight modifications are required for existing implementations to use our results. We provide an OpenMP implementation and evaluate it on real-world and synthetic graphs, demonstrating no loss of scalability and an asymptotic improvement. We show intersections are neither necessary nor helpful for computing all pairs common neighbor counts.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126588492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECG Feature Processing Performance Acceleration on SLURM Compute Systems","authors":"Michael Nolan, Mark Hernandez, Philip Fremont-Smith, A. Swiston, K. Claypool","doi":"10.1109/HPEC.2019.8916397","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916397","url":null,"abstract":"Electrocardiogram (ECG) signal features (e.g. Heart rate, intrapeak interval times) are data commonly used in physiological assessment. Commercial off-the-shelf (COTS) software solutions for ECG data processing are available, but are often developed for serialized data processing which scale poorly for large datasets. To address this issue, we’ve developed a Matlab code library for parallelized ECG feature generation. This library uses the pMatlab and MatMPI interfaces to distribute computing tasks over supercomputing clusters using the Simple Linux Utility for Resource Management (SLURM). To profile its performance as a function of parallelization scale, the ECG processing code was executed on a non-human primate dataset on the Lincoln Laboratory Supercomputing TXGreen cluster. Feature processing jobs were deployed over a range of processor counts and processor types to assess the overall reduction in job computation time. We show that individual process times decrease according to a 1/n relationship to the number of processors used, while total computation times accounting for deployment and data aggregation impose diminishing returns of time against processor count. A maximum mean reduction in overall file processing time of 99% is shown.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124857800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing DyMonDS-as-a-Service (DyMaaS) for Internet of Things","authors":"M. Ilić, Rupamathi Jaddivada","doi":"10.1109/HPEC.2019.8916560","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916560","url":null,"abstract":"With recent trends in computation and communication architecture, it is becoming possible to simulate complex networked dynamical systems by employing high-fidelity models. The inherent spatial and temporal complexity of these systems, however, still acts as a roadblock. It is thus desirable to have adaptive platform design facilitating zooming-in and out of the models to emulate time-evolution of processes at a desired spatial and temporal granularity. In this paper, we propose new computing and networking abstractions, that can embrace physical dynamics and computations in a unified manner, by taking advantage of the inherent structure. We further design multi-rate numerical methods that can be implemented by computing architectures to facilitate adaptive zooming-in and out of the models spanning multiple spatial and temporal layers. These methods are all embedded in a platform called Dynamic Monitoring and Decision Systems (DyMonDS). We introduce a new service model of cloud computing called DyMonDS-as-a-Service (DyMaas), for use by operators at various spatial granularities to efficiently emulate the interconnection of IoT devices. The usage of this platform is described in the context of an electric microgrid system emulation.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131278892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Stochastic Block Partitioning via Sampling","authors":"Frank Wanye, Vitaliy Gleyzer, Wu-chun Feng","doi":"10.1109/HPEC.2019.8916542","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916542","url":null,"abstract":"Community detection in graphs, also known as graph partitioning, is a well-studied NP-hard problem. Various heuristic approaches have been adopted to tackle this problem in polynomial time. One such approach, as outlined in the IEEE HPEC Graph Challenge, is Bayesian statistics-based stochastic block partitioning. This method delivers high-quality partitions in sub-quadratic runtime, but it fails to scale to very large graphs. In this paper, we present sampling as an avenue for speeding up the algorithm on large graphs. We first show that existing sampling techniques can preserve a graph’s community structure. We then show that sampling for stochastic block partitioning can be used to produce a speedup of between $2.18 times$ and $7.26 times$ for graph sizes between 5,000 and 50,000 vertices without a significant loss in the accuracy of community detection.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130442985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Target-based Resource Allocation for Deep Learning Applications in a Multi-tenancy System","authors":"Wenjia Zheng, Yun Song, Zihao Guo, Yongcheng Cui, Suwen Gu, Ying Mao, Long Cheng","doi":"10.1109/HPEC.2019.8916403","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916403","url":null,"abstract":"The neural-network based deep learning is the key technology that enables many powerful applications, which include self-driving vehicles, computer vision, and natural language processing. Although various algorithms focus on different directions, generally, they mainly employ an iteration by iteration training and evaluating the process. Each iteration aims to find a parameter set, which minimizes a loss function defined by the learning model. When completing the training process, the global minimum is achieved with a set of optimized parameters. At this stage, deep learning applications can be shipped with a trained model to provide services. While deep learning applications are reshaping our daily life, obtaining a good learning model is an expensive task. Training deep learning models is, usually, time-consuming and requires lots of resources, e.g. CPU and GPU. In a multi-tenancy system, however, limited resources are shared by multiple clients that lead to severe resource contention. Therefore, a carefully designed resource management scheme is required to improve the overall performance. In this project, we propose a target based scheduling scheme named TRADL. In TRADL, developers have options to specify a two-tier target. If the accuracy of the model reaches a target, it can be delivered to clients while the training is still going on to continue improving the quality. The experiments show that TRADL is able to significantly reduce the time cost, as much as 48.2%, for reaching the target.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115043470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Improving Rate-Distortion Performance of Transform-Based Lossy Compression for HPC Datasets","authors":"Jialing Zhang, Aekyeung Moon, Xiaoyan Zhuo, S. Son","doi":"10.1109/HPEC.2019.8916286","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916286","url":null,"abstract":"As the size and amount of data produced by high-performance computing (HPC) applications grow exponentially, an effective data reduction technique is becoming critical to mitigating time and space burden. Lossy compression techniques, which have been widely used in image and video compression, hold promise to fulfill such data reduction need. However, they are seldom adopted in HPC datasets because of their difficulty in quantifying the amount of information loss and data reduction. In this paper, we explore a lossy compression strategy by revisiting the energy compaction properties of discrete transforms on HPC datasets. Specifically, we apply block-based transforms to HPC datasets, obtain the minimum number of coefficients containing the maximum energy (or information) compaction rate, and quantize remaining non-dominant coefficients using a binning mechanism to minimize information loss expressed in a distortion measure. We implement the proposed approach and evaluate it using six real-world HPC datasets. Our experimental results show that, on average, only 6.67 bits are required to preserve an optimal energy compaction rate on our evaluated datasets. Moreover, our knee detection algorithm improves the distortion in terms of peak signal-to-noise ratio by 2.46 dB on average.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133765375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}