2019 IEEE High Performance Extreme Computing Conference (HPEC) — Latest Publications

A Novel Design of Adaptive and Hierarchical Convolutional Neural Networks using Partial Reconfiguration on FPGA
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916237
Mohammad Farhadi, Mehdi Ghasemi, Yezhou Yang
{"title":"A Novel Design of Adaptive and Hierarchical Convolutional Neural Networks using Partial Reconfiguration on FPGA","authors":"Mohammad Farhadi, Mehdi Ghasemi, Yezhou Yang","doi":"10.1109/HPEC.2019.8916237","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916237","url":null,"abstract":"Nowadays most research in visual recognition using Convolutional Neural Networks (CNNs) follows the “deeper model with deeper confidence” belief to gain a higher recognition accuracy. At the same time, deeper model brings heavier computation. On the other hand, for a large chunk of recognition challenges, a system can classify images correctly using simple models or so-called shallow networks. Moreover, the implementation of CNNs faces with the size, weight, and energy constraints on the embedded devices. In this paper, we implement the adaptive switching between shallow and deep networks to reach the highest throughput on a resource-constrained MPSoC with CPU and FPGA. To this end, we develop and present a novel architecture for the CNNs where a gate makes the decision whether using the deeper model is beneficial or not. Due to resource limitation on FPGA, the idea of partial reconfiguration has been used to accommodate deep CNNs on the FPGA resources. We report experimental results on CIFAR-10, CIFAR-100, and SVHN datasets to validate our approach. Using confidence metric as the decision making factor, only 69.8%, 71.8%, and 43.8% of the computation in the deepest network is done for CIFAR10, CIFAR-100, and SVHN while it can maintain the desired accuracy with the throughput of around 400 images per second for SVHN dataset. https://github.com/mfarhadi/AHCNN.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121533225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
IdPrism: Rapid Analysis of Forensic DNA Samples Using MPS SNP Profiles
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916521
D. Ricke, James Watkins, Philip Fremont-Smith, Adam Michaleas
{"title":"IdPrism: Rapid Analysis of Forensic DNA Samples Using MPS SNP Profiles","authors":"D. Ricke, James Watkins, Philip Fremont-Smith, Adam Michaleas","doi":"10.1109/HPEC.2019.8916521","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916521","url":null,"abstract":"Massively parallel sequencing (MPS) of large single nucleotide polymorphism (SNP) panels enables identification, analysis of complex DNA mixture samples, and extended kinship predictions. Computational challenges related to SNP allele calling, probability of random man not excluded calculations, and both reference and complex mixture sample comparisons to tens of millions of reference profiles were encountered and resolved when scaling up from thousands to tens of thousands of SNP loci. A MPS SNP analysis pipeline is described for rapid analysis of forensic deoxyribonucleic acid (DNA) samples for thousands to tens of thousands of SNP loci against tens of millions of reference profiles. This pipeline is part of the MIT Lincoln Laboratory (MITLL) IdPrism advanced DNA forensic system.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127927347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
[HPEC 2019 Copyright notice]
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/hpec.2019.8916557
{"title":"[HPEC 2019 Copyright notice]","authors":"","doi":"10.1109/hpec.2019.8916557","DOIUrl":"https://doi.org/10.1109/hpec.2019.8916557","url":null,"abstract":"","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114012282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Singularity for Machine Learning Applications - Analysis of Performance Impact
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916443
B. R. Jordan, David Barrett, David Burke, Patrick Jardin, Amelia Littrell, P. Monticciolo, Michael Newey, J. Piou, Kara Warner
{"title":"Singularity for Machine Learning Applications - Analysis of Performance Impact","authors":"B. R. Jordan, David Barrett, David Burke, Patrick Jardin, Amelia Littrell, P. Monticciolo, Michael Newey, J. Piou, Kara Warner","doi":"10.1109/HPEC.2019.8916443","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916443","url":null,"abstract":"Software deployments in general, and deep learning applications in particular, suffer from difficulty in reproducible results. The use of containers to mitigate these issues is becoming a common practice. Singularity is a container technology which targets the unique issues present in High Performance Computing (HPC) Centers. This paper characterizes the impact of using Singularity for both Training and Inference on deep learning applications.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115552403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Skip the Intersection: Quickly Counting Common Neighbors on Shared-Memory Systems
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916307
Xiaojing An, Kasimir Gabert, James Fox, Oded Green, David A. Bader
{"title":"Skip the Intersection: Quickly Counting Common Neighbors on Shared-Memory Systems","authors":"Xiaojing An, Kasimir Gabert, James Fox, Oded Green, David A. Bader","doi":"10.1109/HPEC.2019.8916307","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916307","url":null,"abstract":"Counting common neighbors between all vertex pairs in a graph is a fundamental operation, with uses in similarity measures, link prediction, graph compression, community detection, and more. Current shared-memory approaches either rely on set intersections or are not readily parallelizable. We introduce a new efficient and parallelizable algorithm to count common neighbors: starting at a wedge endpoint, we iterate through all wedges in the graph, and increment the common neighbor count for each endpoint pair. This exactly counts the common neighbors between all pairs without using set intersections, and as such attains an asymptotic improvement in runtime. Furthermore, our algorithm is simple to implement and only slight modifications are required for existing implementations to use our results. We provide an OpenMP implementation and evaluate it on real-world and synthetic graphs, demonstrating no loss of scalability and an asymptotic improvement. We show intersections are neither necessary nor helpful for computing all pairs common neighbor counts.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126588492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
ECG Feature Processing Performance Acceleration on SLURM Compute Systems
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916397
Michael Nolan, Mark Hernandez, Philip Fremont-Smith, A. Swiston, K. Claypool
{"title":"ECG Feature Processing Performance Acceleration on SLURM Compute Systems","authors":"Michael Nolan, Mark Hernandez, Philip Fremont-Smith, A. Swiston, K. Claypool","doi":"10.1109/HPEC.2019.8916397","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916397","url":null,"abstract":"Electrocardiogram (ECG) signal features (e.g. Heart rate, intrapeak interval times) are data commonly used in physiological assessment. Commercial off-the-shelf (COTS) software solutions for ECG data processing are available, but are often developed for serialized data processing which scale poorly for large datasets. To address this issue, we’ve developed a Matlab code library for parallelized ECG feature generation. This library uses the pMatlab and MatMPI interfaces to distribute computing tasks over supercomputing clusters using the Simple Linux Utility for Resource Management (SLURM). To profile its performance as a function of parallelization scale, the ECG processing code was executed on a non-human primate dataset on the Lincoln Laboratory Supercomputing TXGreen cluster. Feature processing jobs were deployed over a range of processor counts and processor types to assess the overall reduction in job computation time. We show that individual process times decrease according to a 1/n relationship to the number of processors used, while total computation times accounting for deployment and data aggregation impose diminishing returns of time against processor count. A maximum mean reduction in overall file processing time of 99% is shown.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124857800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Introducing DyMonDS-as-a-Service (DyMaaS) for Internet of Things
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916560
M. Ilić, Rupamathi Jaddivada
{"title":"Introducing DyMonDS-as-a-Service (DyMaaS) for Internet of Things","authors":"M. Ilić, Rupamathi Jaddivada","doi":"10.1109/HPEC.2019.8916560","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916560","url":null,"abstract":"With recent trends in computation and communication architecture, it is becoming possible to simulate complex networked dynamical systems by employing high-fidelity models. The inherent spatial and temporal complexity of these systems, however, still acts as a roadblock. It is thus desirable to have adaptive platform design facilitating zooming-in and out of the models to emulate time-evolution of processes at a desired spatial and temporal granularity. In this paper, we propose new computing and networking abstractions, that can embrace physical dynamics and computations in a unified manner, by taking advantage of the inherent structure. We further design multi-rate numerical methods that can be implemented by computing architectures to facilitate adaptive zooming-in and out of the models spanning multiple spatial and temporal layers. These methods are all embedded in a platform called Dynamic Monitoring and Decision Systems (DyMonDS). We introduce a new service model of cloud computing called DyMonDS-as-a-Service (DyMaas), for use by operators at various spatial granularities to efficiently emulate the interconnection of IoT devices. The usage of this platform is described in the context of an electric microgrid system emulation.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131278892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Fast Stochastic Block Partitioning via Sampling
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916542
Frank Wanye, Vitaliy Gleyzer, Wu-chun Feng
{"title":"Fast Stochastic Block Partitioning via Sampling","authors":"Frank Wanye, Vitaliy Gleyzer, Wu-chun Feng","doi":"10.1109/HPEC.2019.8916542","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916542","url":null,"abstract":"Community detection in graphs, also known as graph partitioning, is a well-studied NP-hard problem. Various heuristic approaches have been adopted to tackle this problem in polynomial time. One such approach, as outlined in the IEEE HPEC Graph Challenge, is Bayesian statistics-based stochastic block partitioning. This method delivers high-quality partitions in sub-quadratic runtime, but it fails to scale to very large graphs. In this paper, we present sampling as an avenue for speeding up the algorithm on large graphs. We first show that existing sampling techniques can preserve a graph’s community structure. We then show that sampling for stochastic block partitioning can be used to produce a speedup of between $2.18 times$ and $7.26 times$ for graph sizes between 5,000 and 50,000 vertices without a significant loss in the accuracy of community detection.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130442985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Target-based Resource Allocation for Deep Learning Applications in a Multi-tenancy System
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916403
Wenjia Zheng, Yun Song, Zihao Guo, Yongcheng Cui, Suwen Gu, Ying Mao, Long Cheng
{"title":"Target-based Resource Allocation for Deep Learning Applications in a Multi-tenancy System","authors":"Wenjia Zheng, Yun Song, Zihao Guo, Yongcheng Cui, Suwen Gu, Ying Mao, Long Cheng","doi":"10.1109/HPEC.2019.8916403","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916403","url":null,"abstract":"The neural-network based deep learning is the key technology that enables many powerful applications, which include self-driving vehicles, computer vision, and natural language processing. Although various algorithms focus on different directions, generally, they mainly employ an iteration by iteration training and evaluating the process. Each iteration aims to find a parameter set, which minimizes a loss function defined by the learning model. When completing the training process, the global minimum is achieved with a set of optimized parameters. At this stage, deep learning applications can be shipped with a trained model to provide services. While deep learning applications are reshaping our daily life, obtaining a good learning model is an expensive task. Training deep learning models is, usually, time-consuming and requires lots of resources, e.g. CPU and GPU. In a multi-tenancy system, however, limited resources are shared by multiple clients that lead to severe resource contention. Therefore, a carefully designed resource management scheme is required to improve the overall performance. In this project, we propose a target based scheduling scheme named TRADL. In TRADL, developers have options to specify a two-tier target. If the accuracy of the model reaches a target, it can be delivered to clients while the training is still going on to continue improving the quality. The experiments show that TRADL is able to significantly reduce the time cost, as much as 48.2%, for reaching the target.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115043470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Towards Improving Rate-Distortion Performance of Transform-Based Lossy Compression for HPC Datasets
2019 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916286
Jialing Zhang, Aekyeung Moon, Xiaoyan Zhuo, S. Son
{"title":"Towards Improving Rate-Distortion Performance of Transform-Based Lossy Compression for HPC Datasets","authors":"Jialing Zhang, Aekyeung Moon, Xiaoyan Zhuo, S. Son","doi":"10.1109/HPEC.2019.8916286","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916286","url":null,"abstract":"As the size and amount of data produced by high-performance computing (HPC) applications grow exponentially, an effective data reduction technique is becoming critical to mitigating time and space burden. Lossy compression techniques, which have been widely used in image and video compression, hold promise to fulfill such data reduction need. However, they are seldom adopted in HPC datasets because of their difficulty in quantifying the amount of information loss and data reduction. In this paper, we explore a lossy compression strategy by revisiting the energy compaction properties of discrete transforms on HPC datasets. Specifically, we apply block-based transforms to HPC datasets, obtain the minimum number of coefficients containing the maximum energy (or information) compaction rate, and quantize remaining non-dominant coefficients using a binning mechanism to minimize information loss expressed in a distortion measure. We implement the proposed approach and evaluate it using six real-world HPC datasets. Our experimental results show that, on average, only 6.67 bits are required to preserve an optimal energy compaction rate on our evaluated datasets. Moreover, our knee detection algorithm improves the distortion in terms of peak signal-to-noise ratio by 2.46 dB on average.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133765375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8