2022 IEEE High Performance Extreme Computing Conference (HPEC): Latest Papers, Page 8

Design and Implementation of a Real-time Parallel FFT for a Direction-Finding System on an FPGA

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926310

Bheema Lakshmi Pradeep, R. Anand, Pavan Vadakattu, Syed Azemuddin, A. Ahmed

Abstract: The FFT is an essential algorithm for radar signal processing. Owing to the increased computational power of FPGAs, it is now possible to perform the FFT onboard an airborne vehicle. However, FPGA resources have become a limitation for processing real-time signals using conventional methods. To address this issue, we propose a parallel pipelined FFT architecture that achieves very high throughput with very low latency, making it capable of processing real-time continuous data. This architecture is implemented in a radar system that operates from L band to Ku band. In this radar system, the received RF signal is downconverted into an IF signal of 1 GHz frequency with a 500 MHz bandwidth and converted to digital data using a 10-bit ADC. On the converted digital data, a 512-point FFT is implemented on a Xilinx Virtex-7 XC7VX485T FPGA using 8 parallel channels with 64 data frames and is compared with the conventional IP core-based architecture. The proposed architecture takes 1.307 µs to compute the FFT, which is 5.15 times faster than the IP core-based architecture, and requires fewer arithmetic computations. The total numbers of complex multiplications, complex additions, multipliers, and adders were reduced by 10.42%, 30.64%, 10.42%, and 23.90%, respectively. Apart from very low latency and fewer arithmetic operations, the proposed parallel FFT architecture achieved a throughput of 1.350 giga samples per second (Gsps).
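The abstract above describes a 512-point FFT computed over 8 parallel channels of 64 samples each. One standard way to obtain such a split is decimation in time: interleave the input across the channels, transform each 64-sample stream independently, then recombine with twiddle factors. The Python sketch below illustrates that decomposition only; it is an assumption about the general technique, not the authors' FPGA pipeline, and the function names `dft` and `parallel_fft` are hypothetical.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used as the per-channel transform and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def parallel_fft(x, channels=8):
    """Split x into interleaved streams, transform each, recombine with twiddles."""
    n = len(x)
    assert n % channels == 0, "length must be divisible by the channel count"
    m = n // channels                      # e.g. 512 / 8 = 64 samples per channel
    # Channel r receives samples r, r + channels, r + 2*channels, ...
    # These transforms are independent, so hardware can run them side by side.
    sub = [dft(x[r::channels]) for r in range(channels)]
    out = []
    for k in range(n):
        acc = 0j
        for r in range(channels):
            # Twiddle factor W_n^(r*k); sub[r] is periodic in k with period m.
            acc += cmath.exp(-2j * cmath.pi * r * k / n) * sub[r][k % m]
        out.append(acc)
    return out
```

Calling `parallel_fft(samples, channels=8)` on a 512-sample list should agree with `dft(samples)` up to floating-point rounding. A hardware design would replace the direct per-channel DFT with pipelined butterfly stages.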

Citations: 0

C2QA - Bosonic Qiskit

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926318

Timothy J Stavenger, E. Crane, Kevin C. Smith, Christopher Kang, S. Girvin, N. Wiebe

Abstract: The practical benefits of hybrid quantum information processing hardware that contains continuous-variable objects (bosonic modes such as mechanical or electromagnetic oscillators) in addition to traditional (discrete-variable) qubits have recently been demonstrated by experiments with bosonic codes that reach the break-even point for quantum error correction [1-5] and by efficient Gaussian boson sampling simulation of the Franck-Condon spectra of triatomic molecules [6] that is well beyond the capabilities of current qubit-only hardware. The goal of this Co-design Center for Quantum Advantage (C2QA) project is to develop an instruction set architecture (ISA) for hybrid qubit/bosonic mode systems that contains an inventory of the fundamental operations and measurements that are possible in such hardware. The corresponding abstract machine model (AMM) would also contain a description of the appropriate error models associated with the gates, measurements, and time evolution of the hardware. This information has been implemented as an extension of Qiskit. Qiskit is an open-source software development toolkit (SDK) for simulating the quantum state of a quantum circuit on a system with Python 3.7+ and for running the same circuits on prototype hardware within the IBM Quantum Lab. We introduce the Bosonic Qiskit software to enable the simulation of hybrid qubit/bosonic systems using the existing Qiskit software development kit [7]. This implementation can be used for simulating new hybrid systems, verifying proposed physical systems, and modeling systems larger than can currently be constructed. We also cover tutorials and example use cases included within the software to study Jaynes-Cummings models, bosonic Hubbard models, plotting Wigner functions and animations, and calculating maximum likelihood estimations using Wigner functions.
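Simulating a bosonic mode on a qubit-based simulator requires truncating its infinite Fock space to a finite number of levels. The sketch below shows the consequence of that truncation using plain Python lists. It does not use the Bosonic Qiskit API; the binary encoding of a power-of-two cutoff into qubits is an assumption made for illustration, and every function name is hypothetical.

```python
import math

def annihilation(cutoff):
    """Matrix of the annihilation operator a on Fock states |0>..|cutoff-1>."""
    a = [[0.0] * cutoff for _ in range(cutoff)]
    for n in range(1, cutoff):
        a[n - 1][n] = math.sqrt(n)         # a|n> = sqrt(n) |n-1>
    return a

def matmul(p, q):
    """Dense square matrix product."""
    size = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def transpose(p):
    """Transpose; equals the adjoint here because the matrix is real."""
    return [list(row) for row in zip(*p)]

def qubits_per_mode(cutoff):
    """Qubits needed to binary-encode `cutoff` levels (assumed power of two)."""
    assert cutoff > 0 and cutoff & (cutoff - 1) == 0
    return cutoff.bit_length() - 1

cutoff = 8
a = annihilation(cutoff)
a_dag = transpose(a)
number_op = matmul(a_dag, a)               # diagonal entries 0, 1, ..., cutoff-1
commutator = [[x - y for x, y in zip(row_l, row_r)]
              for row_l, row_r in zip(matmul(a, a_dag), matmul(a_dag, a))]
```

The commutator equals the identity on every level except the highest one, where truncation leaves `-(cutoff - 1)` on the diagonal. This is why the cutoff should sit comfortably above the photon numbers a circuit is expected to populate.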

Citations: 6

Apple Silicon Performance in Scientific Computing

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926315

Connor Kenyon, Collin Capano

Citations: 5

DASH: Scheduling Deep Learning Workloads on Multi-Generational GPU-Accelerated Clusters

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926390

Baolin Li, Tirthak Patel, V. Gadepally, K. Gettings, S. Samsi, Devesh Tiwari

Citations: 0

Parallel Computing with DNA Forensics Data

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926352

Adam Michaleas, Philip Fremont-Smith, Chelsea Lennartz, D. Ricke

Citations: 0

FAST: A Scalable Subgraph Matching Framework over Large Graphs

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926298

Jiezhong He, Zhouyang Liu, Yixing Chen, H. Pan, Zhen Huang, Dongsheng Li

Abstract: As one of the most fundamental operations in graph analysis, subgraph matching is widely used in various fields such as social network analysis, knowledge graph query, and fraud detection. Due to its NP-complete complexity, subgraph matching is challenging on large graphs. Previous work is limited in either scalability or the types of queries that can be handled. To address these problems, we propose a fast, scalable subgraph matching framework that consists of filtering, ordering, and enumeration stages. We exploit the parallelism in the filtering stage and design a learning-based filtering method to remove false matching candidates; propose heuristic constraint and ordering generation methods to improve the matching efficiency; and devise a distributed enumeration algorithm that is further optimized with the introduction of a graph cache. Our learning-based filtering method delivers over 90% accuracy for basic queries. Compared with PruneJuice, our matching framework achieves a 2–8× speedup in triangle enumeration and up to 3–4 orders of magnitude higher throughput on generic query enumeration. The caching mechanism further boosts the performance by about 1.5× to 2.5× on average. Experiments also demonstrate the scalability of our framework. This work is supported by the Open Fund of Science and Technology on Parallel and Distributed Processing Laboratory (PDL); the grant number is WDZC2020SS00101. The source code is available at https://github.com/yixinchen200S/FAST.
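The filtering, ordering, and enumeration stages named in the abstract above can be illustrated with a small single-machine sketch. It substitutes a plain label-and-degree filter for the learning-based filter and a greedy fewest-candidates ordering for the paper's heuristics, so it shows only the shape of the pipeline. Graphs are assumed to be undirected dictionaries mapping each vertex to its neighbor set, and all function names are hypothetical.

```python
def filter_candidates(query, data, q_labels, d_labels):
    """Filtering: keep data vertices with a matching label and enough neighbors."""
    return {
        u: {v for v in data
            if d_labels[v] == q_labels[u] and len(data[v]) >= len(query[u])}
        for u in query
    }

def matching_order(query, candidates):
    """Ordering: start with the most selective query vertex, then stay connected."""
    order = [min(query, key=lambda u: (len(candidates[u]), u))]
    while len(order) < len(query):
        remaining = [u for u in query if u not in order]
        frontier = [u for u in remaining if any(w in order for w in query[u])]
        pool = frontier or remaining       # fall back if the query is disconnected
        order.append(min(pool, key=lambda u: (len(candidates[u]), u)))
    return order

def enumerate_matches(query, data, candidates, order):
    """Enumeration: backtracking over injective, edge-preserving mappings."""
    results, mapping = [], {}

    def extend(depth):
        if depth == len(order):
            results.append(dict(mapping))
            return
        u = order[depth]
        for v in sorted(candidates[u]):
            if v in mapping.values():
                continue                   # each data vertex is used at most once
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                extend(depth + 1)
                del mapping[u]

    extend(0)
    return results

def subgraph_match(query, data, q_labels, d_labels):
    """Run the three stages in sequence and return all embeddings."""
    candidates = filter_candidates(query, data, q_labels, d_labels)
    return enumerate_matches(query, data, candidates,
                             matching_order(query, candidates))
```

For a triangle query against a data graph that contains one triangle, `subgraph_match` returns six mappings, one per automorphism of the triangle. The matches are non-induced: extra edges between matched data vertices are allowed.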

Citations: 0

Arachne: An Arkouda Package for Large-Scale Graph Analytics

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9991947

Oliver Alvarado Rodriguez, Zhihui Du, J. Patchett, Fuhuan Li, David A. Bader

Abstract: Due to the emergence of massive real-world graphs, whose sizes may extend to terabytes, new tools must be developed to enable data scientists to handle such graphs efficiently. These graphs may include social networks, computer networks, and genomes. In this paper, we propose a novel graph package, Arachne, to make large-scale graph analytics easier and more efficient. It is based on the open-source Arkouda framework, which has been developed to allow users to perform massively parallel computations on distributed data with an interface similar to NumPy. In this package, we developed a fundamental sparse graph data structure and several useful graph algorithms around our data structure to build a basic algorithmic library. Benchmarks and tools have also been developed to evaluate and demonstrate the provided graph algorithms. The graph algorithms we have implemented thus far include breadth-first search (BFS), connected components (CC), k-Truss (KT), Jaccard coefficients (JC), triangle counting (TC), and triangle centrality (TCE). Their corresponding experimental results based on real-world and synthetic graphs are presented. Arachne is organized as an Arkouda extension package and is publicly available on GitHub (https://github.com/Bears-R-Us/arkouda-njit).
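To make the idea of an array-based sparse graph structure with algorithms built around it more concrete, the sketch below stores an undirected graph as sorted edge arrays plus per-vertex offsets, then runs BFS and triangle counting over it. This is a single-process illustration using Python lists. It does not use the Arachne or Arkouda APIs, and the exact layout is an assumption rather than the package's actual data structure.

```python
from collections import deque

def build_graph(edges, num_vertices):
    """Store an undirected graph as sorted edge arrays plus per-vertex offsets."""
    directed = sorted({(u, v) for a, b in edges if a != b
                       for u, v in ((a, b), (b, a))})
    src = [u for u, _ in directed]
    dst = [v for _, v in directed]
    start = [0] * (num_vertices + 1)
    for u in src:
        start[u + 1] += 1                  # count the edges leaving each vertex
    for v in range(num_vertices):
        start[v + 1] += start[v]           # prefix sum gives the slice boundaries
    return src, dst, start

def neighbors(graph, v):
    """Neighbors of v are one contiguous slice of the destination array."""
    _, dst, start = graph
    return dst[start[v]:start[v + 1]]

def bfs_levels(graph, root):
    """Breadth-first search; unreachable vertices keep level -1."""
    level = [-1] * (len(graph[2]) - 1)
    level[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in neighbors(graph, u):
            if level[v] == -1:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def triangle_count(graph):
    """Count each triangle once by requiring u < v < w."""
    total = 0
    for u in range(len(graph[2]) - 1):
        higher_u = {w for w in neighbors(graph, u) if w > u}
        for v in higher_u:
            total += sum(1 for w in neighbors(graph, v)
                         if w > v and w in higher_u)
    return total
```

Because neighbor lookups reduce to array slices, the same layout maps naturally onto distributed arrays, which is the setting a NumPy-like framework targets.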

Citations: 5

Towards a Generic UVM

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926321

Kholoud Mahmoud, Randa Ahmed, Karim M. Ayman, Mostafa Aymau, Waleed Taie, Yasser Ibrahim, H. Mostafa, K. Salah

Abstract: In the modern era of advancing CPU complexity, processor verification is an ever-increasing challenge. The gap between what a verification plan can offer today and current technology requirements is constantly widening. Despite many efforts to perfect "golden verification models" during the design phase and the adoption of object-oriented programming throughout the process, many industry experts still consider standalone verification test benches an extreme, time-consuming barrier that leads to a longer time-to-market and casts doubt on the continuity of the current verification process. The Universal Verification Methodology (UVM) came to the rescue of the verification community by merging SystemVerilog and SystemC into one environment that is completely standardized, constrained, and reusable, providing a powerful verification methodology for a wide range of design sizes and types. The main contribution of this project is a generic UVM: a single verification environment that can accommodate many RTL designs (soft processors) that have not only different instruction set architectures (ISAs) of the same categories but also different techniques and mechanisms for handling the pipeline infrastructure. The proposed generic UVM (GUVM) structure lets the user attach any soft processor (core) with nearly the same micro-architecture to the test bench and monitor both the CPU's internal behavior and the complete flow of all supported instructions.

Citations: 0

Hypersparse Network Flow Analysis of Packets with GraphBLAS

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-13 DOI: 10.1109/HPEC55821.2022.9926320

Tyler H. Trigg, C. Meiners, Sandeep Pisharody, Hayden Jananthan, Michael Jones, Adam Michaleas, Tim Davis, Erik Welch, W. Arcand, David Bestor, William Bergeron, C. Byun, V. Gadepally, Micheal Houle, M. Hubbell, Anna Klein, P. Michaleas, Lauren Milechin, J. Mullen, Andrew Prout, A. Reuther, Antonio Rosa, S. Samsi, Douglas Stetson, Charles Yee, J. Kepner

Abstract: Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hypersparse traffic matrices that preserve anonymization while enabling subrange analysis. Standard multi-temporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange, which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression achieved is significant (<0.1 bit per packet), enabling extremely large netflow analyses to be stored and transported. The single-node parallel performance is analyzed in terms of both processors and threads, showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link).
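The per-subrange aggregates listed in the abstract above (source packets, source fan-out, unique links, destination fan-in, destination packets) are row and column reductions of a sparse traffic matrix. The sketch below computes them with a Python dictionary keyed by (source, destination) standing in for a hypersparse matrix. It does not call any GraphBLAS library, and the function names and the fixed-size packet window are assumptions made for illustration.

```python
from collections import defaultdict

def traffic_matrix(packets):
    """Accumulate packet counts per (source, destination); only nonzeros exist."""
    matrix = defaultdict(int)
    for src, dst in packets:
        matrix[(src, dst)] += 1
    return dict(matrix)

def aggregates(matrix):
    """Row and column reductions of the traffic matrix."""
    src_packets, dst_packets = defaultdict(int), defaultdict(int)
    fan_out, fan_in = defaultdict(int), defaultdict(int)
    for (src, dst), count in matrix.items():
        src_packets[src] += count          # row sum
        dst_packets[dst] += count          # column sum
        fan_out[src] += 1                  # nonzeros in the row
        fan_in[dst] += 1                   # nonzeros in the column
    return {
        "valid_packets": sum(matrix.values()),
        "unique_links": len(matrix),
        "source_packets": dict(src_packets),
        "source_fan_out": dict(fan_out),
        "destination_fan_in": dict(fan_in),
        "destination_packets": dict(dst_packets),
    }

def subrange_aggregates(packets, window):
    """Split the packet stream into fixed-size windows and aggregate each one."""
    return [aggregates(traffic_matrix(packets[i:i + window]))
            for i in range(0, len(packets), window)]
```

As a scale check on the figures quoted in the abstract, 50 trillion packets at under 0.1 bit per packet is under 5 × 10^12 bits, or roughly 625 GB, for the whole corpus.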

Citations: 3

An Evaluation of Low Overhead Time Series Preprocessing Techniques for Downstream Machine Learning

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-12 DOI: 10.1109/HPEC55821.2022.9926406

Matthew L. Weiss, Joseph McDonald, David Bestor, Charles Yee, Daniel Edelman, Michael Jones, Andrew Prout, Andrew Bowne, Lindsey McEvoy, V. Gadepally, S. Samsi

Abstract: In this paper we address the application of pre-processing techniques to multi-channel time series data with varying lengths, which we refer to as the alignment problem, for downstream machine learning. The misalignment of multi-channel time series data may occur for a variety of reasons, such as missing data, varying sampling rates, or inconsistent collection times. We consider multi-channel time series data collected from the MIT SuperCloud High Performance Computing (HPC) center, where different job start times and varying run times of HPC jobs result in misaligned data. This misalignment makes it challenging to build AI/ML approaches for tasks such as compute workload classification. Building on previous supervised classification work with the MIT SuperCloud Dataset, we address the alignment problem via three broad, low overhead approaches: sampling a fixed subset from a full time series, performing summary statistics on a full time series, and sampling a subset of coefficients from time series mapped to the frequency domain. Our best performing models achieve a classification accuracy greater than 95%, outperforming previous approaches to multi-channel time series classification with the MIT SuperCloud Dataset by 5%. These results indicate our low overhead approaches to solving the alignment problem, in conjunction with standard machine learning techniques, are able to achieve high levels of classification accuracy, and serve as a baseline for future approaches to addressing the alignment problem, such as kernel methods.
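Each of the three approaches named in the abstract above maps a series of arbitrary length to a fixed-length feature vector, which is what lets a standard classifier consume misaligned data. The sketch below shows one plausible form of each. The choice of evenly spaced sampling, the particular statistics, and the use of low-frequency DFT magnitudes are assumptions for illustration and are not taken from the paper, and the function names are hypothetical.

```python
import cmath
import statistics

def fixed_subset(series, length):
    """Approach 1: sample `length` evenly spaced points from the full series."""
    if len(series) == 1 or length == 1:
        return [series[0]] * length
    step = (len(series) - 1) / (length - 1)
    return [series[round(i * step)] for i in range(length)]

def summary_stats(series):
    """Approach 2: describe the series by min, max, mean, and std. deviation."""
    return [min(series), max(series),
            statistics.fmean(series), statistics.pstdev(series)]

def low_frequency_magnitudes(series, count):
    """Approach 3: normalized magnitudes of the first `count` DFT coefficients."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(series))) / n
            for k in range(count)]

def featurize(channels, method):
    """Apply one method to every channel and concatenate the results."""
    features = []
    for series in channels:
        features.extend(method(series))
    return features
```

For example, `featurize(job_channels, summary_stats)` yields four features per channel regardless of how long each job ran, where `job_channels` is a hypothetical list of non-empty per-channel series.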

Citations: 0