{"title":"LDT: Lightweight Dirty Tracking of Memory Pages for x86 Systems","authors":"Rohit Singh, K. P. Arun, Debadatta Mishra","doi":"10.1109/HiPC56025.2022.00023","DOIUrl":"https://doi.org/10.1109/HiPC56025.2022.00023","url":null,"abstract":"Incremental memory checkpointing is a crucial primitive required by applications such as live migration, cloning, and debugging. In many implementations of incremental checkpointing, memory modifications are tracked by restricting write access to memory pages using support provided by the memory management unit (MMU) hardware. Disabling write access degrades application performance because of the page faults induced, in the form of permission violations, on the applications' memory store operations. In this paper, we propose LDT, a lightweight memory write monitoring mechanism to support efficient incremental checkpointing. LDT is designed to work in systems with MMU support for page dirty indicators (such as the dirty bit in x86 systems) by enabling polymorphic use of the indicators such that no other subsystem is impacted by LDT. We design and implement LDT in the Linux kernel as an alternative to the existing write-restriction-based technique. We establish the correctness and comparative efficiency of LDT through extensive experimental analysis. The results show that under write-heavy workloads, LDT outperforms the write-restriction-based technique by a factor of 2x in execution time. For real-world workload benchmarks such as Redis, LDT yields a 2% to 8% throughput improvement over the state-of-the-art dirty tracking technique.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126855346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Churn Prediction in Telecommunications Industry Based on Conditional Wasserstein GAN","authors":"Chang Su, Linglin Wei, Xianzhong Xie","doi":"10.1109/HiPC56025.2022.00034","DOIUrl":"https://doi.org/10.1109/HiPC56025.2022.00034","url":null,"abstract":"In recent years, with the globalization and advancement of the telecommunications industry, competition in the telecommunications market has become more intense, accompanied by high customer churn rates. Telecom operators therefore urgently need to formulate effective marketing strategies to retain customers. Customer churn prediction is an important means of preventing churn, but due to the imbalance of data in the telecommunications industry, the prediction results are often unsatisfactory. To improve prediction performance, the most common approach is to oversample the minority class. Standard methods such as SMOTE usually focus only on the minority-class samples and tend to ignore the relationship between minority-class and majority-class samples. In addition, for high-dimensional, complex data distributions, the Euclidean distance used in the SMOTE algorithm is not particularly meaningful and tends to underperform. Generative Adversarial Networks (GANs), by contrast, are able to model complex distributions and can in principle be used to generate minority-class samples. Therefore, this paper adopts a comprehensive GAN model (CWGAN) based on Wasserstein GAN with Gradient Penalty (WGAN-GP) and Conditional GAN (CGAN) to handle the imbalanced data in the telecom industry. This is also the first time that a GAN has been used to address the data imbalance problem in the telecom industry. In addition, this paper introduces a hybrid attention mechanism (CBAM) to further help the generator focus on features relevant to the classification task. Finally, the effectiveness of the adopted method is demonstrated on four commonly used machine learning classifiers.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128329563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise Parallel FEM-based Interactive Cutting Simulation of Deformable Bodies","authors":"Harshvardhan Das, Suraj Kumar, Subodh Kumar","doi":"10.1109/HiPC56025.2022.00036","DOIUrl":"https://doi.org/10.1109/HiPC56025.2022.00036","url":null,"abstract":"This paper presents a novel scalable parallel algorithm for cutting tetrahedral meshes for surgery simulation. Built upon the finite element method (FEM), it focuses on accurate incremental collision detection and efficient topology modification on GPU systems. The overall simulation comprises a small sequence of steps, each of which is well parallelized using lock-free data structures. The only synchronization necessary is a few barriers between steps. Our experiments show that the entire simulation runs in real time for large meshes (of sizes exceeding 1.5 million tetrahedra) and retains mesh quality during the simulation. We compare favorably with the state-of-the-art.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130984048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Algorithmic and Software Pipeline for Very Large Scale Scientific Data Compression with Error Guarantees","authors":"Tania Banerjee, J. Choi, Jaemoon Lee, Qian Gong, Ruonan Wang, S. Klasky, A. Rangarajan, Sanjay Ranka","doi":"10.1109/HiPC56025.2022.00039","DOIUrl":"https://doi.org/10.1109/HiPC56025.2022.00039","url":null,"abstract":"Efficient data compression is becoming increasingly critical for storing scientific data because many scientific applications produce vast amounts of data. This paper presents an end-to-end algorithmic and software pipeline for data compression that guarantees error bounds on both primary data (PD) and derived data, known as Quantities of Interest (QoI). We demonstrate the effectiveness of the pipeline by compressing fusion data generated by a large-scale fusion code, XGC, which produces tens of petabytes of data in a single day. We demonstrate that the compression is conducted on computational resources set aside as staging nodes and does not impact simulation performance. For efficient parallel I/O, the pipeline uses ADIOS2, which many codes such as XGC already use for their parallel I/O. We show that our approach can compress the data by two orders of magnitude while guaranteeing high accuracy on both the PD and the QoIs. Further, the resources required for compression are a few percent of those required by the simulation, and the compression time for each stage is less than the corresponding simulation time. The pipeline consists of three main steps. The first step uses domain decomposition to split the data into small subdomains; each subdomain is then compressed independently to achieve a high level of parallelism. The second step uses existing techniques that guarantee error bounds on the primary data for each subdomain. The third step uses a post-processing optimization technique based on Lagrange multipliers to reduce the QoI errors for the data corresponding to each subdomain. The generated Lagrange multipliers can be further quantized or truncated to increase the compression level. All of these characteristics make our approach highly practical for on-the-fly compression while guaranteeing error bounds on QoIs that are critical to scientists.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132015876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Split-Knit Convolution: Enabling Dense Evaluation of Transpose and Dilated Convolutions on GPUs","authors":"Arjun Menon Vadakkeveedu, Debabrata Mandal, Pradeep Ramachandran, N. Chandrachoodan","doi":"10.1109/HiPC56025.2022.00014","DOIUrl":"https://doi.org/10.1109/HiPC56025.2022.00014","url":null,"abstract":"Transpose convolutions occur in several image-based neural network applications, especially those involving segmentation or image generation. Unlike regular (forward) convolutions, they result in data access and computation patterns that are less regular, and generally have poorer performance when implemented in software. We present split-knit convolution (SKConv) – a technique to replace transpose convolutions with multiple regular convolutions followed by interleaving. We show how existing software frameworks for GPU implementation of deep neural networks can be adapted to realize this computation, and compare against the standard techniques used by such frameworks.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132799687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IMpart: A Partitioning-based Parallel Approach to Accelerate Influence Maximization","authors":"Reet Barik, Marco Minutoli, M. Halappanavar, A. Kalyanaraman","doi":"10.1109/HiPC56025.2022.00028","DOIUrl":"https://doi.org/10.1109/HiPC56025.2022.00028","url":null,"abstract":"Influence maximization (IM) is a fundamental graph problem that involves simulating a stochastic diffusion process on real-world networks. Given a graph G(V, E), the objective is to identify a small set of key influential \"seeds\" — i.e., a fixed-size set of k nodes which, when influenced, is likely to lead to the maximum number of nodes in the network getting influenced. The problem has numerous applications, including (but not limited to) viral marketing in social networks, epidemic control in contact networks, and finding influential proteins in molecular networks. Despite its importance, applying influence maximization at scale continues to pose significant challenges. While the problem is NP-hard, efficient approximation algorithms that use greedy hill climbing are used in practice. However, those algorithms consume hours of multithreaded execution time even on modest-sized inputs with hundreds of thousands of nodes. In this paper, we present IMpart, a partitioning-based approach to accelerate greedy hill-climbing-based IM approaches on both shared- and distributed-memory computers. In particular, we present two parallel algorithms — one that uses graph partitioning (IMpart-metis) and another that uses community-aware partitioning (IMpart-gratis) — with provable guarantees on the quality of approximation. Experimental results show that our approaches deliver two to three orders of magnitude speedup over a state-of-the-art multithreaded hill climbing implementation with negligible loss in quality. For instance, on one of the modest-sized inputs (Slashdot: 73K nodes; 905K edges), our partitioning-based shared-memory implementation yields a 4610x speedup, reducing the runtime from 9h 36m to 7 seconds on 128 threads. Furthermore, our distributed-memory implementation extends the problem size reach to graph inputs with ~10^6 nodes and ~10^8 edges and enables sub-minute computation of IM solutions.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116430982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Performance Truss Analytics in Arkouda","authors":"Zhihui Du, J. Patchett, Oliver Alvarado Rodriguez, Fuhuan Li, David A. Bader","doi":"10.1109/HiPC56025.2022.00026","DOIUrl":"https://doi.org/10.1109/HiPC56025.2022.00026","url":null,"abstract":"In graph analytics, a truss is a cohesive subgraph defined by the number of triangles supporting each edge. It is widely used for community detection in applications such as social networks and security analysis, and the performance of truss analytics depends heavily on its triangle counting method. This paper proposes a novel triangle counting kernel named Minimum Search (MS). Minimum Search selects the two smaller adjacency lists of the three and uses fine-grained parallelism to improve the performance of triangle counting. Then, two basic algorithms, MS-based triangle counting and MS-based support updating, are developed. Based on the novel triangle counting kernel and the two basic algorithms, three fundamental parallel truss analytics algorithms are designed and implemented to enable different kinds of graph truss analysis: an optimized K-Truss algorithm, a Max-Truss algorithm, and a Truss Decomposition algorithm. Moreover, all proposed algorithms have been implemented in the parallel language Chapel and integrated into an open-source framework, Arkouda. Through Arkouda, data scientists can efficiently conduct graph analysis through an easy-to-use Python interface and handle large-scale graph data on powerful back-end computing resources. Experimental results show that the proposed methods significantly improve the performance of truss analysis on real-world graphs compared with the existing and widely adopted list-intersection-based method. The implemented code is publicly available on GitHub (https://github.com/Bears-R-Us/arkouda-njit).","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"30 18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123444990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HiPC 2022 Steering Committee","authors":"","doi":"10.1109/hipc56025.2022.00008","DOIUrl":"https://doi.org/10.1109/hipc56025.2022.00008","url":null,"abstract":"","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130227282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keynote 3: Jack Dongarra","authors":"","doi":"10.1109/hipc56025.2022.00012","DOIUrl":"https://doi.org/10.1109/hipc56025.2022.00012","url":null,"abstract":"","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123941968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries","authors":"B. Ramesh, Qinghua Zhou, A. Shafi, M. Abduljabbar, H. Subramoni, D. Panda","doi":"10.1109/HiPC56025.2022.00024","DOIUrl":"https://doi.org/10.1109/HiPC56025.2022.00024","url":null,"abstract":"The emergence of trillion-parameter models in AI and the deployment of dense Graphics Processing Unit (GPU) systems with high-bandwidth inter-GPU and network interconnects underscore the need to design efficient architecture-aware large-message communication operations. GPU-based on-the-fly compression communication designs help reduce the amount of data transferred across processes, thereby improving large-message communication performance. In this paper, we first analyze bottlenecks in state-of-the-art on-the-fly compression-based MPI implementations for blocking as well as non-blocking point-to-point communication operations. We then propose efficient point-to-point designs that improve upon state-of-the-art implementations through fine-grained overlap of copy, compression, and communication operations. We demonstrate the efficacy of our proposed designs by comparing against state-of-the-art communication runtimes using micro-benchmarks and candidate communication patterns. Our proposed designs deliver improvements of 28.7% in latency, 49.7% in bandwidth, and 36% in bi-directional bandwidth on micro-benchmarks, and up to 16.5% for 3D stencil-based communication patterns, over state-of-the-art designs.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"373 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122500668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}