{"title":"Teaching PDC in the Time of COVID: Hands-on Materials for Remote Learning","authors":"Joel C. Adams, Richard A. Brown, Suzanne J. Matthews, E. Shoop","doi":"10.1109/IPDPSW52791.2021.00061","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00061","url":null,"abstract":"In response to shifts in the hardware foundations of computing, parallel and distributed computing (PDC) is now a key piece of the core CS curriculum. For CS educators, the COVID-19 pandemic and the resulting switch to remote-learning add new challenges to the tasks of helping learners understand abstract PDC concepts and equipping them with hands-on practical skills. This paper presents several novel teaching materials for teaching PDC remotely, including: (i) using a Runestone Interactive \"virtual\" handout to learn how to run OpenMP multithreaded programs on a Raspberry Pi, and (ii) using Google Colab and Jupyter notebooks to run mpi4py instances on remote systems and thus learn about MPI distributed multiprocessing. The authors piloted these strategies during a multi-day faculty development workshop on teaching PDC. Assessment data indicates that the materials greatly aided professional development and preparedness to teach PDC.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130974849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-design of Advanced Architectures for Graph Analytics using Machine Learning","authors":"Kuldeep R. Kurte, N. Imam, R. Kannan, S. Hasan, Srikanth B. Yoginath","doi":"10.1109/IPDPSW52791.2021.00053","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00053","url":null,"abstract":"A graph is an excellent way of representing relationships among entities. We can use graph analytics to synthesize and analyze such relational data, and extract relevant features that are useful for various tasks such as machine learning. Considering the crucial role of graph analytics in various domains, it is important and timely to investigate the right hardware configurations that can achieve optimal performance for graph workloads on future high-performance computing systems. Design space exploration studies facilitate the selection of appropriate configurations (e.g. memory) to achieve a desired system performance. Recently, the approach of accelerating graph analytics using persistent non-volatile memory has gained a lot of attention. Traditional system simulators such as Gem5 and NVMain can be used to explore the design space of these advanced memory architectures for graph workloads. However, these simulators are slow in execution thus limiting the efficiency of design space exploration studies. To overcome this challenge, we proposed a machine learning based approach to co-design advanced memory architectures for graph workloads. We tested our approach with DRAM, non-volatile memory, and hybrid memory (DRAM+NVM) using a breadth first search benchmark algorithm. Our results showed the applicability of the proposed machine learning based approach to the co-design of the advanced memory architectures. In this paper, we provide recommendations on selecting advanced memory architectures to achieve desired performance for graph workloads. We also discuss the performances of different machine learning models that were considered in this study.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132737236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the ScaDL 2021 Workshop Chairs","authors":"","doi":"10.1109/ipdpsw52791.2021.00135","DOIUrl":"https://doi.org/10.1109/ipdpsw52791.2021.00135","url":null,"abstract":"","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133397055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing the Constraints of Active Learning on the Edge","authors":"Enrique Nueve, Sean Shahkarami, Seongha Park, N. Ferrier","doi":"10.1109/IPDPSW52791.2021.00126","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00126","url":null,"abstract":"The design of machine learning methodology often does not take into account the limitations of edge computing. In particular, active learning approaches have not considered the constraints of the edge, such as separate data locations (labeled data is on the cloud whereas unlabeled data is on the edge), cold starting or low initial model performance, limited budget sizes due to bandwidth constraints, and computational constraints due to edge hardware. Active learning on the edge could help decide what data to cache on the edge and what data to prioritize for offloading, facilitating efficient use of memory and bandwidth resources. Active learning on the edge would also allow for a machine learning model to be trained using a minimal amount of data. In this work, we examine the constraints of performing active learning on the edge, propose an active learning method that seeks to address these constraints, and discuss advances needed at large to improve active learning on the edge.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"522 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133697335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to GraphBLAS 2.0","authors":"Benjamin Brock, A. Buluç, T. Mattson, Scott McMillan, J. Moreira","doi":"10.1109/IPDPSW52791.2021.00047","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00047","url":null,"abstract":"The GraphBLAS is a set of basic building blocks for constructing graph algorithms in terms of linear algebra. They are first and foremost defined mathematically with the goal that language bindings will be produced for a wide range of programming languages. We started with the C programming language and over the last four years have produced multiple versions of the GraphBLAS C API specification. In this paper, we describe our next version of the C GraphBLAS specification. It introduces a number of major changes including support for multithreading, import/export functionality, and functions that use the indices of matrix/vector elements. Since some of these changes introduce small backwards compatibility issues, this is a major release we call GraphBLAS 2.0.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121058386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory Efficient Edge Addition Designs for Large and Dynamic Social Networks","authors":"Eunice E. Santos, Vairavan Murugappan, John Korah","doi":"10.1109/IPDPSW52791.2021.00155","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00155","url":null,"abstract":"The availability of large volumes of social network data from a variety of social and socio-technical networks has greatly increased. These networks provide critical insights into understanding various domains including business, healthcare, and disaster management. The relationships and interactions between different entities represented in most of these data sources are constantly evolving. Graph processing and analysis methodologies that can effectively integrate data changes while minimizing recomputations are needed to handle these dynamic networks. In addition, the size of these information sources is constantly increasing; therefore, we need designs that can perform analyses that are memory efficient in order to address resource constraints. In this paper, we show how our anytime anywhere framework can be used to construct memory-efficient closeness centrality algorithms. In particular, we will show how dynamic edge additions can be efficiently handled in the proposed scheme.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129045178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions","authors":"Caio Salvador Rohwedder, J. P. L. Carvalho, J. N. Amaral, G. Araújo, Giancarlo Colmenares, Kai-Ting Amy Wang","doi":"10.1109/IPDPSW52791.2021.00016","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00016","url":null,"abstract":"Image-to-column (Im2col) and column-to-image (Col2im) are data transformations extensively used to map convolution to matrix multiplication. These transformations rearrange the inputs of convolution to avoid its strided memory access pattern, thus providing a friendlier data layout for CPUs and GPUs. In artificial intelligence (AI) accelerators, these transformations allow convolution to be computed in matrix-multiplier units. Implemented in software, however, they impose a significant overhead that must be compensated by the efficiency gains of matrix multipliers. DaVinci is an AI accelerator architecture that introduces instructions to optimize Im2col and Col2im. Another core layer of convolutional neural networks that presents a strided memory access pattern is pooling. This paper explores the specialized Im2col and Col2im instructions to accelerate pooling layers in DaVinci. An experimental evaluation reveals that the proposed pooling implementations can yield speedups of up to 5.8 times compared to a baseline that does not use these specialized instructions. The speedups follow from an improved memory layout in the inputs of pooling, as this layout leads to better utilization of the vector processing unit in DaVinci.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129249293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User Allocation for Real-Time Applications with State Sharing in Fog Computing Networks","authors":"Ryohei Sato, Hidetoshi Kawaguchi, Yuichi Nakatani","doi":"10.1109/IPDPSW52791.2021.00123","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00123","url":null,"abstract":"Applications in which multiple users share the states in real-time over a network have been rapidly spreading, but network latency degrades their quality of service (QoS) and quality of experience (QoE). Although Fog computing effectively mitigates this problem, user allocation methods suitable for these applications with strict latency requirements have not yet been studied. Therefore, this paper proposes both offline and online methods that assume state sharing for user allocation in Fog environments. These methods not only reduce the mean of delays within each group composed of users who share the same states but also guarantee fairness among the users. The simulations demonstrate that our methods complete the allocation in a realistic time and outperform the baseline methods and architectures.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"345 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116052800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}