Exploring MPI Collective I/O and File-per-process I/O for Checkpointing a Logical Inference Task

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Pub Date: 2021-06-01. DOI: 10.1109/IPDPSW52791.2021.00153

Ke Fan, Kristopher K. Micinski, Thomas Gilray, Sidharth Kumar


Citations: 3

Abstract

We present a scalable parallel I/O system for a logical-inferencing application built atop a deductive database. A deductive database applies a set of program rules to the facts it already stores in order to make logical deductions (i.e., to conclude additional facts). Datalog is a language, or family of languages, commonly used to specify rules and queries for a deductive database. Applications built using Datalog range from graph mining (such as computing transitive closure or k-cliques) to program analysis (control-flow and data-flow analysis). In our previous papers, we presented the first implementation of a data-parallel Datalog built using MPI. In this paper, we present a parallel I/O system used to checkpoint and restart applications built on top of our Datalog system. State-of-the-art Datalog implementations, such as Soufflé, support only serial I/O, mainly because the implementations themselves do not support many-node parallel execution. Computing the transitive closure of a graph is one of the simplest logical-inferencing applications built using Datalog; we use it as a micro-benchmark to demonstrate the efficacy of our parallel I/O system. Internally, we use a nested B-tree data structure to facilitate fast and efficient in-memory access to relational data. Our I/O system therefore involves two steps: converting the application data layout (a nested B-tree) to a stream of bytes, followed by the actual parallel I/O. We explore two popular I/O techniques: POSIX I/O and MPI collective I/O. To extract performance from MPI collective I/O we use adaptive striping, and for POSIX I/O we use file-per-process I/O. We demonstrate the scalability of our system at up to 4,096 processes on the Theta supercomputer at Argonne National Laboratory.
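The two-step checkpoint described in the abstract (flatten the in-memory relation to bytes, then write it) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a plain dict of sets as a stand-in for the nested B-tree, a fixed 64-bit pair encoding, an integer `rank` in place of a real MPI rank, and hypothetical names such as `checkpoint`, `restart`, and the `checkpoint.<rank>.bin` file pattern. Only the file-per-process variant is shown; no MPI library is used.

```python
import os
import struct
import tempfile


def transitive_closure(edges):
    """Semi-naive evaluation of the Datalog program:
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    """
    succ = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)
    path = set(edges)
    delta = set(edges)  # tuples discovered in the previous iteration
    while delta:
        new = {(x, z) for (x, y) in delta for z in succ.get(y, ())}
        delta = new - path  # keep only genuinely new facts
        path |= delta
    return path


def to_nested(tuples):
    """Build the assumed nested layout: first column -> set of second column."""
    nested = {}
    for x, y in tuples:
        nested.setdefault(x, set()).add(y)
    return nested


def serialize(nested):
    """Step 1: flatten the nested relation into a stream of bytes.
    Each tuple becomes two little-endian signed 64-bit integers (assumed format).
    """
    out = bytearray()
    for x in sorted(nested):
        for y in sorted(nested[x]):
            out += struct.pack("<qq", x, y)
    return bytes(out)


def deserialize(buf):
    """Inverse of serialize: rebuild the nested relation from bytes."""
    return to_nested(struct.iter_unpack("<qq", buf))


def checkpoint(nested, rank, directory):
    """Step 2, file-per-process variant: each rank writes its own file."""
    path = os.path.join(directory, f"checkpoint.{rank}.bin")
    with open(path, "wb") as f:
        f.write(serialize(nested))
    return path


def restart(rank, directory):
    """Read back the file that this rank wrote at checkpoint time."""
    with open(os.path.join(directory, f"checkpoint.{rank}.bin"), "rb") as f:
        return deserialize(f.read())


if __name__ == "__main__":
    nprocs = 2  # simulated process count; a real run would use MPI ranks
    closure = transitive_closure([(1, 2), (2, 3), (3, 4)])
    with tempfile.TemporaryDirectory() as d:
        for rank in range(nprocs):
            # Assumed partitioning: a tuple lives on rank (x mod nprocs).
            local = to_nested(t for t in closure if t[0] % nprocs == rank)
            checkpoint(local, rank, d)
            assert restart(rank, d) == local
```

In the collective variant, all ranks would instead write their byte streams into one shared file at rank-specific offsets, for example through the MPI standard's `MPI_File_write_at_all`; the serialization step would stay the same.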

