Ke Fan, Kristopher K. Micinski, Thomas Gilray, Sidharth Kumar
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), June 2021. DOI: 10.1109/IPDPSW52791.2021.00153
Exploring MPI Collective I/O and File-per-process I/O for Checkpointing a Logical Inference Task
We present a scalable parallel I/O system for a logical-inferencing application built atop a deductive database. Deductive databases can make logical deductions (i.e., conclude additional facts) from facts already in the database, based on a set of program rules. Datalog is a language, or family of languages, commonly used to specify rules and queries for a deductive database. Applications built using Datalog range from graph mining (such as computing transitive closure or k-cliques) to program analysis (control- and data-flow analysis). In our previous papers, we presented the first data-parallel Datalog implementation built using MPI. In this paper, we present a parallel I/O system used to checkpoint and restart applications built on top of our Datalog system. State-of-the-art Datalog implementations, such as Soufflé, only support serial I/O, mainly because the implementations themselves do not support many-node parallel execution. Computing the transitive closure of a graph is one of the simplest logical-inferencing applications built using Datalog; we use it as a micro-benchmark to demonstrate the efficacy of our parallel I/O system. Internally, we use a nested B-tree data structure to facilitate fast and efficient in-memory access to relational data. Our I/O system therefore involves two steps: converting the application data layout (a nested B-tree) to a stream of bytes, followed by the actual parallel I/O. We explore two popular I/O techniques: POSIX I/O and MPI collective I/O. To extract performance from MPI collective I/O we use adaptive striping, and for POSIX I/O we use file-per-process I/O. We demonstrate the scalability of our system on up to 4,096 processes on the Theta supercomputer at Argonne National Laboratory.
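The transitive-closure micro-benchmark and the first checkpoint step (flattening relational data into a byte stream) can be sketched in plain Python. This is a minimal illustration only: the function names and the fixed 16-byte fact encoding are assumptions for the example, not the paper's actual B-tree layout or on-disk format, and the paper's system performs the join across MPI ranks rather than in a single process.

```python
import struct

def transitive_closure(edges):
    """Semi-naive fixpoint: derive path(x, z) from path(x, y) and edge(y, z).

    A single-process stand-in for the Datalog rule
        path(X, Z) :- path(X, Y), edge(Y, Z).
    Only newly derived facts (the delta) are joined in each round.
    """
    total = set(edges)
    delta = set(edges)
    # Index the static edge relation by source node for the join.
    by_src = {}
    for (a, b) in edges:
        by_src.setdefault(a, set()).add(b)
    while delta:
        new = set()
        for (x, y) in delta:
            for z in by_src.get(y, ()):
                if (x, z) not in total:
                    new.add((x, z))
        total |= new
        delta = new
    return total

def serialize(relation):
    """Step one of a checkpoint: flatten the relation into a byte stream.

    Here each fact is packed as two little-endian unsigned 64-bit ints
    (an illustrative encoding); the resulting contiguous buffer is what
    would then be handed to MPI collective I/O or a per-process write.
    """
    buf = bytearray()
    for (x, y) in sorted(relation):
        buf += struct.pack("<QQ", x, y)
    return bytes(buf)

closure = transitive_closure({(1, 2), (2, 3), (3, 4)})
# closure also contains the derived facts (1, 3), (2, 4), (1, 4)
blob = serialize(closure)  # 6 facts x 16 bytes = 96 bytes
```

In the checkpoint path described in the abstract, every process would produce such a buffer from its local B-tree partition; file-per-process POSIX I/O writes each buffer to its own file, while MPI collective I/O aggregates the buffers into a single shared file.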