Shikha Singh, S. Madaminov, M. A. Bender, M. Ferdman, Ryan Johnson, Benjamin Moseley, H. Ngo, D. Nguyen, Soeren Olesen, R. Stirewalt, Geoffrey Washburn
A Scheduling Approach to Incremental Maintenance of Datalog Programs
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 864-873, published 2020-05-01
DOI: 10.1109/IPDPS47924.2020.00093 (https://doi.org/10.1109/IPDPS47924.2020.00093)
Citations: 1
Abstract
In this paper, we study the problem of incremental maintenance of Datalog programs and model it as a scheduling problem on DAGs. We design provably good time- and memory-efficient scheduling algorithms for (re)executing a Datalog program where some (but not necessarily all) of the inputs have changed. We prove that our schedulers, called LevelBased and LevelBased with lookahead, have asymptotically improved running time and space efficiency when compared with benchmark algorithms used in production at LogicBlox. The main result of the paper is a hybrid scheduler, which combines LevelBased with the production LogicBlox scheduler (or any other heuristic scheduler). The hybrid scheduler achieves strong worst-case guarantees and robustness without losing out on the best-case behavior of the production LogicBlox scheduler. Our experiments show that the hybrid scheduler results in similar or improved total execution times compared to the LogicBlox scheduler, while consistently reducing the scheduling overhead, by as much as 50% on some datasets. This hybrid scheme requires little to no overhead but provides predictability and reliability, which are crucial in a commercial application such as LogicBlox.
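The abstract does not spell out how the schedulers work, so the following is only a minimal sketch of the problem setting it describes: a DAG of tasks in which some inputs have changed and only the affected tasks are re-executed. The level definition (longest path from a source), the function names `compute_levels` and `incremental_rerun`, and the toy graph are all assumptions made for illustration; this should not be read as the paper's LevelBased algorithm or as the LogicBlox scheduler.

```python
from collections import defaultdict


def compute_levels(nodes, edges):
    """Assumed level definition: longest path length from any source node."""
    preds, succs = defaultdict(list), defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
        indeg[v] += 1
    level = {n: 0 for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:  # Kahn-style topological traversal
        u = ready.pop()
        for v in succs[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return level, preds, succs


def incremental_rerun(nodes, edges, compute, values, changed_inputs):
    """Re-execute only tasks reachable through actual value changes.

    Dirty tasks are processed in increasing level order, so every
    predecessor of a task is up to date before the task itself runs.
    Returns the list of tasks that were re-executed.
    """
    level, preds, succs = compute_levels(nodes, edges)
    buckets = defaultdict(set)  # level -> dirty tasks at that level
    for s in changed_inputs:
        for t in succs[s]:
            buckets[level[t]].add(t)
    executed = []
    for lvl in range(max(level.values(), default=0) + 1):
        for n in sorted(buckets[lvl]):
            new_value = compute[n](*[values[p] for p in preds[n]])
            executed.append(n)
            if new_value != values[n]:  # propagate only real changes
                values[n] = new_value
                for s in succs[n]:
                    buckets[level[s]].add(s)
    return executed


# Hypothetical toy program: a and b are inputs, the rest are derived.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "f")]
compute = {
    "c": lambda a, b: a + b,
    "d": lambda c: c * 0,  # output never changes, so propagation stops here
    "e": lambda d: d + 1,
    "f": lambda b: b * 2,
}
values = {"a": 1, "b": 2, "c": 3, "d": 0, "e": 1, "f": 4}

values["a"] = 5  # one input changes
executed = incremental_rerun(nodes, edges, compute, values, ["a"])
print(executed)
```

In this toy run only `c` and `d` are re-executed: `d` produces the same output as before, so `e` is never scheduled, and `f` does not depend on the changed input at all.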