Efficient Inductive Logic Programming based on Predictive A*-like Algorithm
Moeko Okawara, Junji Fukuhara, M. Takimoto, Tsutomu Kumazawa, Y. Kambayashi
Human Interaction and Emerging Technologies (IHIET-AI 2023): Artificial Intelligence and Future Applications
DOI: 10.54941/ahfe1002934
Citations: 0
Abstract
Various machine learning (ML) techniques have been developed over the last decade. In particular, deep learning (DL) has made it possible to create structured data, such as tables, from unstructured data such as images and sounds. These results have led to many successes in engineering, but most DL decisions and actions are hard to explain or verify. In contrast, inductive logic programming (ILP) is a fully explainable ML approach that has long been used in data mining. ILP, which is based on first-order predicate logic, is a symbolic approach well suited to handling structured data and the relations between data items.

In practice, the results generated by ILP can be added to the given background knowledge, enriching the knowledge base; ILP has therefore become increasingly important for data mining, as it extracts meaningful relations among structured data. However, unlike DL, it is not easy for ILP to perform its learning process efficiently, because ILP processes cannot be executed uniformly in parallel on a GPU. A learning process corresponds to an inductive prediction process in which the training samples are positive and negative examples. In this process, ILP explores hypothesis candidates while calculating each candidate's cover set: the set of examples deduced from that candidate together with the background knowledge. Note that the finally obtained hypothesis, combined with the background knowledge, must deduce all of the positive examples and none of the negative ones. The cover set can be calculated uniformly with relational operations, which can be executed on a GPU or by a relational database management system (RDBMS) via SQL. Modern RDBMSs not only manage memory operations safely but also execute SQL in parallel, even utilizing a GPU, so ILP can be partially executed in parallel.
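The cover-set computation described above can be illustrated as a single relational query. The following is a minimal sketch using SQLite in place of PostgreSQL, with a toy schema (`parent`, `female`, `pos`, `neg`) and a candidate clause daughter(X, Y) :- parent(Y, X), female(X); the schema, names, and data are illustrative assumptions, not the paper's actual encoding.

```python
import sqlite3

# Toy background knowledge and examples for learning daughter(X, Y).
# (Hypothetical schema, not the paper's actual encoding.)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE parent(p TEXT, c TEXT);   -- parent(p, c): p is a parent of c
CREATE TABLE female(x TEXT);           -- female(x)
CREATE TABLE pos(x TEXT, y TEXT);      -- positive examples daughter(x, y)
CREATE TABLE neg(x TEXT, y TEXT);      -- negative examples
INSERT INTO parent VALUES ('ann','mary'), ('ann','tom'), ('tom','eve');
INSERT INTO female VALUES ('ann'), ('mary'), ('eve');
INSERT INTO pos VALUES ('mary','ann'), ('eve','tom');
INSERT INTO neg VALUES ('tom','ann');
""")

# Cover set of the candidate clause daughter(X, Y) :- parent(Y, X), female(X):
# all (X, Y) pairs deduced by the clause, computed as one relational join.
cover = set(cur.execute(
    "SELECT parent.c, parent.p FROM parent JOIN female ON parent.c = female.x"
))

pos = set(cur.execute("SELECT x, y FROM pos"))
neg = set(cur.execute("SELECT x, y FROM neg"))

# A valid hypothesis must deduce every positive example and no negative one.
print(pos <= cover)           # -> True: all positives are covered
print(neg.isdisjoint(cover))  # -> True: no negative is covered
```

Because the whole cover set comes back from one query, the per-candidate work reduces to a single call into the RDBMS, which is exactly the step whose launch overhead the paper targets.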

However, the overhead of launching the cover-set calculation procedure for each candidate is heavy, and the cumulative overhead cannot be ignored. To mitigate this problem, we propose an extension of the hypothesis-search algorithm of Progol, one of the most popular ILP systems. Progol searches for a hypothesis with an A*-like algorithm that incrementally refines each hypothesis candidate by adding a literal to it, then calculates the candidate's cover set to check whether it satisfies the conditions for a hypothesis. In our approach, the algorithm simultaneously performs several refinements that are likely to yield a hypothesis; we call this predictive refinement. Although these refinements may include redundant ones, since the same hypothesis may be found earlier, predictive refinement greatly reduces the cost of launching the cover-set calculation procedure, so our algorithm generates a hypothesis more efficiently than the conventional search algorithm.

We have extended Progol to implement predictive refinement and to perform the cover-set calculation for hypothesis candidates on PostgreSQL. Our experiments demonstrate that the extended Progol obtains practical results with significantly better performance.
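The batching idea behind predictive refinement can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: instead of one cover-set evaluation call per refinement, a batch of refinements is scored in a single call (`evaluate_batch` here stands in for one SQL round trip), amortizing the launch overhead. The function names and the toy search problem are assumptions for illustration.

```python
import heapq

def predictive_search(start, refine, evaluate_batch, is_hypothesis, batch_size=4):
    """A*-like best-first search with batched evaluation of refinements.

    refine(c) yields child candidates of c; evaluate_batch(cs) returns one
    score per candidate in a single call (lower = more promising);
    is_hypothesis(c, score) tests whether c satisfies the hypothesis condition.
    """
    frontier = [(0.0, 0, start)]  # (score, tiebreak counter, candidate)
    counter = 1
    while frontier:
        score, _, cand = heapq.heappop(frontier)
        if is_hypothesis(cand, score):
            return cand
        children = list(refine(cand))
        # Key point: one launch evaluates a whole batch of refinements,
        # instead of paying the launch overhead once per refinement.
        for i in range(0, len(children), batch_size):
            batch = children[i:i + batch_size]
            for child, child_score in zip(batch, evaluate_batch(batch)):
                heapq.heappush(frontier, (child_score, counter, child))
                counter += 1
    return None

# Toy usage: grow a string toward "abc"; the score counts remaining mismatches.
target = "abc"
refine = lambda c: (c + ch for ch in "abc") if len(c) < len(target) else ()

def evaluate_batch(cs):
    # Stand-in for a batched cover-set evaluation (e.g. one SQL query).
    return [sum(a != b for a, b in zip(c, target)) + (len(target) - len(c))
            for c in cs]

found = predictive_search("", refine, evaluate_batch,
                          lambda c, s: c == target)
print(found)  # -> abc
```

Some batched evaluations are wasted when the hypothesis is found before the whole batch would have been needed, which mirrors the redundant refinements the abstract mentions; the saving comes from paying the launch cost once per batch rather than once per candidate.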