Efficient Inductive Logic Programming based on Predictive A*-like Algorithm
Moeko Okawara, Junji Fukuhara, M. Takimoto, Tsutomu Kumazawa, Y. Kambayashi
Human Interaction and Emerging Technologies (IHIET-AI 2023): Artificial Intelligence and Future Applications
DOI: 10.54941/ahfe1002934
Citations: 0
Abstract
Various machine learning (ML) techniques have been developed over the last decade. In particular, deep learning (DL) has made it possible to create structured data, such as tables, from unstructured data such as images and sounds. These results have led to many successes in engineering, but most DL decisions and actions are hard to explain or verify. In contrast, inductive logic programming (ILP) is a fully explainable ML approach that has long been used in data mining. ILP, which is based on first-order predicate logic, is a symbolic approach well suited to handling structured data and the relations between data items.

In practice, the results generated by ILP can be added to the given background knowledge, enriching the knowledge base; ILP has therefore become increasingly important for data mining, as it extracts meaningful relations among structured data. However, unlike DL, it is not easy for ILP to perform its learning process efficiently, because ILP processes cannot be executed uniformly in parallel on a GPU. A learning process corresponds to an inductive prediction process in which the training samples are positive and negative examples. In this process, ILP explores hypothesis candidates while calculating each candidate's cover set: the set of examples deduced from that candidate together with the background knowledge. Note that the finally obtained hypothesis, combined with the background knowledge, must deduce all of the positive examples and none of the negative ones. The cover set can be calculated uniformly with relational operations, which can be executed on a GPU or by a relational database management system (RDBMS) via SQL. Modern RDBMSs not only manage memory operations safely but also execute SQL in parallel, even utilizing a GPU, so ILP can be partially executed in parallel.
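The cover-set computation described above can be illustrated as a single relational query. The following is a minimal sketch using SQLite in place of PostgreSQL, with a toy schema (`parent`, `female`, `pos`, `neg`) and a candidate clause daughter(X, Y) :- parent(Y, X), female(X); the schema, names, and data are illustrative assumptions, not the paper's actual encoding.

```python
import sqlite3

# Toy background knowledge and examples for learning daughter(X, Y).
# (Hypothetical schema, not the paper's actual encoding.)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE parent(p TEXT, c TEXT);   -- parent(p, c): p is a parent of c
CREATE TABLE female(x TEXT);           -- female(x)
CREATE TABLE pos(x TEXT, y TEXT);      -- positive examples daughter(x, y)
CREATE TABLE neg(x TEXT, y TEXT);      -- negative examples
INSERT INTO parent VALUES ('ann','mary'), ('ann','tom'), ('tom','eve');
INSERT INTO female VALUES ('ann'), ('mary'), ('eve');
INSERT INTO pos VALUES ('mary','ann'), ('eve','tom');
INSERT INTO neg VALUES ('tom','ann');
""")

# Cover set of the candidate clause daughter(X, Y) :- parent(Y, X), female(X):
# all (X, Y) pairs deduced by the clause, computed as one relational join.
cover = set(cur.execute(
    "SELECT parent.c, parent.p FROM parent JOIN female ON parent.c = female.x"
))

pos = set(cur.execute("SELECT x, y FROM pos"))
neg = set(cur.execute("SELECT x, y FROM neg"))

# A valid hypothesis must deduce every positive example and no negative one.
print(pos <= cover)           # -> True: all positives are covered
print(neg.isdisjoint(cover))  # -> True: no negative is covered
```

Because the whole cover set comes back from one query, the per-candidate work reduces to a single call into the RDBMS, which is exactly the step whose launch overhead the paper targets.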

However, the overhead of launching the cover-set calculation procedure for each candidate is heavy, and the cumulative overhead cannot be ignored. To mitigate this problem, we propose an extension of the hypothesis-search algorithm of Progol, one of the most popular ILP systems. Progol searches for a hypothesis with an A*-like algorithm that incrementally refines each hypothesis candidate by adding a literal to it, then calculates the candidate's cover set to check whether it satisfies the conditions for a hypothesis. In our approach, the algorithm simultaneously performs several refinements that are likely to yield a hypothesis; we call this predictive refinement. Although these refinements may include redundant ones, since the same hypothesis may be found earlier, predictive refinement greatly reduces the cost of launching the cover-set calculation procedure, so our algorithm generates a hypothesis more efficiently than the conventional search algorithm.

We have extended Progol to implement predictive refinement and to perform the cover-set calculation for hypothesis candidates on PostgreSQL. Our experiments demonstrate that the extended Progol obtains practical results with significantly better performance.
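The batching idea behind predictive refinement can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: instead of one cover-set evaluation call per refinement, a batch of refinements is scored in a single call (`evaluate_batch` here stands in for one SQL round trip), amortizing the launch overhead. The function names and the toy search problem are assumptions for illustration.

```python
import heapq

def predictive_search(start, refine, evaluate_batch, is_hypothesis, batch_size=4):
    """A*-like best-first search with batched evaluation of refinements.

    refine(c) yields child candidates of c; evaluate_batch(cs) returns one
    score per candidate in a single call (lower = more promising);
    is_hypothesis(c, score) tests whether c satisfies the hypothesis condition.
    """
    frontier = [(0.0, 0, start)]  # (score, tiebreak counter, candidate)
    counter = 1
    while frontier:
        score, _, cand = heapq.heappop(frontier)
        if is_hypothesis(cand, score):
            return cand
        children = list(refine(cand))
        # Key point: one launch evaluates a whole batch of refinements,
        # instead of paying the launch overhead once per refinement.
        for i in range(0, len(children), batch_size):
            batch = children[i:i + batch_size]
            for child, child_score in zip(batch, evaluate_batch(batch)):
                heapq.heappush(frontier, (child_score, counter, child))
                counter += 1
    return None

# Toy usage: grow a string toward "abc"; the score counts remaining mismatches.
target = "abc"
refine = lambda c: (c + ch for ch in "abc") if len(c) < len(target) else ()

def evaluate_batch(cs):
    # Stand-in for a batched cover-set evaluation (e.g. one SQL query).
    return [sum(a != b for a, b in zip(c, target)) + (len(target) - len(c))
            for c in cs]

found = predictive_search("", refine, evaluate_batch,
                          lambda c, s: c == target)
print(found)  # -> abc
```

Some batched evaluations are wasted when the hypothesis is found before the whole batch would have been needed, which mirrors the redundant refinements the abstract mentions; the saving comes from paying the launch cost once per batch rather than once per candidate.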