Enabling RETE Algorithm for RDFS Reasoning on Apache Spark

2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2) Pub Date : 2018-11-01 DOI:10.1109/SC2.2018.00028

H. Ju, Sangyoon Oh

Citations: 2

Abstract

Semantic web technology helps a range of software, including intelligent personal assistants, acquire new data and understand knowledge through the relations between data. However, current semantic web schemes such as RDFS reasoning are hard to apply to real-world data because of the huge volume of data that must be processed. In this study, we design and implement RDFS reasoning with the RETE algorithm on Apache Spark in a parallel fashion. In addition, we apply the rule-sequence ordering optimization from existing studies to improve processing performance. Our empirical results show that the implementation of our design has strong scalability. However, the current naive approach of deduplicating data with the Spark-provided distinct function should be improved to yield better processing performance. In future work, we will investigate new deduplication methods.
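
To make the deduplication concern concrete, the following is a minimal single-machine sketch of RDFS forward chaining, not the authors' implementation. It assumes plain Python tuples for triples, covers only the standard entailment rules rdfs5, rdfs7, rdfs9, and rdfs11, and uses a Python set where the paper uses Spark's distinct function. It does not build a RETE network and does not reproduce the rule-ordering optimization; the function name `rdfs_closure` and the `ex:` terms are hypothetical.

```python
# Illustrative sketch only: naive fixpoint iteration over four RDFS rules.
# The paper's RETE network and Spark parallelism are NOT modeled here.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def rdfs_closure(triples):
    """Return the input triples plus everything rdfs5/7/9/11 can derive."""
    # A set silently drops duplicates; this is the role distinct() plays on Spark.
    known = set(triples)
    while True:
        sub_class = {(s, o) for s, p, o in known if p == SUBCLASS}
        sub_prop = {(s, o) for s, p, o in known if p == SUBPROP}
        derived = set()

        # rdfs11: subClassOf is transitive.
        for a, b in sub_class:
            for b2, c in sub_class:
                if b == b2:
                    derived.add((a, SUBCLASS, c))

        # rdfs5: subPropertyOf is transitive.
        for a, b in sub_prop:
            for b2, c in sub_prop:
                if b == b2:
                    derived.add((a, SUBPROP, c))

        for s, p, o in known:
            # rdfs9: (x type A), (A subClassOf B) => (x type B).
            if p == RDF_TYPE:
                for a, b in sub_class:
                    if a == o:
                        derived.add((s, RDF_TYPE, b))
            # rdfs7: (x P y), (P subPropertyOf Q) => (x Q y).
            for a, b in sub_prop:
                if a == p:
                    derived.add((s, b, o))

        new = derived - known  # keep only triples not seen before
        if not new:
            return known  # fixpoint reached
        known |= new


example = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
closure = rdfs_closure(example)
```

In this sketch the same triple is often derived several times (for example, `ex:rex rdf:type ex:Animal` follows from two different subclass links), which is why a deduplication step is needed after each round and why its cost matters at scale.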

