Removal mechanism of redundant blank nodes in linked data

2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA). Pub Date: 2018-05-01. DOI: 10.1109/ICIEA.2018.8397878

Lu Yang, Li Huang, Haichuan Lu, Fangfang Xu


Citations: 1

Abstract

In the development of the Semantic Web, blank nodes (also called anonymous nodes or anonymous resources) are a significant source of data redundancy. Blank nodes are nodes of an RDF graph that are not identified by a URI. They are convenient for resources that are complex and have no URI but do have a property structure. Precisely because blank nodes have no URI, different people may create different blank nodes for the same anonymous resource, which causes considerable information redundancy. This paper proposes a method to remove that redundancy. First, blank nodes are detected according to their features. The triples of the RDF graph are then dictionary-encoded, with blank nodes represented by negative values, which makes it convenient to query the triples that contain blank nodes. Next, following the mining rules of linked data, all S-Models (the set of triples that use s as subject), O-Models (the set of triples that use o as object), and B-Blanks (the collection of blank nodes) of the RDF graph are constructed. Finally, the B-Blanks collection is traversed, and the redundancy in each SB-Model (the set of triples that use blank node b as subject) and OB-Model (the set of triples that use blank node b as object) is removed. Experimental results show that the proposed blank node detection method is very efficient, and that compression and storage efficiency improve when working from the processed RDF file. The experiments also show that detecting and removing blank nodes on dictionary-encoded linked data greatly improves operating efficiency.

The detection of blank nodes in this paper relies only on how blank nodes are represented in triples, so it cannot detect blank nodes in data given in the common RDF graph format. At present the approach applies only to simple RDF data models, and beyond them the proposed SOBM algorithm does not remove blank nodes well. Future work therefore includes: (1) improving the blank node detection method to better support RDF data formats of various structures; and (2) improving the blank node removal algorithm so that it better handles the blank node chain problem.
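The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SOBM implementation, and every name in it (`encode`, `build_models`, `remove_redundant_blanks`) is hypothetical. It makes two assumptions the abstract does not state: blank nodes are recognized by the N-Triples-style `_:` label prefix, and two blank nodes count as redundant when their SB-Model and OB-Model are identical. Blank node chains are not handled, in line with the limitation noted above.

```python
from collections import defaultdict


def is_blank(term):
    # Assumption: blank nodes carry N-Triples-style labels such as "_:b0".
    return term.startswith("_:")


def encode(triples):
    """Dictionary-encode terms: blank nodes get negative IDs, all others positive."""
    ids = {}
    next_pos, next_neg = 1, -1
    encoded = []
    for triple in triples:
        row = []
        for term in triple:
            if term not in ids:
                if is_blank(term):
                    ids[term] = next_neg
                    next_neg -= 1
                else:
                    ids[term] = next_pos
                    next_pos += 1
            row.append(ids[term])
        encoded.append(tuple(row))
    return encoded, ids


def build_models(encoded):
    """Build the S-Model, O-Model, and B-Blanks collections from encoded triples."""
    s_model = defaultdict(set)  # subject ID -> {(predicate, object)}
    o_model = defaultdict(set)  # object ID  -> {(subject, predicate)}
    for s, p, o in encoded:
        s_model[s].add((p, o))
        o_model[o].add((s, p))
    # Negative IDs mark blank nodes; order them by first appearance (-1, -2, ...).
    blanks = sorted({t for s, _, o in encoded for t in (s, o) if t < 0}, reverse=True)
    return s_model, o_model, blanks


def remove_redundant_blanks(encoded):
    """Drop triples of blank nodes whose SB-Model and OB-Model duplicate an earlier one."""
    s_model, o_model, blanks = build_models(encoded)
    seen = {}          # (SB-Model, OB-Model) signature -> representative blank ID
    redundant = set()  # blank IDs whose triples can be dropped
    for b in blanks:
        signature = (frozenset(s_model.get(b, ())), frozenset(o_model.get(b, ())))
        if signature in seen:
            redundant.add(b)
        else:
            seen[signature] = b
    return [t for t in encoded if t[0] not in redundant and t[2] not in redundant]


# Illustrative data: _:a and _:b describe the same anonymous address, _:c does not.
example = [
    ("ex:alice", "ex:address", "_:a"),
    ("_:a", "ex:city", '"Wuhan"'),
    ("ex:alice", "ex:address", "_:b"),
    ("_:b", "ex:city", '"Wuhan"'),
    ("ex:alice", "ex:address", "_:c"),
    ("_:c", "ex:city", '"Paris"'),
]
encoded, ids = encode(example)
cleaned = remove_redundant_blanks(encoded)
```

Because blank nodes are the only terms with negative IDs, collecting B-Blanks reduces to a sign check, which is presumably the convenience the abstract attributes to the negative encoding. In the example data, `_:b` has the same incoming and outgoing triples as `_:a`, so its two triples are dropped while those of `_:c` are kept.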


