An entity based RDF indexing schema using Hadoop and HBase
F. Abiri, M. Kahani, Fatane Zarinkalam
2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), 22 December 2014
DOI: 10.1109/ICCKE.2014.6993400
Citation count: 4
Abstract
Recent developments in the Semantic Web have opened new lines of research on search engines that organize and manage semantic data. The core of such a search engine is its indexing system, which consists of two main parts: data storage and data retrieval. As the volume of semantic data grows, the most important requirement for an indexing system is the ability to store large amounts of data and retrieve them as quickly as possible. In other words, building a scalable indexing system is one of the major challenges for semantic search engines. This paper presents a scalable method for indexing RDF data that uses HBase, a NoSQL database management system, as its underlying data store. HBase provides random access to massive datasets on top of the distributed Hadoop framework, which makes it a suitable option for managing such data. Furthermore, because entity-based queries are both important and widely used, a new schema based on a clustering algorithm is designed to answer this type of query effectively. The experimental evaluation shows that the proposed indexing system is effective in improving the scalability and retrieval of RDF data.
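The abstract does not spell out the schema itself, so the following is only a hedged illustration of why an entity-centric layout fits HBase's data model. It assumes a generic subject-keyed design (not the authors' clustering-based schema): each RDF subject becomes one row key, and each (predicate, object) pair becomes a column qualifier in a single column family. The class `EntityTable`, its methods, and the family name `"po"` are hypothetical; the class is an in-memory stand-in that mimics HBase's row key / column family / qualifier structure and is not the HBase client API.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical column family name; the paper does not specify its column families.
FAMILY = "po"


class EntityTable:
    """In-memory stand-in for an HBase table (NOT the HBase client API).

    Mimics HBase's data model: row key -> column family -> qualifier -> cell value.
    """

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Dict[Tuple[str, str], bytes]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put_triple(self, s: str, p: str, o: str) -> None:
        # Subject is the row key; (predicate, object) is the qualifier, so a
        # multi-valued predicate yields several distinct cells in the same row.
        # Re-inserting the same triple overwrites the same cell (no duplicates).
        self.rows[s][FAMILY][(p, o)] = b""

    def get_entity(self, s: str) -> Dict[str, List[str]]:
        # An entity-based query ("everything about s") is a single row lookup.
        result: Dict[str, List[str]] = defaultdict(list)
        row = self.rows.get(s, {})  # .get() does not create missing rows
        for p, o in sorted(row.get(FAMILY, {})):
            result[p].append(o)
        return dict(result)

    def scan_prefix(self, prefix: str) -> Iterator[str]:
        # HBase keeps rows sorted by row key; sorted() emulates that ordering.
        for key in sorted(self.rows):
            if key.startswith(prefix):
                yield key


if __name__ == "__main__":
    table = EntityTable()
    table.put_triple("ex:alice", "ex:knows", "ex:bob")
    table.put_triple("ex:alice", "ex:knows", "ex:carol")
    table.put_triple("ex:alice", "ex:name", "Alice")
    table.put_triple("ex:bob", "ex:name", "Bob")
    print(table.get_entity("ex:alice"))
    print(list(table.scan_prefix("ex:a")))
```

The design choice being illustrated is that grouping all triples of a subject into one row turns an entity lookup into a single random read, which is the kind of access HBase is built for; how the paper's clustering algorithm further arranges entities is not described in the abstract and is not modeled here.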