Experimental framework for searching large RDF on GPUs based on key-value storage

The 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE). Pub Date: 2013-05-29. DOI: 10.1109/JCSSE.2013.6567340

Chidchanok Choksuchat, C. Chantrapornchai


Citations: 3

Abstract

The Resource Description Framework (RDF) is commonly used for semantic web queries. Over this decade, driven by big data processing, large numbers of RDF triples have been crawled. These triples are usually stored in a distributed manner on cloud storage or large clusters, which makes it difficult to answer a query across platforms, and the search also takes a long execution time. The data representation and the platform are therefore important both for speeding up the search and for handling heterogeneity. In this paper, we present an experimental framework for searching RDF data on GPU clusters. Our framework uses the Java platform to manipulate semantic queries and JCuda to perform the GPU processing. CumulusRDF, an RDF store built on Apache Cassandra, is used to store the key-value pairs used for searching. In the experiments, the DBpedia and Freebase datasets are extracted and processed: the triple structures are transformed and loaded into Apache Cassandra using CumulusRDF's flat layout, and the subject-predicate-object keys are kept in a CQL cache. About 300 million tags can be handled on a single machine, reducing processing time at low cost. We map the data grid from Java's row-major ordering onto the CUDA GPU thread grid, and the retrieved keys are joined to find correspondences in the RDF graph.
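The final step of the abstract, mapping Java's row-major data onto the CUDA thread grid and then joining the retrieved keys, can be illustrated with a minimal CPU-side sketch. It is not the authors' implementation: the integer dictionary encoding of RDF terms, the `WILDCARD` value of -1, every function name, and the use of a hash join are hypothetical choices made here for illustration. On a real GPU the per-thread function would be a CUDA kernel launched through JCuda; here the grid is simulated with plain loops so the indexing is easy to follow.

```python
"""Sketch: match triple patterns over a row-major triple array with a
simulated CUDA thread grid, then join the matched rows on a shared key.

Assumptions (not from the paper): terms are dictionary-encoded as ints,
-1 marks an unbound pattern position, and the grid runs sequentially on the CPU.
"""
from collections import defaultdict

WILDCARD = -1  # hypothetical encoding for an unbound (variable) position


def flatten_row_major(triples):
    """Pack (s, p, o) rows into one flat list, the way a Java int[]
    would be laid out before being copied to GPU memory."""
    flat = []
    for s, p, o in triples:
        flat.extend((s, p, o))
    return flat


def match_kernel(flat, n_triples, pattern, block_idx, block_dim, thread_idx, out):
    """Body of one simulated CUDA thread: test row i against the pattern."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i >= n_triples:  # guard for surplus threads in the last block
        return
    row = flat[3 * i:3 * i + 3]  # row-major: row i starts at offset 3*i
    out[i] = all(q == WILDCARD or q == v for q, v in zip(pattern, row))


def launch(flat, pattern, block_dim=4):
    """Simulate a 1-D grid launch of ceil(n / block_dim) blocks and
    return the indices of matching rows."""
    n = len(flat) // 3
    grid_dim = (n + block_dim - 1) // block_dim
    out = [False] * n
    for b in range(grid_dim):
        for t in range(block_dim):
            match_kernel(flat, n, pattern, b, block_dim, t, out)
    return [i for i, hit in enumerate(out) if hit]


def join_on_subject(flat, rows_a, rows_b):
    """Hash-join two lists of matched rows on the subject key (column 0)."""
    by_subject = defaultdict(list)
    for i in rows_a:
        by_subject[flat[3 * i]].append(i)
    return [(i, j) for j in rows_b for i in by_subject.get(flat[3 * j], [])]


if __name__ == "__main__":
    triples = [(1, 10, 100), (2, 10, 101), (1, 11, 200), (3, 11, 201), (2, 12, 300)]
    flat = flatten_row_major(triples)
    a = launch(flat, (WILDCARD, 10, WILDCARD))  # ?s <10> ?o
    b = launch(flat, (WILDCARD, 11, WILDCARD))  # ?s <11> ?o
    print(join_on_subject(flat, a, b))  # prints [(0, 2)]: subject 1 matches both
```

Flattening the triples into one contiguous array keeps each thread's memory access a simple offset computation (3 * i), which is the property that makes a row-major layout convenient to hand over to a GPU thread grid.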

