Vito Giovanni Castellana, Antonino Tumeo, Oreste Villa, D. Haglin, J. Feo
Composing Data Parallel Code for a SPARQL Graph Engine
2013 International Conference on Social Computing, published 2013-09-08
DOI: 10.1109/SocialCom.2013.104 (https://doi.org/10.1109/SocialCom.2013.104)
Citation count: 4
Abstract
The emergence of petascale triple stores has motivated the investigation of alternatives to traditional table-based relational methods. Since triple stores represent data as structured tuples, graphs are a natural data structure for encoding their information. Using graph data structures rather than tables requires us to rethink the methods used to process queries on the store. We are developing an in-memory SPARQL graph engine that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-data-parallel-C compiler, a library of parallel graph methods, and a custom multithreaded runtime layer for multinode commodity systems. Rather than transforming SPARQL queries into a series of select and join operations on tables, our front end compiles the queries into data-parallel C code with calls to graph methods that walk internal data structures, constructing answers in their wake. In this paper, we describe the compilation process and give examples of the generated C code parallelized with OpenMP. We present performance numbers for the SP2Bench SPARQL benchmark queries on a 48-core shared-memory system. Compared with conventional relational database systems such as Virtuoso, our approach uses less memory and provides higher performance.
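To make the idea of compiling a query into data-parallel C concrete, here is a minimal sketch. It is not the authors' generated code or their graph library. The `Graph` and `Edge` types, the CSR-style edge layout, the integer identifiers, and the function names are all assumptions made for illustration. It shows how a query such as `SELECT ?x ?y WHERE { ?x rdf:type :Article . ?x dc:creator ?y }` might become an OpenMP loop that walks each subject's outgoing edges instead of joining tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical edge record: predicate and object are integer IDs. */
typedef struct { int predicate; int object; } Edge;

/* Hypothetical CSR-style graph: the outgoing edges of subject s are
   edges[offset[s]] .. edges[offset[s + 1] - 1]. */
typedef struct {
    int num_subjects;
    const int *offset;   /* num_subjects + 1 entries */
    const Edge *edges;
} Graph;

/* Returns 1 if the triple (s, predicate, object) is in the graph. */
static int has_edge(const Graph *g, int s, int predicate, int object)
{
    for (int e = g->offset[s]; e < g->offset[s + 1]; e++) {
        if (g->edges[e].predicate == predicate &&
            g->edges[e].object == object)
            return 1;
    }
    return 0;
}

/* Counts the solutions (?x, ?y) of the two-pattern query above.
   Subjects are processed independently, so the outer loop is
   data parallel; the reduction combines per-thread counts. */
long count_article_creators(const Graph *g, int p_type,
                            int c_article, int p_creator)
{
    assert(g != NULL && g->offset != NULL && g->edges != NULL);
    long matches = 0;

    #pragma omp parallel for reduction(+:matches) schedule(dynamic, 64)
    for (int x = 0; x < g->num_subjects; x++) {
        if (!has_edge(g, x, p_type, c_article))
            continue;                       /* ?x is not an Article */
        for (int e = g->offset[x]; e < g->offset[x + 1]; e++) {
            if (g->edges[e].predicate == p_creator)
                matches++;                  /* one binding of ?y */
        }
    }
    return matches;
}

/* Tiny made-up data set: subject 0 is an Article with two creators,
   subject 1 is a Book with one creator, subject 2 is an Article
   with no creator. */
long demo_count(void)
{
    enum { P_TYPE = 0, P_CREATOR = 1, C_ARTICLE = 100, C_BOOK = 101 };
    static const Edge edges[] = {
        {P_TYPE, C_ARTICLE}, {P_CREATOR, 200}, {P_CREATOR, 201},
        {P_TYPE, C_BOOK},    {P_CREATOR, 200},
        {P_TYPE, C_ARTICLE},
    };
    static const int offset[] = {0, 3, 5, 6};
    Graph g = {3, offset, edges};
    return count_article_creators(&g, P_TYPE, C_ARTICLE, P_CREATOR);
}
```

Calling `demo_count()` runs the query over the sample data. The code compiles with or without OpenMP enabled (for example, `-fopenmp` on GCC or Clang). Without it the pragma is ignored and the loop runs serially with the same result. A real engine would materialize the bindings rather than only count them, which this sketch leaves out.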