P-Spar(k)ql: SPARQL Evaluation Method on Spark GraphX with Parallel Query Plan
G. Gombos, A. Kiss
2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud), 2017-08-01
DOI: 10.1109/FiCloud.2017.48 (https://doi.org/10.1109/FiCloud.2017.48)
Semantic data are built from triples, each consisting of a subject, a predicate, and an object. A triple can also be viewed as an edge: the subject and the object are the nodes, and the predicate is the label of the edge. In this view, semantic data define a graph. This graph can be very large, because a semantic dataset can contain millions of triples. Such a dataset can be queried with the SPARQL query language. Since Big Data tools appeared, researchers have tried to evaluate SPARQL queries with them. In the last few years, distributed graph analytics tools have appeared as well. The challenge, then, is to use graph analytics tools to evaluate semantic queries on the semantic graph. In this paper we present PSparkql, which extends Sparkql with a parallel query plan. The system uses the Spark GraphX distributed graph analytics tool. We show that fewer edges are sufficient for the evaluation than Sparkql uses. We also collect statistics about the graph (number of predicates, data properties) to change the evaluation order of the SPARQL query. We compare our results with related work: Sparkql and S2X.
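
The graph view of triples described above can be illustrated with a small sketch. It is not the PSparkql implementation and does not use Spark GraphX; it is plain Python under stated assumptions. The sample triples, the function names (`build_graph`, `predicate_counts`, `order_patterns`), and the "rarest predicate first" ordering rule are all hypothetical. The abstract says only that statistics are used to change the evaluation order, not which rule is applied.

```python
from collections import Counter

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "knows", "alice"),
]


def build_graph(triples):
    """Subjects and objects become nodes; each predicate labels one edge."""
    nodes = set()
    edges = []
    for s, p, o in triples:
        nodes.add(s)
        nodes.add(o)
        edges.append((s, o, p))  # (source, destination, label)
    return nodes, edges


def predicate_counts(edges):
    """Count how many edges carry each predicate label."""
    return Counter(label for _, _, label in edges)


def order_patterns(patterns, counts):
    """Sort triple patterns so that rarer predicates come first.

    This ordering rule is an assumption made for illustration only.
    """
    return sorted(patterns, key=lambda pat: counts.get(pat[1], 0))


nodes, edges = build_graph(triples)
counts = predicate_counts(edges)

# Triple patterns of a hypothetical query; names starting with "?" are variables.
patterns = [("?x", "knows", "?y"), ("?x", "worksAt", "?z")]
ordered = order_patterns(patterns, counts)
```

In this sketch the `worksAt` pattern is moved ahead of the `knows` pattern because fewer edges carry that label, so fewer candidate edges need to be examined first. A real system would gather such counts over the distributed graph rather than over an in-memory list.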