TripleID-C: Low Cost Compressed Representation for RDF Query Processing in GPUs
C. Phongpensri, Pisit Makpaisit
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018-01-28
DOI: 10.1145/3149457.3155322 (https://doi.org/10.1145/3149457.3155322)
Citations: 3
Abstract
Resource Description Framework (RDF) is a standard format for representing linked information on the Internet. It uses Internationalized Resource Identifiers (IRIs), which refer to external information. Typically, RDF data is serialized as a large text file containing millions of relationships. This paper proposes TripleID-C, a compact representation for query processing over large RDF data on Graphics Processing Units (GPUs). The representation is based on TripleID, which is converted from the RDF data format. The TripleID format is then converted to TripleID-C, which is derived from either a compressed row or a compressed column format. TripleID-C is a compressed format whose size is only 5-10% of the traditional N-Triples (NT) file, about 20-30% of the traditional TripleID, and about 50-60% of the original HDT. We also address how to speed up the conversion process by adjusting data structure usage and using multiple threads, which makes the conversion run 30 times faster than the original.
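To make the two conversion steps concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify the actual TripleID or TripleID-C layouts, so everything here is an assumption for illustration: a single shared dictionary assigns integer IDs to terms, and the "compressed row" step groups triples by subject in a CSR-like structure. The function names `encode_triples`, `compress_rows`, and `match_subject`, and the `ex:` terms, are hypothetical.

```python
from collections import defaultdict


def encode_triples(triples):
    """Map each distinct term to an integer ID and return (ids, id_triples).

    Assumption: one shared dictionary for subjects, predicates and objects,
    with IDs assigned in order of first appearance, starting at 1.
    """
    ids = {}
    id_triples = []
    for s, p, o in triples:
        row = []
        for term in (s, p, o):
            if term not in ids:
                ids[term] = len(ids) + 1
            row.append(ids[term])
        id_triples.append(tuple(row))
    return ids, id_triples


def compress_rows(id_triples):
    """Group ID triples by subject in a CSR-like layout (assumed layout).

    Returns (subjects, offsets, pairs): the (predicate, object) pairs for
    subjects[i] are pairs[offsets[i]:offsets[i + 1]].
    """
    grouped = defaultdict(list)
    for s, p, o in id_triples:
        grouped[s].append((p, o))
    subjects, offsets, pairs = [], [0], []
    for s in sorted(grouped):
        subjects.append(s)
        pairs.extend(sorted(grouped[s]))
        offsets.append(len(pairs))
    return subjects, offsets, pairs


def match_subject(subjects, offsets, pairs, subject_id):
    """Return all (predicate, object) pairs stored for one subject ID."""
    if subject_id not in subjects:
        return []
    i = subjects.index(subject_id)
    return pairs[offsets[i]:offsets[i + 1]]


# Hypothetical sample data standing in for lines of an N-Triples file.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:age", "30"),
    ("ex:bob", "ex:knows", "ex:carol"),
]
ids, id_triples = encode_triples(triples)
subjects, offsets, pairs = compress_rows(id_triples)
print(id_triples)
print(subjects, offsets, pairs)
print(match_subject(subjects, offsets, pairs, ids["ex:bob"]))
```

In this sketch each subject ID is stored once per group rather than once per triple, so any saving depends on how many triples share a subject. Grouping by predicate or object instead would give the column-oriented analogue. The flat integer arrays are the kind of structure that can be copied to GPU memory and scanned in parallel, although the abstract does not describe the kernels used.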