M. Capotă, T. Hegeman, A. Iosup, Arnau Prat-Pérez, O. Erling, P. Boncz
{"title":"Graphalytics: A Big Data Benchmark for Graph-Processing Platforms","authors":"M. Capotă, T. Hegeman, A. Iosup, Arnau Prat-Pérez, O. Erling, P. Boncz","doi":"10.1145/2764947.2764954","DOIUrl":"https://doi.org/10.1145/2764947.2764954","url":null,"abstract":"Graphs are increasingly used in industry, governance, and science. This has stimulated the appearance of many and diverse graph-processing platforms. Although platform diversity is beneficial, it also makes it very challenging to select the best platform for an application domain or one of its important applications, and to design new and tune existing platforms. Continuing a long tradition of using benchmarking to address such challenges, in this work we present our vision for Graphalytics, a big data benchmark for graph-processing platforms. We have already benchmarked with Graphalytics a variety of popular platforms, such as Giraph, GraphX, and Neo4j.","PeriodicalId":144860,"journal":{"name":"Proceedings of the GRADES'15","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130590094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constructing Bisimulation Summaries on a Multi-Core Graph Processing Framework","authors":"S. Khatchadourian, M. Consens","doi":"10.1145/2764947.2764955","DOIUrl":"https://doi.org/10.1145/2764947.2764955","url":null,"abstract":"Bisimulation summaries of graph data have multiple applications, including facilitating graph exploration and enabling query optimization techniques, but efficient, scalable, summary construction is challenging. The literature describes parallel construction algorithms using message-passing, and these have been recently adapted to MapReduce environments. The fixpoint nature of bisimulation is well suited to iterative graph processing, but the existing MapReduce solutions do not drastically decrease per-iteration times as the computation progresses. In this paper, we focus on leveraging parallel multi-core graph frameworks with the goal of constructing summaries in roughly the same amount of time that it takes to input the data into the framework (for a range of real world data graphs) and output the summary. To achieve our goal we introduce a singleton optimization that significantly reduces per-iteration times after only a few iterations. We present experimental results validating that our scalable GraphChi implementation achieves our goal with bisimulation summaries of million to billion edge graphs.","PeriodicalId":144860,"journal":{"name":"Proceedings of the GRADES'15","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126458727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oshini Goonetilleke, Saket K. Sathe, T. Sellis, Xiuzhen Zhang
{"title":"Microblogging Queries on Graph Databases: An Introspection","authors":"Oshini Goonetilleke, Saket K. Sathe, T. Sellis, Xiuzhen Zhang","doi":"10.1145/2764947.2764952","DOIUrl":"https://doi.org/10.1145/2764947.2764952","url":null,"abstract":"Microblogging data is growing at a rapid pace. This poses new challenges to the data management systems, such as graph databases, that are typically suitable for analyzing such data. In this paper, we share our experience on executing a wide variety of microblogging queries on two popular graph databases: Neo4j and Sparksee. Our queries are designed to be relevant to popular applications of microblogging data. The queries are executed on a large real graph data set comprising nearly 50 million nodes and 326 million edges.","PeriodicalId":144860,"journal":{"name":"Proceedings of the GRADES'15","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132450749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale BSP Graph Processing in Distributed Non-Volatile Memory","authors":"T. Nito, Yoshiko Nagasaka, H. Uchigaito","doi":"10.1145/2764947.2764949","DOIUrl":"https://doi.org/10.1145/2764947.2764949","url":null,"abstract":"Processing large graphs is becoming increasingly important for many domains. Large-scale graph processing requires a large-scale cluster system, which is very expensive. Thus, for high-performance large-scale graph processing in small clusters, we have developed bulk synchronous parallel graph processing in distributed non-volatile memory that has lower bit cost, lower power consumption, and larger capacity than DRAM. When non-volatile memory is used, accessing non-volatile memory is a performance bottleneck because accesses to non-volatile memory are fine-grained random accesses and non-volatile memory has much larger latency than DRAM. Thus, we propose a non-volatile memory group access method and its implementation for using non-volatile memory efficiently. The proposed method and implementation improve the access performance to non-volatile memory by changing fine-grained random accesses to random accesses the same size as a non-volatile memory page and hiding non-volatile memory latency with pipelining. An evaluation indicated that the proposed graph processing can hide the latency of non-volatile memory and has performance proportional to non-volatile memory bandwidth. When the non-volatile memory read/write mixture bandwidth is 4.2 GB/sec, the performance of the proposed graph processing is of the same order of magnitude (46%) as the performance when all data is stored in main memory. In addition, the proposed graph processing had scalable performance for any number of nodes. The proposed method and implementation can process a graph 125 times bigger than a DRAM-only system can.","PeriodicalId":144860,"journal":{"name":"Proceedings of the GRADES'15","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124879672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frappé: Querying the Linux Kernel Dependency Graph","authors":"Nathan Hawes, Ben Barham, C. Cifuentes","doi":"10.1145/2764947.2764951","DOIUrl":"https://doi.org/10.1145/2764947.2764951","url":null,"abstract":"Frappé is a developer tool for querying and visualizing the dependencies of large C/C++ software systems to the order of 10s of millions of lines of code in size. It supports developers with a range of code comprehension queries such as Does function X or something it calls write to global variable Y? and How much code could be affected if I change this macro? Results are overlaid on a visualization of the dependency graph data based on a cartographic map metaphor. In this paper, we give a brief overview of Frappé and describe our experiences implementing it on top of the Neo4j graph database. We detail the graph model used by Frappé and outline its key use cases using representative queries and their runtimes with the dependency graph data of the Unbreakable Enterprise Kernel. Finally, we discuss some of the open challenges in supporting source code queries across single and multiple versions of an evolving codebase with current property graph database technologies: performance, efficient storage, and the expressivity of the graph querying language given a graph model.","PeriodicalId":144860,"journal":{"name":"Proceedings of the GRADES'15","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130326329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Partitioning of Multi-Labeled Graphs","authors":"Ioanna Filippidou, Y. Kotidis","doi":"10.1145/2764947.2764950","DOIUrl":"https://doi.org/10.1145/2764947.2764950","url":null,"abstract":"Graph partitioning is an old problem that is finding renewed interest in the era of big, complex datasets and parallel computing frameworks that can benefit from a proper partitioning of big graph data across multiple nodes in a cluster. In this paper we look into a specific instance of the problem termed online graph partitioning that addresses the need to partition large graphs that do not fit in main memory. A neglected aspect of modern graph datasets is that real graphs have labels! Node labels may, for instance, correspond to categorical attributes (such as country, profession, participating groups, etc.) of the entities depicted by the vertices of the graph. Edge labels may represent different relationship types (e.g. \"friend-of\", \"likes\", etc.). In this work we first revisit the formulation of the graph partitioning problem for graphs with labels on both nodes and edges. We introduce \"relation-cut\" as a new metric that extends the traditional \"edge-cut\" metric used in graph partitioning in order to take into account the existence of different edge-types. Then, we combine this metric with a novel \"label-cut\" metric that takes into consideration the displacement of related nodes with similar labels across partitions. In our experiments we adapt two recent online partitioning algorithms for the newly proposed metric and provide a thorough evaluation on a variety of real and synthetic graphs. Our experiments demonstrate that the proposed technique balances the generated cuts on both relations and labels on the resulting partitions.","PeriodicalId":144860,"journal":{"name":"Proceedings of the GRADES'15","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127876007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Graph Structure of Wikipedia for Query Expansion","authors":"Joan Guisado-Gámez, Arnau Prat-Pérez","doi":"10.1145/2764947.2764953","DOIUrl":"https://doi.org/10.1145/2764947.2764953","url":null,"abstract":"Knowledge bases are very good sources for knowledge extraction, the ability to create knowledge from structured and unstructured sources and use it to improve automatic processes such as query expansion. However, extracting knowledge from unstructured sources is still an open challenge [9]. In this respect, understanding the structure of knowledge bases can provide significant benefits for the effectiveness of such purpose. In particular, Wikipedia has become a very popular knowledge base in recent years because it is a general encyclopedia that has a large amount of information and thus covers a large number of different topics. In this piece of work, we analyze how articles and categories of Wikipedia relate to each other and how these relationships can support a query expansion technique. In particular, we show that the structures in the form of dense cycles with a minimum number of categories tend to identify the most relevant information.","PeriodicalId":144860,"journal":{"name":"Proceedings of the GRADES'15","volume":"177 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131930461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Nguyen, M. Aref, Martin Bravenboer, G. Kollias, H. Ngo, C. Ré, A. Rudra
{"title":"Join Processing for Graph Patterns: An Old Dog with New Tricks","authors":"D. Nguyen, M. Aref, Martin Bravenboer, G. Kollias, H. Ngo, C. Ré, A. Rudra","doi":"10.1145/2764947.2764948","DOIUrl":"https://doi.org/10.1145/2764947.2764948","url":null,"abstract":"Join optimization has been dominated by Selinger-style, pairwise optimizers for decades. But, Selinger-style algorithms are asymptotically suboptimal for applications in graph analytics. This sub-optimality is one of the reasons that many have advocated supplementing relational engines with specialized graph processing engines. Recently, new join algorithms have been discovered that achieve optimal worst-case run times for any join or even so-called beyond worst-case (or instance optimal) run time guarantees for specialized classes of joins. These new algorithms match or improve on those used in specialized graph-processing systems. This paper asks: can these new join algorithms allow relational engines to close the performance gap with graph engines? We examine this question for graph-pattern queries or join queries. We find that classical relational databases like Postgres and MonetDB or newer graph databases/stores like Virtuoso and Neo4j may be orders of magnitude slower than a fully featured RDBMS, LogicBlox, that uses these new approaches. Our results demonstrate that an RDBMS with such new algorithms can perform as well as specialized engines like GraphLab -- while retaining a high-level interface. We hope our work adds to the ongoing debate on the role of graph accelerators, new graph systems, and relational systems in modern workloads.","PeriodicalId":144860,"journal":{"name":"Proceedings of the GRADES'15","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129585914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}