PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567640
N. Dasari, D. Ranjan, M. Zubair
{"title":"Maximal clique enumeration for large graphs on hadoop framework","authors":"N. Dasari, D. Ranjan, M. Zubair","doi":"10.1145/2567634.2567640","DOIUrl":"https://doi.org/10.1145/2567634.2567640","url":null,"abstract":"The maximal clique enumeration (MCE) problem for very large graphs appears in many critical applications such as community detection in social networks, aligning 3D protein sequences, finding motifs in genomic data, identifying co-expressed genes and data analytics in communication networks. It is not unusual to have graphs of billions of nodes and edges in these applications. The MCE problem is NP-hard, but a number of algorithms, both sequential and parallel, have been proposed that work efficiently for real graphs. In addition to the large sizes of the input graphs, the MCE algorithms in general result in large intermediate data, making it even more challenging to efficiently process the data. Recently, an approach has been proposed, referred to as pbitMCE, which is shown to outperform or perform equally well compared to the existing approaches. The approach uses degeneracy ordering of vertices, which plays a vital role in the performance of the algorithm. Degeneracy ordering of vertices can be generated in linear time. However, it is challenging to find the degeneracy ordering in a distributed environment as it requires extensive communication between the nodes. In some cases, generating the ordering can take a significant amount of time. In such cases, a different ordering such as ordering by degree can be a better choice than the degeneracy ordering. In this paper, we experimentally study the impact of various vertex orderings on the performance of an MCE algorithm in the context of the MapReduce framework. We present an implementation of pbitMCE using MapReduce that takes a large graph and an ordering of vertices as input and enumerates all the maximal cliques. To support the study, we present the experimental results on various graphs using different orderings. The results show that the degree ordering performs comparably to the degeneracy ordering in most cases, but worse on large graphs.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122353290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
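The abstract above contrasts degeneracy ordering (computable in linear time on a single machine) with the cheaper degree ordering. The following is a minimal single-machine sketch of both orderings, not the authors' MapReduce implementation of pbitMCE. The function names and the adjacency-dictionary input format are assumptions made for illustration. The degeneracy ordering uses the standard bucket-based peeling approach: repeatedly remove a vertex of minimum remaining degree.

```python
def degeneracy_ordering(adj):
    """Return (order, degeneracy) for an undirected graph.

    adj is assumed to map each vertex to the set of its neighbours
    (hypothetical input format chosen for this sketch).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    # buckets[d] holds the not-yet-removed vertices whose remaining degree is d
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)

    order, removed, degeneracy = [], set(), 0
    d = 0
    for _ in range(len(adj)):
        # Advance to the smallest non-empty bucket.
        while not buckets[d]:
            d += 1
        degeneracy = max(degeneracy, d)
        v = buckets[d].pop()
        order.append(v)
        removed.add(v)
        # Removing v lowers the remaining degree of each live neighbour by one.
        for u in adj[v]:
            if u not in removed:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
        # The minimum remaining degree can drop by at most one per removal.
        d = max(d - 1, 0)
    return order, degeneracy


def degree_ordering(adj):
    """Order vertices by ascending degree, breaking ties by vertex id."""
    return sorted(adj, key=lambda v: (len(adj[v]), v))
```

Degree ordering needs only each vertex's own neighbour count, which is why it is much easier to compute in a distributed setting than the peeling loop, where every removal changes the state seen by later steps.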
PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567648
K. Pingali
{"title":"High-speed graph analytics with the galois system","authors":"K. Pingali","doi":"10.1145/2567634.2567648","DOIUrl":"https://doi.org/10.1145/2567634.2567648","url":null,"abstract":"The Galois project at UT Austin has developed a high-level programming model and a lightweight parallel execution engine that enable application writers to write and tune complex parallel applications at a high level of abstraction. This talk describes the experiences of our group and of our industrial collaborators in using the Galois system for \"big data\" graph analytics. We show that (i) the rich programming model of Galois enables application programmers to write sophisticated graph analytics algorithms that cannot be expressed directly in current graph analytics DSLs, (ii) even when the same algorithm is used, the lightweight execution engine permits Galois programs to run much faster than programs in other DSLs, and (iii) the APIs of most current graph analytics DSLs can be implemented on top of the Galois system in a few hundred lines of code.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125409178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567647
R. Bond
{"title":"Graphs & networks: computing and analytics at lincoln laboratory","authors":"R. Bond","doi":"10.1145/2567634.2567647","DOIUrl":"https://doi.org/10.1145/2567634.2567647","url":null,"abstract":"Over the last decade, Lincoln Laboratory has been conducting research and development in graph and network analytics, and the computing architectures that support these analytics as data sets scale to million-node graphs and beyond. This talk gives a brief introduction to the application domains that can exploit graph analytics, and the \"big data\" computational challenges inherent in these applications. The talk then presents a computational framework for graph and network analytics based on spectral analysis methods. To support the programming and computational needs of large scale graphs and complex analytics, Lincoln Laboratory has also been developing specialized programming languages and computing architectures. Recent advances in these areas and future directions are discussed.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133127189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567636
K. Bakanov, I. Spence, H. Vandierendonck, C. Gillan
{"title":"Rigorous specification and low-latency implementation of technical market indicators","authors":"K. Bakanov, I. Spence, H. Vandierendonck, C. Gillan","doi":"10.1145/2567634.2567636","DOIUrl":"https://doi.org/10.1145/2567634.2567636","url":null,"abstract":"Technical market indicators are tools used by technical analysts to understand trends in trading markets. Technical (market) indicators are often calculated in real-time, as trading progresses. This paper presents a mathematically-founded framework for calculating technical indicators. Our framework consists of a domain specific language for the unambiguous specification of technical indicators, and a runtime system based on Click, for computing the indicators. We argue that our solution enhances the ease of programming due to aligning our domain-specific language to the mathematical description of technical indicators, and that it enables executing programs in kernel space for decreased latency, without exposing the system to users' programming errors.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127042537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567645
E. Baranoski
{"title":"Future directions in analytic applications","authors":"E. Baranoski","doi":"10.1145/2567634.2567645","DOIUrl":"https://doi.org/10.1145/2567634.2567645","url":null,"abstract":"Analytic applications to tackle the \"big data\" problem are pervasive in all fields. The breadth and utility of these applications are growing far faster than the rate of Moore's Law, and are particularly stressing the electrical power to host such applications. This talk will sample some of the analytical research programs being performed at IARPA, as well as some thoughts on future processor implications to support these applications.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125575403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567635
Oded Green, Lluís-Miquel Munguía, David A. Bader
{"title":"Load balanced clustering coefficients","authors":"Oded Green, Lluís-Miquel Munguía, David A. Bader","doi":"10.1145/2567634.2567635","DOIUrl":"https://doi.org/10.1145/2567634.2567635","url":null,"abstract":"Clustering coefficients are a building block in network science that offer insight into how tightly bound vertices are in a network. Effective and scalable parallelization of clustering coefficients requires load balancing amongst the cores. This property is not easy to achieve since many real-world networks are scale-free, which leads to some vertices requiring more attention than others. In this work we show two scalable approaches that load balance clustering coefficients. The first method achieves optimal load balancing with an O(|E|) storage requirement. The second method has a lower storage requirement of O(|V|) at the cost of some imbalance. While both methods have a similar time complexity, they represent a tradeoff between maintaining a balanced workload and memory complexity. Using a 40-core system we show that our load balancing techniques outperform the widely used and simple parallel approach by a factor of 3X-7.5X for real graphs and 1.5X-4X for random graphs. Further, we achieve 25X-35X speedup over the sequential algorithm for most of the graphs.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126703232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
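For readers unfamiliar with the quantity being load balanced, the sketch below computes the local clustering coefficient of every vertex sequentially. It illustrates only the textbook definition, not either of the two load-balanced methods the abstract describes. The function name and the adjacency-dictionary input are assumptions for illustration.

```python
def local_clustering(adj):
    """Local clustering coefficient of each vertex in an undirected graph.

    adj is assumed to map each vertex to the set of its neighbours
    (hypothetical input format chosen for this sketch).
    """
    coeffs = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            # Fewer than two neighbours: no neighbour pair can form a triangle.
            coeffs[v] = 0.0
            continue
        # For each neighbour u, count the neighbours it shares with v.
        # Every edge between two neighbours of v is counted twice here.
        links = sum(len(adj[u] & nbrs) for u in nbrs)
        # triangles = links / 2, and cc = 2 * triangles / (k * (k - 1)).
        coeffs[v] = links / (k * (k - 1))
    return coeffs
```

The work done for a vertex grows with its degree and the degrees of its neighbours, so in a scale-free graph a few high-degree vertices dominate the running time when vertices are simply divided evenly among cores. This is the imbalance the abstract refers to.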
PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567637
Yanwei Zhang, Qing Liu, S. Klasky, M. Wolf, K. Schwan, G. Eisenhauer, J. Choi, N. Podhorszki
{"title":"Active workflow system for near real-time extreme-scale science","authors":"Yanwei Zhang, Qing Liu, S. Klasky, M. Wolf, K. Schwan, G. Eisenhauer, J. Choi, N. Podhorszki","doi":"10.1145/2567634.2567637","DOIUrl":"https://doi.org/10.1145/2567634.2567637","url":null,"abstract":"In recent years, streaming-based data processing has been gaining substantial traction for dealing with overwhelming data generated by real-time applications, from both enterprise sources and scientific computing. In this work, however, we look at an emerging class of scientific data with Near Real-Time (NRT) requirements, in which data is typically generated in a bursty fashion with the near real-time constraints being applied primarily between bursts, rather than within a stream. A key challenge for this type of data source is that the processing time per data element is not uniform and not always feasible to predict. Given the observations on the increasing unpredictability of compute load and system dynamics, this work looks to adapt the streaming-based approach to the context of this new class of large experiments and simulations that have complex run-time control and analysis issues. In particular, we deploy a novel two-tier scheme for handling the increasing unpredictability of runtime behaviors: Instead of relying on determining what and where to run the scientific workflows beforehand or partially dynamically, the decision will also be adaptively enhanced online according to system runtime status. This is enabled by embedding workflow along with data streams. Specifically, we break data outputs generated from experiments or simulations into multiple self-describing \"chunks\", which we call active data objects. As such, if there is a transient hotspot observed, a data object with an unfinished workflow pipeline can break its previous schedule and search for the least loaded location to continue the execution. Our preliminary experimental results based on synthetic workloads demonstrate the proposed active workflow system as a very promising solution by outperforming the state-of-the-art semi-dynamic workflow schedulers with an improved workflow completion time, as well as good scalability.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133172364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567646
D. Nahamoo
{"title":"Cognitive computing journey","authors":"D. Nahamoo","doi":"10.1145/2567634.2567646","DOIUrl":"https://doi.org/10.1145/2567634.2567646","url":null,"abstract":"Building intelligent machines has been a long-standing dream of humanity. While the journey has been difficult and slow, the progress in Machine Learning and Optimization Techniques and the advancement in Deep Belief Networks offer promising ways to engineer cognitive systems. The science behind cognitive computing seeks to develop systems that emulate human brain functions such as perception, knowledge accumulation, goal planning, and logical inference. Cognitive systems will operate at a speed and an informational capacity that far exceed human capability. They will act as advisor, partner, helpmate, and co-creator to humans, collaborating on human terms. Cognitive computing is a fundamentally new computing paradigm for tackling real-world problems, exploiting enormous amounts of information using massively parallel machines that interact with humans and other cognitive systems. Cognitive systems will bring human-like reasoning to the problems of Big Data, and will also permit us to expand into the white space of domains that require human-like cognition but that either exceed human capacity or are impossible for a live human presence. In this talk, I will review the past progress and discuss the future challenges. I will address the architectural challenges of building a general-purpose system of systems that can learn, can reason, and can interact in a natural, human way.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114302314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PPAA '14 Pub Date: 2014-02-16 DOI: 10.1145/2567634.2567638
R. McColl, David Ediger, Jason A. Poovey, D. Campbell, David A. Bader
{"title":"A performance evaluation of open source graph databases","authors":"R. McColl, David Ediger, Jason A. Poovey, D. Campbell, David A. Bader","doi":"10.1145/2567634.2567638","DOIUrl":"https://doi.org/10.1145/2567634.2567638","url":null,"abstract":"With the proliferation of large, irregular, and sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query languages. Many of these platforms apply graph structures and analysis techniques to enable users to ingest, update, query, and compute on the topological structure of the network represented as sets of edges relating sets of vertices. To store and process Facebook-scale datasets, software and algorithms must be able to support data sources with billions of edges, update rates of millions of updates per second, and complex analysis kernels. These platforms must provide intuitive interfaces that enable graph experts and novice programmers to write implementations of common graph algorithms. In this paper, we conduct a qualitative study and a performance comparison of 12 open source graph databases using four fundamental graph algorithms on networks containing up to 256 million edges.","PeriodicalId":379963,"journal":{"name":"PPAA '14","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133891882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}