Tomokatsu Takahashi, Hiroaki Shiokawa, H. Kitagawa
{"title":"SCAN-XP: Parallel Structural Graph Clustering Algorithm on Intel Xeon Phi Coprocessors","authors":"Tomokatsu Takahashi, Hiroaki Shiokawa, H. Kitagawa","doi":"10.1145/3068943.3068949","DOIUrl":"https://doi.org/10.1145/3068943.3068949","url":null,"abstract":"The structural graph clustering method SCAN, proposed by Xu et al., is successfully used in many applications because it not only detects densely connected nodes as clusters but also extracts sparsely connected nodes as hubs or outliers. However, it is difficult to applying SCAN to large-scale graphs since SCAN needs to evaluate the density for all adjacent nodes included in the given graphs. In this paper, so as to address the above problem, we present a novel algorithm SCAN-XP that performs over Intel Xeon Phi. We designed SCAN-XP in order to make best use of the hardware potential of Intel Xeon Phi by employing the following approaches: First, SCAN-XP avoids the bottlenecks that arise from parallel graph computations by providing good load balances among cores on the Intel Xeon Phi. Second, SCAN-XP effectively exploits 512 bit SIMD instructions implemented in the Intel Xeon Phi to speed up the density evaluations. As a result, SCAN-XP detects clusters, hubs, and outliers from large-scale graphs with much shorter computation time than SCAN. Specifically, SCAN-XP runs approximately 100 times faster than SCAN; for the graphs with 100 million edges, SCAN-XP is able to perform in a few seconds. In this paper, extensive evaluations on real-world graphs demonstrate the performance superiority of SCAN-XP over existing approaches.","PeriodicalId":345682,"journal":{"name":"Proceedings of the 2nd International Workshop on Network Data Analytics","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114716419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2nd International Workshop on Network Data Analytics","authors":"Akhil Arora, Shourya Roy, A. Bhattacharya","doi":"10.1145/3068943","DOIUrl":"https://doi.org/10.1145/3068943","url":null,"abstract":"We are delighted to present the papers from the 2nd NDA Workshop on Network Data Analytics, which took place on 19th May, 2017 co-located with the ACM SIGMOD conference in Chicago, Illinois, USA. \u0000 \u0000Networks are prevalent in today's electronic world in a wide variety of domains ranging from engineering to social sciences, life sciences, physical sciences, and so on. Researchers and practitioners have studied networks in multiple ways like defining network metrics, providing theoretical results and examining problems like pattern mining, link prediction, etc. The NDA workshop is a forum for exchanging ideas and methods for mining, querying and learning with real-world networks, developing new common understandings of the problems at hand, sharing of data sets where applicable, and leveraging existing knowledge from different disciplines. The purpose of this workshop is to bring together researchers from academia, industry, and government, to create a forum for discussing recent advances in (large-scale) graph analysis, as well as propose and discuss novel methods and techniques towards addressing domain specific challenges and handling noise in real-world graphs.","PeriodicalId":345682,"journal":{"name":"Proceedings of the 2nd International Workshop on Network Data Analytics","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114907488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction of Structured Heterogeneous Networks from Massive Text Data: Extended Abstract","authors":"Jiawei Han","doi":"10.1145/3068943.3068944","DOIUrl":"https://doi.org/10.1145/3068943.3068944","url":null,"abstract":"Network data analytics is important, powerful, and exciting. How big role may network data analytics play in the real world? Much real-world data is unstructured, in the form of natural language text. A grand challenges on big data research is to develop effective and scalable methods to turn such massive text data into actionable knowledge. In order to turn such massive unstructured, text-rich, but interconnected data into knowledge, we propose a data-to-network-to-knowledge (D2N2K) paradigm, that is, first transform data into relatively structured heterogeneous information networks, and then mine such text-rich and structure-rich heterogeneous networks to generate useful knowledge. We argue that such a paradigm represents a promising direction and network data analytics will play an essential role in transforming data to knowledge. However, a critical bottleneck in this game is mining structures from text data. We present our recent progress on developing effective methods for mining structures from massive text data and constructing structured heterogeneous information networks.","PeriodicalId":345682,"journal":{"name":"Proceedings of the 2nd International Workshop on Network Data Analytics","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127391264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Repairing Noisy Graphs","authors":"D. Srivastava","doi":"10.1145/3068943.3068945","DOIUrl":"https://doi.org/10.1145/3068943.3068945","url":null,"abstract":"Graphs are a flexible way to represent data in a variety of applications, with nodes representing domain-specific entities (e.g., records in record linkage, products and types in an ontology) and edges capturing a variety of relationships between these entities (e.g., an equivalence relationship between records in record linkage, a type-subtype relationship between types in an ontology). Often, the edges in this graph are noisy, in that some edges are missing (i.e., real-world relationships that do not have corresponding edges in the graph) and some edges are spurious (i.e., edges in the graph that do not have corresponding real-world relationships). Directly analyzing noisy graphs can lead to undesirable outcomes, making it important to repair noisy graphs. In this talk, we describe an approach that takes advantage of properties of real-world relationships and their estimated probabilities to ask oracle queries (an abstraction of crowdsourcing) to efficiently repair the noisy graphs. We illustrate this approach for the case of graphs that are unions of cliques (which is the case for record linkage) and graphs that are trees (which is the case for ontologies), and present theoretical and empirical results for these cases. This is joint work with Donatella Firmani, Sainyam Galhotra and Barna Saha.","PeriodicalId":345682,"journal":{"name":"Proceedings of the 2nd International Workshop on Network Data Analytics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129437534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Prediction for Graph Queries","authors":"M. Namaki, K. Sasani, Yinghui Wu, A. Gebremedhin","doi":"10.1145/3068943.3068947","DOIUrl":"https://doi.org/10.1145/3068943.3068947","url":null,"abstract":"Query performance prediction has shown benefits to query optimization and resource allocation for relational databases. Emerging applications are leading to search scenarios where workloads with heterogeneous, structure-less analytical queries are processed over large-scale graph and network data. This calls for effective models to predict the performance of graph analytical queries, which are often more involved than their relational counterparts. In this paper, we study and evaluate predictive techniques for graph query performance prediction. We make several contributions. (1) We propose a general learning framework that makes use of practical and computationally efficient statistics from query scenarios and employs regression models. (2) We instantiate the framework with two routinely issued query classes, namely, reachability and graph pattern matching, that exhibit different query complexity. We develop modeling and learning algorithms for both query classes. (3) We show that our prediction models readily apply to resource-bounded querying, by providing a learning-based workload optimization strategy. Given a query workload and a time bound, the models select queries to be processed with a maximized query profit and a total cost within the bound. Using real-world graphs, we experimentally demonstrate the efficacy of our framework in terms of accuracy and the effectiveness of workload optimization.","PeriodicalId":345682,"journal":{"name":"Proceedings of the 2nd International Workshop on Network Data Analytics","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128097124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Graphical Features To Improve Demographic Prediction From Smart Phone Data","authors":"S. Akter, L. Holder","doi":"10.1145/3068943.3068948","DOIUrl":"https://doi.org/10.1145/3068943.3068948","url":null,"abstract":"Demographic information such as gender, age, ethnicity, level of education, disabilities, employment, and socio-economic status are important in the area of social science, survey and marketing. But it is difficult to obtain the demographic information from users due to reluctance of users to participate and low response rate. Through automated demographics prediction from smart phone sensor data, researchers can obtain this valuable information in a nonintrusive and cost-effective manner. We approach the problem of demographic prediction, namely, classification of gender, age group and job type, through the use of a graphical feature based framework. The framework represents information collected from sensor networks as graphs, extracts useful and relevant graphical features, and predicts demographic information. We evaluated our approach on the Nokia Mobile Phone dataset for the three classification tasks: gender, age-group and job-type. Our approach produced comparable results with most of the state of the art methods while having the additional advantage of general applicability to sensor networks without using sophisticated and application-specific feature generation techniques, background knowledge and special techniques to address class imbalance.","PeriodicalId":345682,"journal":{"name":"Proceedings of the 2nd International Workshop on Network Data Analytics","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126994089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Mining to Characterize Competition for Employment","authors":"A. Toulis, Lukasz Golab","doi":"10.1145/3068943.3068946","DOIUrl":"https://doi.org/10.1145/3068943.3068946","url":null,"abstract":"In this paper, we discuss a novel application of graph analytics to characterize competition in the workforce. We propose a methodology that relies on finding communities in a graph representing prospective employees (with edges connecting people who interviewed for the same job) and communities in a graph representing available jobs (with edges connecting jobs that interviewed the same person). We then apply the proposed methodology to a real dataset corresponding to cooperative internships offered to undergraduate students at a North American post-secondary institution, illustrating the benefits of our approach.","PeriodicalId":345682,"journal":{"name":"Proceedings of the 2nd International Workshop on Network Data Analytics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121310049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web and Social Media Analytics towards Enhancing Urban Transportations: A Case for Bangalore","authors":"Manjira Sinha, P. Varma, Tridib Mukherjee","doi":"10.1145/3068943.3068950","DOIUrl":"https://doi.org/10.1145/3068943.3068950","url":null,"abstract":"Cities today are typically plagued by multiple issues such as âĂŞ traffic jams, garbage, transit overload, public safety, drainage etc. Citizens today tend to discuss these issues in public forums, social media, web blogs, in a widespread manner. Given that issues related to public transportation are most actively reported across web-based sources, we present a holistic framework for collection, categorization, aggregation and visualization of urban public transportation issues. The primary challenges in deriving useful insights from web-based sources, stem from: (a) the number of reports; (b) incomplete or implicit spatio-temporal context; and the (c) unstructured nature of text in these reports. This paper provides the text categorization techniques that can be adopted to address specifically these challenges. The work initiates with the formal complaint data from the largest public transportation agency in Bangalore, complemented by complaint reports from web-based and social media sources. An easy to navigate and well-organized dashboard is developed for efficient visualization. The dashboard is currently being piloted with the largest transportation agency in Bangalore.","PeriodicalId":345682,"journal":{"name":"Proceedings of the 2nd International Workshop on Network Data Analytics","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128342918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}