Pranay Anchuri, Mohammed J. Zaki, Omer Barkol, Shahar Golan, Moshe Shamy
{"title":"Approximate graph mining with label costs","authors":"Pranay Anchuri, Mohammed J. Zaki, Omer Barkol, Shahar Golan, Moshe Shamy","doi":"10.1145/2487575.2487602","DOIUrl":"https://doi.org/10.1145/2487575.2487602","url":null,"abstract":"Many real-world graphs have complex labels on the nodes and edges. Mining only exact patterns yields limited insights, since it may be hard to find exact matches. However, in many domains it is relatively easy to define a cost (or distance) between different labels. Using this information, it becomes possible to mine a much richer set of approximate subgraph patterns, which preserve the topology but allow bounded label mismatches. We present novel and scalable methods to efficiently solve the approximate isomorphism problem. We show that approximate mining yields interesting patterns in several real-world graphs ranging from IT and protein interaction networks to protein structures.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77528210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining discriminative subgraphs from global-state networks","authors":"Sayan Ranu, Minh X. Hoang, Ambuj K. Singh","doi":"10.1145/2487575.2487692","DOIUrl":"https://doi.org/10.1145/2487575.2487692","url":null,"abstract":"Global-state networks provide a powerful mechanism to model the increasing heterogeneity in data generated by current systems. Such a network comprises of a series of network snapshots with dynamic local states at nodes, and a global network state indicating the occurrence of an event. Mining discriminative subgraphs from global-state networks allows us to identify the influential sub-networks that have maximum impact on the global state and unearth the complex relationships between the local entities of a network and their collective behavior. In this paper, we explore this problem and design a technique called MINDS to mine minimally discriminative subgraphs from large global-state networks. To combat the exponential subgraph search space, we derive the concept of an edit map and perform Metropolis Hastings sampling on it to compute the answer set. Furthermore, we formulate the idea of network-constrained decision trees to learn prediction models that adhere to the underlying network structure. Extensive experiments on real datasets demonstrate excellent accuracy in terms of prediction quality. Additionally, MINDS achieves a speed-up of at least four orders of magnitude over baseline techniques.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79190269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Financing lead triggers: empowering sales reps through knowledge discovery and fusion","authors":"K. Aggour, Bethany Hoogs","doi":"10.1145/2487575.2488190","DOIUrl":"https://doi.org/10.1145/2487575.2488190","url":null,"abstract":"Sales representatives must have access to meaningful and actionable intelligence about potential customers to be effective in their roles. Historically, GE Capital Americas sales reps identified leads by manually searching through news reports and financial statements either in print or online. Here we describe a system built to automate the collection and aggregation of information on companies, which is then mined to identify actionable sales leads. The Financing Lead Triggers system is comprised of three core components that perform information fusion, knowledge discovery and information visualization. Together these components extract raw data from disparate sources, fuse that data into information, and then automatically mine that information for actionable sales leads driven by a combination of expert-defined and statistically derived triggers. A web-based interface provides sales reps access to the company information and sales leads in a single location. The use of the Lead Triggers system has significantly improved the performance of the sales reps, providing them with actionable intelligence that has improved their productivity by 30-50%. In 2010, Lead Triggers provided leads on opportunities that represented over $44B in new deal commitments for GE Capital.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78663986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Konstantin Kutzkov, A. Bifet, F. Bonchi, A. Gionis
{"title":"STRIP: stream learning of influence probabilities","authors":"Konstantin Kutzkov, A. Bifet, F. Bonchi, A. Gionis","doi":"10.1145/2487575.2487657","DOIUrl":"https://doi.org/10.1145/2487575.2487657","url":null,"abstract":"Influence-driven diffusion of information is a fundamental process in social networks. Learning the latent variables of such process, i.e., the influence strength along each link, is a central question towards understanding the structure and function of complex networks, modeling information cascades, and developing applications such as viral marketing. Motivated by modern microblogging platforms, such as twitter, in this paper we study the problem of learning influence probabilities in a data-stream scenario, in which the network topology is relatively stable and the challenge of a learning algorithm is to keep up with a continuous stream of tweets using a small amount of time and memory. Our contribution is a number of randomized approximation algorithms, categorized according to the available space (superlinear, linear, and sublinear in the number of nodes n) and according to different models (landmark and sliding window). Among several results, we show that we can learn influence probabilities with one pass over the data, using O(nlog n) space, in both the landmark model and the sliding-window model, and we further show that our algorithm is within a logarithmic factor of optimal. For truly large graphs, when one needs to operate with sublinear space, we show that we can still learn influence probabilities in one pass, assuming that we restrict our attention to the most active users. Our thorough experimental evaluation on large social graph demonstrates that the empirical performance of our algorithms agrees with that predicted by the theory.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86996767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chi Wang, Marina Danilevsky, Nihit Desai, Yinan Zhang, Phuong Nguyen, T. Taula, Jiawei Han
{"title":"A phrase mining framework for recursive construction of a topical hierarchy","authors":"Chi Wang, Marina Danilevsky, Nihit Desai, Yinan Zhang, Phuong Nguyen, T. Taula, Jiawei Han","doi":"10.1145/2487575.2487631","DOIUrl":"https://doi.org/10.1145/2487575.2487631","url":null,"abstract":"A high quality hierarchical organization of the concepts in a dataset at different levels of granularity has many valuable applications such as search, summarization, and content browsing. In this paper we propose an algorithm for recursively constructing a hierarchy of topics from a collection of content-representative documents. We characterize each topic in the hierarchy by an integrated ranked list of mixed-length phrases. Our mining framework is based on a phrase-centric view for clustering, extracting, and ranking topical phrases. Experiments with datasets from three different domains illustrate our ability to generate hierarchies of high quality topics represented by meaningful phrases.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82787111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marina Danilevsky, Chi Wang, Fangbo Tao, Son Nguyen, Gong Chen, Nihit Desai, Lidan Wang, Jiawei Han
{"title":"AMETHYST: a system for mining and exploring topical hierarchies of heterogeneous data","authors":"Marina Danilevsky, Chi Wang, Fangbo Tao, Son Nguyen, Gong Chen, Nihit Desai, Lidan Wang, Jiawei Han","doi":"10.1145/2487575.2487716","DOIUrl":"https://doi.org/10.1145/2487575.2487716","url":null,"abstract":"In this demo we present AMETHYST, a system for exploring and analyzing a topical hierarchy constructed from a heterogeneous information network (HIN). HINs, composed of multiple types of entities and links are very common in the real world. Many have a text component, and thus can benefit from a high quality hierarchical organization of the topics in the network dataset. By organizing the topics into a hierarchy, AMETHYST helps understand search results in the context of an ontology, and explain entity relatedness at different granularities. The automatically constructed topical hierarchy reflects a domain-specific ontology, interacts with multiple types of linked entities, and can be tailored for both free text and OLAP queries.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87807047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Davidson, Sean Gilpin, Owen Carmichael, Peter B. Walker
{"title":"Network discovery via constrained tensor analysis of fMRI data","authors":"I. Davidson, Sean Gilpin, Owen Carmichael, Peter B. Walker","doi":"10.1145/2487575.2487619","DOIUrl":"https://doi.org/10.1145/2487575.2487619","url":null,"abstract":"We pose the problem of network discovery which involves simplifying spatio-temporal data into cohesive regions (nodes) and relationships between those regions (edges). Such problems naturally exist in fMRI scans of human subjects. These scans consist of activations of thousands of voxels over time with the aim to simplify them into the underlying cognitive network being used. We propose supervised and semi-supervised variations of this problem and postulate a constrained tensor decomposition formulation and a corresponding alternating least squares solver that is easy to implement. We show this formulation works well in controlled experiments where supervision is incomplete, superfluous and noisy and is able to recover the underlying ground truth network. We then show that for real fMRI data our approach can reproduce well known results in neurology regarding the default mode network in resting-state healthy and Alzheimer affected individuals. Finally, we show that the reconstruction error of the decomposition provides a useful measure of the network strength and is useful at predicting key cognitive scores both by itself and with clinical information.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83090582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate intelligible models with pairwise interactions","authors":"Yin Lou, R. Caruana, J. Gehrke, G. Hooker","doi":"10.1145/2487575.2487579","DOIUrl":"https://doi.org/10.1145/2487575.2487579","url":null,"abstract":"Standard generalized additive models (GAMs) usually model the dependent variable as a sum of univariate models. Although previous studies have shown that standard GAMs can be interpreted by users, their accuracy is significantly less than more complex models that permit interactions. In this paper, we suggest adding selected terms of interacting pairs of features to standard GAMs. The resulting models, which we call GA2{M}$-models, for Generalized Additive Models plus Interactions, consist of univariate terms and a small number of pairwise interaction terms. Since these models only include one- and two-dimensional components, the components of GA2M-models can be visualized and interpreted by users. To explore the huge (quadratic) number of pairs of features, we develop a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion into the model. In a large-scale empirical study, we show the effectiveness of FAST in ranking candidate pairs of features. In addition, we show the surprising result that GA2M-models have almost the same performance as the best full-complexity models on a number of real datasets. Thus this paper postulates that for many problems, GA2M-models can yield models that are both intelligible and accurate.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84359623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Einat Kermany, Hanna Mazzawi, Dorit Baras, Y. Naveh, Hagai Michaelis
{"title":"Analysis of advanced meter infrastructure data of water consumption in apartment buildings","authors":"Einat Kermany, Hanna Mazzawi, Dorit Baras, Y. Naveh, Hagai Michaelis","doi":"10.1145/2487575.2488193","DOIUrl":"https://doi.org/10.1145/2487575.2488193","url":null,"abstract":"We present our experience of using machine learning techniques over data originating from advanced meter infrastructure (AMI) systems for water consumption in a medium-size city. We focus on two new use cases that are of special importance to city authorities. One use case is the automatic identification of malfunctioning meters, with a focus on distinguishing them from legitimate non-consumption such as during periods when the household residents are on vacation. The other use case is the identification of leaks or theft in the unmetered common areas of apartment buildings. These two use cases are highly important to city authorities both because of the lost revenue they imply and because of the hassle to the residents in cases of delayed identification. Both cases are inherently complex to analyze and require advanced data mining techniques in order to achieve high levels of correct identification. Our results provide for faster and more accurate detection of malfunctioning meters as well as leaks in the common areas. This results in significant tangible value to the authorities in terms of increase in technician efficiency and a decrease in the amount of wasted, non-revenue, water.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86780449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Komal Kapoor, Nisheeth Srivastava, J. Srivastava, P. Schrater
{"title":"Measuring spontaneous devaluations in user preferences","authors":"Komal Kapoor, Nisheeth Srivastava, J. Srivastava, P. Schrater","doi":"10.1145/2487575.2487679","DOIUrl":"https://doi.org/10.1145/2487575.2487679","url":null,"abstract":"Spontaneous devaluation in preferences is ubiquitous, where yesterday's hit is today's affliction. Despite technological advances facilitating access to a wide range of media commodities, finding engaging content is a major enterprise with few principled solutions. Systems tracking spontaneous devaluation in user preferences can allow prediction of the onset of boredom in users potentially catering to their changed needs. In this work, we study the music listening histories of Last.fm users focusing on the changes in their preferences based on their choices for different artists at different points in time. A hazard function, commonly used in statistics for survival analysis, is used to capture the rate at which a user returns to an artist as a function of exposure to the artist. The analysis provides the first evidence of spontaneous devaluation in preferences of music listeners. Better understanding of the temporal dynamics of this phenomenon can inform solutions to the similarity-diversity dilemma of recommender systems.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86632711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}