Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining: latest papers, page 6

Mining discriminative subgraphs from global-state networks

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487692

Sayan Ranu, Minh X. Hoang, Ambuj K. Singh

Abstract: Global-state networks provide a powerful mechanism to model the increasing heterogeneity in data generated by current systems. Such a network comprises a series of network snapshots with dynamic local states at nodes, and a global network state indicating the occurrence of an event. Mining discriminative subgraphs from global-state networks allows us to identify the influential sub-networks that have maximum impact on the global state and unearth the complex relationships between the local entities of a network and their collective behavior. In this paper, we explore this problem and design a technique called MINDS to mine minimally discriminative subgraphs from large global-state networks. To combat the exponential subgraph search space, we derive the concept of an edit map and perform Metropolis-Hastings sampling on it to compute the answer set. Furthermore, we formulate the idea of network-constrained decision trees to learn prediction models that adhere to the underlying network structure. Extensive experiments on real datasets demonstrate excellent accuracy in terms of prediction quality. Additionally, MINDS achieves a speed-up of at least four orders of magnitude over baseline techniques.
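The abstract above mentions Metropolis-Hastings sampling over a space where neighbouring candidates differ by a small edit. The following is a minimal, generic sketch of that sampling idea in Python, not the MINDS algorithm or its edit map: states are node subsets encoded as bitmasks, one "edit" flips a single bit, and `toy_weight` is a hypothetical scoring function invented for illustration.

```python
import random
from collections import Counter


def metropolis_hastings(weight, n_bits, steps, seed=0):
    """Visit bitmask states with long-run frequency proportional to weight(state).

    Assumption: weight returns a strictly positive number for every state.
    """
    rng = random.Random(seed)
    state = 0  # start from the empty subset
    visits = Counter()
    for _ in range(steps):
        # Propose a neighbour that differs by a single edit: flip one random bit.
        candidate = state ^ (1 << rng.randrange(n_bits))
        # The proposal is symmetric, so the acceptance probability reduces to
        # min(1, weight(candidate) / weight(state)).
        if rng.random() < min(1.0, weight(candidate) / weight(state)):
            state = candidate
        visits[state] += 1
    return visits


def toy_weight(state):
    # Hypothetical score: the subset {0, 2} (bitmask 0b101) is ten times
    # more "discriminative" than any other subset.
    return 10.0 if state == 0b101 else 1.0


visits = metropolis_hastings(toy_weight, n_bits=3, steps=20000)
most_visited = visits.most_common(1)[0][0]
```

In this toy setting the sampler spends most of its time at the high-weight subset without enumerating and scoring every subset up front, which is the reason sampling is attractive when the search space is exponential.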

Citations: 26

When TEDDY meets GrizzLY: temporal dependency discovery for triggering road deicing operations

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487706

C. Robardet, Vasile-Marian Scuturici, M. Plantevit, A. Fraboulet

Citations: 1

Financing lead triggers: empowering sales reps through knowledge discovery and fusion

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488190

K. Aggour, Bethany Hoogs

Abstract: Sales representatives must have access to meaningful and actionable intelligence about potential customers to be effective in their roles. Historically, GE Capital Americas sales reps identified leads by manually searching through news reports and financial statements either in print or online. Here we describe a system built to automate the collection and aggregation of information on companies, which is then mined to identify actionable sales leads. The Financing Lead Triggers system comprises three core components that perform information fusion, knowledge discovery and information visualization. Together these components extract raw data from disparate sources, fuse that data into information, and then automatically mine that information for actionable sales leads driven by a combination of expert-defined and statistically derived triggers. A web-based interface gives sales reps access to the company information and sales leads in a single location. The use of the Lead Triggers system has significantly improved the performance of the sales reps, providing them with actionable intelligence that has improved their productivity by 30-50%. In 2010, Lead Triggers provided leads on opportunities that represented over $44B in new deal commitments for GE Capital.

Citations: 5

STRIP: stream learning of influence probabilities

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487657

Konstantin Kutzkov, A. Bifet, F. Bonchi, A. Gionis

Abstract: Influence-driven diffusion of information is a fundamental process in social networks. Learning the latent variables of such a process, i.e., the influence strength along each link, is a central question towards understanding the structure and function of complex networks, modeling information cascades, and developing applications such as viral marketing. Motivated by modern microblogging platforms, such as Twitter, in this paper we study the problem of learning influence probabilities in a data-stream scenario, in which the network topology is relatively stable and the challenge of a learning algorithm is to keep up with a continuous stream of tweets using a small amount of time and memory. Our contribution is a number of randomized approximation algorithms, categorized according to the available space (superlinear, linear, and sublinear in the number of nodes n) and according to different models (landmark and sliding window). Among several results, we show that we can learn influence probabilities with one pass over the data, using O(n log n) space, in both the landmark model and the sliding-window model, and we further show that our algorithm is within a logarithmic factor of optimal. For truly large graphs, when one needs to operate with sublinear space, we show that we can still learn influence probabilities in one pass, assuming that we restrict our attention to the most active users. Our thorough experimental evaluation on a large social graph demonstrates that the empirical performance of our algorithms agrees with that predicted by the theory.
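To make the notion of an influence probability in the abstract above concrete, here is a small exact one-pass counting sketch in Python. It is a baseline under stated assumptions, not one of the paper's randomized approximation algorithms: the estimate "items that v acted on after u, divided by items u acted on" is one common frequency-based definition chosen for illustration, and the names `followers`, `stream`, and `influence_probabilities` are hypothetical.

```python
from collections import defaultdict


def influence_probabilities(followers, stream):
    """Estimate p(u -> v) in one pass over a time-ordered stream.

    followers: dict mapping a user to the set of users who follow them.
    stream: iterable of (user, item) events in chronological order.
    Assumed estimate: (# items v acted on after u) / (# items u acted on).
    """
    acted = defaultdict(set)       # item -> users who have already acted on it
    actions = defaultdict(int)     # user -> number of distinct items acted on
    propagated = defaultdict(int)  # (u, v) -> items v acted on after u
    for user, item in stream:
        if user in acted[item]:
            continue  # count each (user, item) pair only once
        for earlier in acted[item]:
            # Credit the edge only if `user` actually follows `earlier`.
            if user in followers.get(earlier, ()):
                propagated[(earlier, user)] += 1
        acted[item].add(user)
        actions[user] += 1
    return {edge: n / actions[edge[0]] for edge, n in propagated.items()}


followers = {"a": {"b", "c"}}
stream = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "y"), ("b", "z"), ("a", "z")]
probs = influence_probabilities(followers, stream)
```

This exact version stores every (item, user) pair, so its memory grows with the stream; the abstract's point is that comparable estimates can be obtained in far less space.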

Citations: 50

A phrase mining framework for recursive construction of a topical hierarchy

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487631

Chi Wang, Marina Danilevsky, Nihit Desai, Yinan Zhang, Phuong Nguyen, T. Taula, Jiawei Han

Citations: 91

AMETHYST: a system for mining and exploring topical hierarchies of heterogeneous data

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487716

Marina Danilevsky, Chi Wang, Fangbo Tao, Son Nguyen, Gong Chen, Nihit Desai, Lidan Wang, Jiawei Han

Citations: 9

Network discovery via constrained tensor analysis of fMRI data

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487619

I. Davidson, Sean Gilpin, Owen Carmichael, Peter B. Walker

Abstract: We pose the problem of network discovery, which involves simplifying spatio-temporal data into cohesive regions (nodes) and relationships between those regions (edges). Such problems naturally exist in fMRI scans of human subjects. These scans consist of activations of thousands of voxels over time, with the aim of simplifying them into the underlying cognitive network being used. We propose supervised and semi-supervised variations of this problem and postulate a constrained tensor decomposition formulation and a corresponding alternating least squares solver that is easy to implement. We show this formulation works well in controlled experiments where supervision is incomplete, superfluous and noisy and is able to recover the underlying ground truth network. We then show that for real fMRI data our approach can reproduce well-known results in neurology regarding the default mode network in resting-state healthy and Alzheimer's-affected individuals. Finally, we show that the reconstruction error of the decomposition provides a useful measure of the network strength and is useful at predicting key cognitive scores both by itself and with clinical information.
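The abstract above refers to an alternating least squares (ALS) solver for a tensor decomposition and to the reconstruction error of that decomposition. As a rough illustration only, the sketch below fits an unconstrained rank-1 model to a small 3-way tensor in plain Python; the paper's constraints and its actual formulation are not reproduced, and the names `rank1_als` and `reconstruction_error` are hypothetical.

```python
def rank1_als(T, iters=20):
    """Fit T[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares.

    Each update is the closed-form least-squares solution for one factor
    with the other two held fixed. Assumes T is not all zeros and that the
    all-ones starting point is not orthogonal to the solution.
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K

    def sq(v):
        return sum(x * x for x in v)

    for _ in range(iters):
        a = [sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
             / (sq(b) * sq(c)) for i in range(I)]
        b = [sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
             / (sq(a) * sq(c)) for j in range(J)]
        c = [sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
             / (sq(a) * sq(b)) for k in range(K)]
    return a, b, c


def reconstruction_error(T, a, b, c):
    """Frobenius norm of the residual T - a (outer) b (outer) c."""
    return sum((T[i][j][k] - a[i] * b[j] * c[k]) ** 2
               for i in range(len(a))
               for j in range(len(b))
               for k in range(len(c))) ** 0.5


# Build an exactly rank-1 tensor from known factors and recover it.
a0, b0, c0 = [1.0, 2.0], [3.0, 1.0, 2.0], [1.0, 4.0]
T = [[[x * y * z for z in c0] for y in b0] for x in a0]
a, b, c = rank1_als(T)
error = reconstruction_error(T, a, b, c)
```

The recovered factors match the originals only up to scaling, since any rescaling of `a`, `b`, and `c` whose product is 1 gives the same reconstruction; the reconstruction error is the scale-independent quantity.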

Citations: 91

Accurate intelligible models with pairwise interactions

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487579

Yin Lou, R. Caruana, J. Gehrke, G. Hooker

Abstract: Standard generalized additive models (GAMs) usually model the dependent variable as a sum of univariate models. Although previous studies have shown that standard GAMs can be interpreted by users, their accuracy is significantly lower than that of more complex models that permit interactions. In this paper, we suggest adding selected terms of interacting pairs of features to standard GAMs. The resulting models, which we call GA2M-models, for Generalized Additive Models plus Interactions, consist of univariate terms and a small number of pairwise interaction terms. Since these models only include one- and two-dimensional components, the components of GA2M-models can be visualized and interpreted by users. To explore the huge (quadratic) number of pairs of features, we develop a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion into the model. In a large-scale empirical study, we show the effectiveness of FAST in ranking candidate pairs of features. In addition, we show the surprising result that GA2M-models have almost the same performance as the best full-complexity models on a number of real datasets. Thus this paper postulates that for many problems, GA2M-models can yield models that are both intelligible and accurate.
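The model forms described in the abstract above can be written out as follows. The notation is an assumption made for illustration: g is a link function, f_i and f_ij are the univariate and pairwise shape functions, and S is the small set of selected feature pairs.

```latex
% Standard GAM: a sum of univariate shape functions
g\bigl(\mathbb{E}[y]\bigr) = \sum_{i} f_i(x_i)

% GA2M: the same univariate terms plus pairwise terms for the selected pairs S
g\bigl(\mathbb{E}[y]\bigr) = \sum_{i} f_i(x_i) + \sum_{(i,j) \in S} f_{ij}(x_i, x_j)
```

Each f_i can be drawn as a curve and each f_ij as a heat map, which is why the model stays inspectable even with interaction terms.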

Citations: 406

Analysis of advanced meter infrastructure data of water consumption in apartment buildings

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488193

Einat Kermany, Hanna Mazzawi, Dorit Baras, Y. Naveh, Hagai Michaelis

Abstract: We present our experience of using machine learning techniques over data originating from advanced meter infrastructure (AMI) systems for water consumption in a medium-size city. We focus on two new use cases that are of special importance to city authorities. One use case is the automatic identification of malfunctioning meters, with a focus on distinguishing them from legitimate non-consumption such as during periods when the household residents are on vacation. The other use case is the identification of leaks or theft in the unmetered common areas of apartment buildings. These two use cases are highly important to city authorities both because of the lost revenue they imply and because of the hassle to the residents in cases of delayed identification. Both cases are inherently complex to analyze and require advanced data mining techniques in order to achieve high levels of correct identification. Our results provide for faster and more accurate detection of malfunctioning meters as well as leaks in the common areas. This results in significant tangible value to the authorities in terms of an increase in technician efficiency and a decrease in the amount of wasted, non-revenue water.

Citations: 12

Measuring spontaneous devaluations in user preferences

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487679

Komal Kapoor, Nisheeth Srivastava, J. Srivastava, P. Schrater

Citations: 15