Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献_第3页

Mining evolutionary multi-branch trees from text streams 从文本流中挖掘进化多分支树

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487603

Xiting Wang, Shixia Liu, Yangqiu Song, B. Guo

引用次数: 19

STED: semi-supervised targeted-interest event detectionin in twitter twitter中半监督的目标兴趣事件检测

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487712

Ting Hua, F. Chen, Liang Zhao, Chang-Tien Lu, Naren Ramakrishnan

{"title":"STED: semi-supervised targeted-interest event detectionin in twitter","authors":"Ting Hua, F. Chen, Liang Zhao, Chang-Tien Lu, Naren Ramakrishnan","doi":"10.1145/2487575.2487712","DOIUrl":"https://doi.org/10.1145/2487575.2487712","url":null,"abstract":"Social microblogs such as Twitter and Weibo are experiencing an explosive growth with billions of global users sharing their daily observations and thoughts. Beyond public interests (e.g., sports, music), microblogs can provide highly detailed information for those interested in public health, homeland security, and financial analysis. However, the language used in Twitter is heavily informal, ungrammatical, and dynamic. Existing data mining algorithms require extensive manually labeling to build and maintain a supervised system. This paper presents STED, a semi-supervised system that helps users to automatically detect and interactively visualize events of a targeted type from twitter, such as crimes, civil unrests, and disease outbreaks. Our model first applies transfer learning and label propagation to automatically generate labeled data, then learns a customized text classifier based on mini-clustering, and finally applies fast spatial scan statistics to estimate the locations of events. We demonstrate STED's usage and benefits using twitter data collected from Latin America countries, and show how our system helps to detect and track example events such as civil unrests and crimes.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90968803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 66

MI2LS: multi-instance learning from multiple informationsources MI2LS:从多个信息源进行多实例学习

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487651

Dan Zhang, Jingrui He, Richard D. Lawrence

{"title":"MI2LS: multi-instance learning from multiple informationsources","authors":"Dan Zhang, Jingrui He, Richard D. Lawrence","doi":"10.1145/2487575.2487651","DOIUrl":"https://doi.org/10.1145/2487575.2487651","url":null,"abstract":"In Multiple Instance Learning (MIL), each entity is normally expressed as a set of instances. Most of the current MIL methods only deal with the case when each instance is represented by one type of features. However, in many real world applications, entities are often described from several different information sources/views. For example, when applying MIL to image categorization, the characteristics of each image can be derived from both its RGB features and SIFT features. Previous research work has shown that, in traditional learning methods, leveraging the consistencies between different information sources could improve the classification performance drastically. Out of a similar motivation, to incorporate the consistencies between different information sources into MIL, we propose a novel research framework -- Multi-Instance Learning from Multiple Information Sources (MI2LS). Based on this framework, an algorithm -- Fast MI2LS (FMI2LS) is designed, which combines Concave-Convex Constraint Programming (CCCP) method and an adapte- d Stoachastic Gradient Descent (SGD) method. Some theoretical analysis on the optimality of the adapted SGD method and the generalized error bound of the formulation are given based on the proposed method. Experimental results on document classification and a novel application -- Insider Threat Detection (ITD), clearly demonstrate the superior performance of the proposed method over state-of-the-art MIL methods.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78211976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 32

iHR: an online recruiting system for Xiamen Talent Service Center iHR:厦门市人才服务中心网上招聘系统

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488199

Wenxing Hong, Lei Li, Tao Li, Wenfu Pan

引用次数: 21

A data-driven method for in-game decision making in MLB: when to pull a starting pitcher MLB游戏内决策的数据驱动方法:何时启用首发投手

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487660

Gartheeban Ganeshapillai, J. Guttag

{"title":"A data-driven method for in-game decision making in MLB: when to pull a starting pitcher","authors":"Gartheeban Ganeshapillai, J. Guttag","doi":"10.1145/2487575.2487660","DOIUrl":"https://doi.org/10.1145/2487575.2487660","url":null,"abstract":"Professional sports is a roughly $500 billion dollar industry that is increasingly data-driven. In this paper we show how machine learning can be applied to generate a model that could lead to better on-field decisions by managers of professional baseball teams. Specifically we show how to use regularized linear regression to learn pitcher-specific predictive models that can be used to help decide when a starting pitcher should be replaced. A key step in the process is our method of converting categorical variables (e.g., the venue in which a game is played) into continuous variables suitable for the regression. Another key step is dealing with situations in which there is an insufficient amount of data to compute measures such as the effectiveness of a pitcher against specific batters. For each season we trained on the first 80% of the games, and tested on the rest. The results suggest that using our model could have led to better decisions than those made by major league managers. Applying our model would have led to a different decision 48% of the time. For those games in which a manager left a pitcher in that our model would have removed, the pitcher ended up performing poorly 60% of the time.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76318589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 16

Debiasing social wisdom 消除社会智慧的偏见

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487684

Abhimanyu Das, Sreenivas Gollapudi, R. Panigrahy, Mahyar Salek

引用次数: 47

The dataminer's guide to scalable mixed-membership and nonparametric bayesian models 可扩展混合成员和非参数贝叶斯模型的数据挖掘指南

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2506181

Amr Ahmed, Alex Smola

引用次数: 0

Mining data from mobile devices: a survey of smart sensing and analytics 从移动设备中挖掘数据:智能传感和分析的调查

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2506177

S. Papadimitriou, Tina Eliassi-Rad

引用次数: 5

Scalable inference in max-margin topic models 最大边际主题模型中的可伸缩推理

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487658

Jun Zhu, Xun Zheng, Li Zhou, Bo Zhang

{"title":"Scalable inference in max-margin topic models","authors":"Jun Zhu, Xun Zheng, Li Zhou, Bo Zhang","doi":"10.1145/2487575.2487658","DOIUrl":"https://doi.org/10.1145/2487575.2487658","url":null,"abstract":"Topic models have played a pivotal role in analyzing large collections of complex data. Besides discovering latent semantics, supervised topic models (STMs) can make predictions on unseen test data. By marrying with advanced learning techniques, the predictive strengths of STMs have been dramatically enhanced, such as max-margin supervised topic models, state-of-the-art methods that integrate max-margin learning with topic models. Though powerful, max-margin STMs have a hard non-smooth learning problem. Existing algorithms rely on solving multiple latent SVM subproblems in an EM-type procedure, which can be too slow to be applicable to large-scale categorization tasks. In this paper, we present a highly scalable approach to building max-margin supervised topic models. Our approach builds on three key innovations: 1) a new formulation of Gibbs max-margin supervised topic models for both multi-class and multi-label classification; 2) a simple ``augment-and-collapse\" Gibbs sampling algorithm without making restricting assumptions on the posterior distributions; 3) an efficient parallel implementation that can easily tackle data sets with hundreds of categories and millions of documents. Furthermore, our algorithm does not need to solve SVM subproblems. Though performing the two tasks of topic discovery and learning predictive models jointly, which significantly improves the classification performance, our methods have comparable scalability as the state-of-the-art parallel algorithms for the standard LDA topic models which perform the single task of topic discovery only. Finally, an open-source implementation is also provided at: http://www.ml-thu.net/~jun/medlda.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82141696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 19

A “semi-lazy” approach to probabilistic path prediction in dynamic environments 动态环境中概率路径预测的“半惰性”方法

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487609

Jingbo Zhou, A. Tung, Wei Wu, W. Ng

{"title":"A “semi-lazy” approach to probabilistic path prediction in dynamic environments","authors":"Jingbo Zhou, A. Tung, Wei Wu, W. Ng","doi":"10.1145/2487575.2487609","DOIUrl":"https://doi.org/10.1145/2487575.2487609","url":null,"abstract":"Path prediction is useful in a wide range of applications. Most of the existing solutions, however, are based on eager learning methods where models and patterns are extracted from historical trajectories and then used for future prediction. Since such approaches are committed to a set of statistically significant models or patterns, problems can arise in dynamic environments where the underlying models change quickly or where the regions are not covered with statistically significant models or patterns. We propose a \"semi-lazy\" approach to path prediction that builds prediction models on the fly using dynamically selected reference trajectories. Such an approach has several advantages. First, the target trajectories to be predicted are known before the models are built, which allows us to construct models that are deemed relevant to the target trajectories. Second, unlike the lazy learning approaches, we use sophisticated learning algorithms to derive accurate prediction models with acceptable delay based on a small number of selected reference trajectories. Finally, our approach can be continuously self-correcting since we can dynamically re-construct new models if the predicted movements do not match the actual ones. Our prediction model can construct a probabilistic path whose probability of occurrence is larger than a threshold and which is furthest ahead in term of time. Users can control the confidence of the path prediction by setting a probability threshold. We conducted a comprehensive experimental study on real-world and synthetic datasets to show the effectiveness and efficiency of our approach.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"49 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83210726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 49