Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining: latest articles

Mining evolutionary multi-branch trees from text streams
Xiting Wang, Shixia Liu, Yangqiu Song, B. Guo
Abstract: Understanding topic hierarchies in text streams and their evolution patterns over time is very important in many applications. In this paper, we propose an evolutionary multi-branch tree clustering method for streaming text data. We build evolutionary trees in a Bayesian online filtering framework. The tree construction is formulated as an online posterior estimation problem, which considers both the likelihood of the current tree and the conditional prior given the previous tree. We also introduce a constraint model to compute the conditional prior of a tree in the multi-branch setting. Experiments on real-world news data demonstrate that our algorithm can better incorporate historical tree information and is more efficient and effective than the traditional evolutionary hierarchical clustering algorithm.
DOI: https://doi.org/10.1145/2487575.2487603
Published: 2013-08-11
Citations: 19
STED: semi-supervised targeted-interest event detection in Twitter
Ting Hua, F. Chen, Liang Zhao, Chang-Tien Lu, Naren Ramakrishnan
Abstract: Social microblogs such as Twitter and Weibo are experiencing explosive growth, with billions of global users sharing their daily observations and thoughts. Beyond public interests (e.g., sports, music), microblogs can provide highly detailed information for those interested in public health, homeland security, and financial analysis. However, the language used in Twitter is heavily informal, ungrammatical, and dynamic. Existing data mining algorithms require extensive manual labeling to build and maintain a supervised system. This paper presents STED, a semi-supervised system that helps users to automatically detect and interactively visualize events of a targeted type from Twitter, such as crimes, civil unrest, and disease outbreaks. Our model first applies transfer learning and label propagation to automatically generate labeled data, then learns a customized text classifier based on mini-clustering, and finally applies fast spatial scan statistics to estimate the locations of events. We demonstrate STED's usage and benefits using Twitter data collected from Latin American countries, and show how our system helps to detect and track example events such as civil unrest and crimes.
DOI: https://doi.org/10.1145/2487575.2487712
Published: 2013-08-11
Citations: 66
MI2LS: multi-instance learning from multiple information sources
Dan Zhang, Jingrui He, Richard D. Lawrence
Abstract: In Multiple Instance Learning (MIL), each entity is normally expressed as a set of instances. Most current MIL methods only deal with the case in which each instance is represented by one type of feature. However, in many real-world applications, entities are often described from several different information sources/views. For example, when applying MIL to image categorization, the characteristics of each image can be derived from both its RGB features and SIFT features. Previous research has shown that, in traditional learning methods, leveraging the consistencies between different information sources can improve classification performance drastically. With a similar motivation, to incorporate the consistencies between different information sources into MIL, we propose a novel research framework, Multi-Instance Learning from Multiple Information Sources (MI2LS). Based on this framework, an algorithm, Fast MI2LS (FMI2LS), is designed, which combines the Concave-Convex Constraint Programming (CCCP) method and an adapted Stochastic Gradient Descent (SGD) method. Some theoretical analysis of the optimality of the adapted SGD method and the generalized error bound of the formulation is given based on the proposed method. Experimental results on document classification and a novel application, Insider Threat Detection (ITD), clearly demonstrate the superior performance of the proposed method over state-of-the-art MIL methods.
DOI: https://doi.org/10.1145/2487575.2487651
Published: 2013-08-11
Citations: 32
iHR: an online recruiting system for Xiamen Talent Service Center
Wenxing Hong, Lei Li, Tao Li, Wenfu Pan
Abstract: Online recruiting systems have gained immense attention as more and more job seekers search for jobs and enterprises find candidates on the Internet. A critical problem in a recruiting system is how to maximally satisfy the desires of both job seekers and enterprises with reasonable recommendations or search results. In this paper, we investigate and compare various online recruiting systems from a product perspective. We then point out several key functions that help a successful recruiting system achieve a win-win situation between job seekers and enterprises. Based on the observations and key functions, we design, implement and deploy a web-based recruiting application, named iHR, for Xiamen Talent Service Center. The system utilizes the latest advances in data mining and recommendation technologies to create a user-oriented service for a myriad of audiences in the job marketing community. Empirical evaluation and online user studies demonstrate the efficacy and effectiveness of our proposed system. Currently, iHR has been deployed at http://i.xmrc.com.cn/XMRCIntel.
DOI: https://doi.org/10.1145/2487575.2488199
Published: 2013-08-11
Citations: 21
A data-driven method for in-game decision making in MLB: when to pull a starting pitcher
Gartheeban Ganeshapillai, J. Guttag
Abstract: Professional sports is a roughly $500 billion industry that is increasingly data-driven. In this paper we show how machine learning can be applied to generate a model that could lead to better on-field decisions by managers of professional baseball teams. Specifically, we show how to use regularized linear regression to learn pitcher-specific predictive models that can be used to help decide when a starting pitcher should be replaced. A key step in the process is our method of converting categorical variables (e.g., the venue in which a game is played) into continuous variables suitable for the regression. Another key step is dealing with situations in which there is an insufficient amount of data to compute measures such as the effectiveness of a pitcher against specific batters. For each season we trained on the first 80% of the games, and tested on the rest. The results suggest that using our model could have led to better decisions than those made by major league managers. Applying our model would have led to a different decision 48% of the time. For those games in which a manager left a pitcher in that our model would have removed, the pitcher ended up performing poorly 60% of the time.
DOI: https://doi.org/10.1145/2487575.2487660
Published: 2013-08-11
Citations: 16
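The evaluation protocol in the abstract above (train on the first 80% of a season's games, test on the rest, with a regularized linear model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the single feature (pitch count), the target (runs allowed in the next inning), the toy numbers, and the function names chronological_split and fit_ridge_1d are all assumptions made for the example.

```python
from statistics import mean


def chronological_split(games, train_fraction=0.8):
    """Split an already time-ordered list: the first part trains, the rest tests."""
    cut = int(len(games) * train_fraction)
    return games[:cut], games[cut:]


def fit_ridge_1d(xs, ys, lam=1.0):
    """Closed-form ridge fit of y ~ w*x + b, with the L2 penalty on w only."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w = sxy / (sxx + lam)      # lam > 0 shrinks the slope toward zero
    b = y_bar - w * x_bar      # the intercept is left unpenalized
    return w, b


# Hypothetical toy data in game order: (pitch_count, runs_allowed_next_inning).
games = [(60, 0.0), (70, 0.5), (80, 1.0), (90, 1.5), (100, 2.0),
         (65, 0.25), (75, 0.75), (85, 1.25), (95, 1.75), (105, 2.25)]

train, test = chronological_split(games)
w, b = fit_ridge_1d([x for x, _ in train], [y for _, y in train], lam=1.0)
predictions = [w * x + b for x, _ in test]
```

Splitting chronologically rather than at random keeps later games out of training, which mirrors how such a model would be used during a season.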
Debiasing social wisdom
Abhimanyu Das, Sreenivas Gollapudi, R. Panigrahy, Mahyar Salek
Abstract: With the explosive growth of social networks, many applications are increasingly harnessing the pulse of online crowds for a variety of tasks such as marketing, advertising, and opinion mining. An important example is the wisdom of the crowd effect, which has been well studied for such tasks when the crowd is non-interacting. However, these studies don't explicitly address the network effects in social networks. A key difference in this setting is the presence of social influences that arise from these interactions and can undermine the wisdom of the crowd [17]. Using a natural model of opinion formation, we analyze the effect of these interactions on an individual's opinion and estimate her propensity to conform. We then propose efficient sampling algorithms incorporating these conformity values to arrive at a debiased estimate of the wisdom of a crowd. We analyze the trade-off between the sample size and estimation error and validate our algorithms using both real data obtained from online user experiments and synthetic data.
DOI: https://doi.org/10.1145/2487575.2487684
Published: 2013-08-11
Citations: 47
The dataminer's guide to scalable mixed-membership and nonparametric bayesian models
Amr Ahmed, Alex Smola
Abstract: Large amounts of data arise in a multitude of situations, ranging from bioinformatics to astronomy, manufacturing, and medical applications. For concreteness, our tutorial focuses on data obtained in the context of the internet, such as user-generated content (microblogs, e-mails, messages), behavioral data (locations, interactions, clicks, queries), and graphs. Due to its magnitude, much of the challenge is to extract structure and interpretable models without the need for additional labels, i.e., to design effective unsupervised techniques. We present design patterns for hierarchical nonparametric Bayesian models, efficient inference algorithms, and modeling tools to describe salient aspects of the data.
DOI: https://doi.org/10.1145/2487575.2506181
Published: 2013-08-11
Citations: 0
Mining data from mobile devices: a survey of smart sensing and analytics
S. Papadimitriou, Tina Eliassi-Rad
Abstract: Mobile connected devices, and smartphones in particular, are rapidly emerging as a dominant computing and sensing platform. This poses several unique opportunities for data collection and analysis, as well as new challenges. In this tutorial, we survey the state of the art in mining data from mobile devices across different application areas such as ads, healthcare, geosocial, public policy, etc. Our tutorial has three parts. In part one, we summarize data collection in terms of various sensing modalities. In part two, we present cross-cutting challenges such as real-time analysis and security, and we outline cross-cutting methods for mobile data mining such as network inference, streaming algorithms, etc. In the last part, we give an overview of emerging and fast-growing application areas, such as those noted above. In conclusion, we briefly highlight the opportunities for joint design of new data collection techniques and analysis methods, suggesting additional directions for future research.
DOI: https://doi.org/10.1145/2487575.2506177
Published: 2013-08-11
Citations: 5
Scalable inference in max-margin topic models
Jun Zhu, Xun Zheng, Li Zhou, Bo Zhang
Abstract: Topic models have played a pivotal role in analyzing large collections of complex data. Besides discovering latent semantics, supervised topic models (STMs) can make predictions on unseen test data. By marrying with advanced learning techniques, the predictive strengths of STMs have been dramatically enhanced, such as max-margin supervised topic models, state-of-the-art methods that integrate max-margin learning with topic models. Though powerful, max-margin STMs have a hard non-smooth learning problem. Existing algorithms rely on solving multiple latent SVM subproblems in an EM-type procedure, which can be too slow to be applicable to large-scale categorization tasks. In this paper, we present a highly scalable approach to building max-margin supervised topic models. Our approach builds on three key innovations: 1) a new formulation of Gibbs max-margin supervised topic models for both multi-class and multi-label classification; 2) a simple "augment-and-collapse" Gibbs sampling algorithm without making restricting assumptions on the posterior distributions; 3) an efficient parallel implementation that can easily tackle data sets with hundreds of categories and millions of documents. Furthermore, our algorithm does not need to solve SVM subproblems. Though they perform the two tasks of topic discovery and learning predictive models jointly, which significantly improves the classification performance, our methods have scalability comparable to the state-of-the-art parallel algorithms for the standard LDA topic models, which perform the single task of topic discovery only. Finally, an open-source implementation is also provided at: http://www.ml-thu.net/~jun/medlda.
DOI: https://doi.org/10.1145/2487575.2487658
Published: 2013-08-11
Citations: 19
A “semi-lazy” approach to probabilistic path prediction in dynamic environments
Jingbo Zhou, A. Tung, Wei Wu, W. Ng
Abstract: Path prediction is useful in a wide range of applications. Most of the existing solutions, however, are based on eager learning methods where models and patterns are extracted from historical trajectories and then used for future prediction. Since such approaches are committed to a set of statistically significant models or patterns, problems can arise in dynamic environments where the underlying models change quickly or where the regions are not covered with statistically significant models or patterns. We propose a "semi-lazy" approach to path prediction that builds prediction models on the fly using dynamically selected reference trajectories. Such an approach has several advantages. First, the target trajectories to be predicted are known before the models are built, which allows us to construct models that are deemed relevant to the target trajectories. Second, unlike the lazy learning approaches, we use sophisticated learning algorithms to derive accurate prediction models with acceptable delay based on a small number of selected reference trajectories. Finally, our approach can be continuously self-correcting since we can dynamically re-construct new models if the predicted movements do not match the actual ones. Our prediction model can construct a probabilistic path whose probability of occurrence is larger than a threshold and which is furthest ahead in terms of time. Users can control the confidence of the path prediction by setting a probability threshold. We conducted a comprehensive experimental study on real-world and synthetic datasets to show the effectiveness and efficiency of our approach.
DOI: https://doi.org/10.1145/2487575.2487609
Published: 2013-08-11
Citations: 49
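The thresholded path described in the abstract above (the path that reaches furthest ahead in time while its probability of occurrence stays above a user-set threshold) can be illustrated with a small sketch. It assumes, purely for illustration, that each predicted step comes with its own conditional probability and that the path probability is their product; the function name longest_confident_path and the numbers are hypothetical and not taken from the paper.

```python
def longest_confident_path(step_probs, threshold):
    """Return (steps, probability) for the longest prefix of predicted steps
    whose cumulative probability stays strictly above `threshold`."""
    prob = 1.0
    steps = 0
    for p in step_probs:
        if prob * p <= threshold:
            break              # one more step would drop to or below the threshold
        prob *= p
        steps += 1
    return steps, prob


# Hypothetical per-step probabilities for a predicted trajectory.
steps, prob = longest_confident_path([0.9, 0.9, 0.8, 0.7], threshold=0.5)
# steps is 3: the fourth step would bring the path probability below 0.5.
```

Raising the threshold shortens the predicted path and lowering it extends the path, which is the confidence control the abstract refers to.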