Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining最新文献

筛选
英文 中文
Social Media Anomaly Detection: Challenges and Solutions 社交媒体异常检测:挑战与解决方案
Yan Liu, S. Chawla
{"title":"Social Media Anomaly Detection: Challenges and Solutions","authors":"Yan Liu, S. Chawla","doi":"10.1145/2783258.2789990","DOIUrl":"https://doi.org/10.1145/2783258.2789990","url":null,"abstract":"Anomaly detection is of critical importance to prevent malicious activities such as bullying, terrorist attack planning, and fraud information dissemination. With the recent popularity of social media, new types of anomalous behaviors arise, causing concerns from various parties. While a large body of work haven been dedicated to traditional anomaly detection problems, we observe a surge of research interests in the new realm of social media anomaly detection. In this tutorial, we survey existing work on social media anomaly detection, focusing on the new anomalous phenomena in social media and most recent techniques to detect those special types of anomalies. We aim to provide a general overview of the problem domain, common formulations, existing methodologies and future directions.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121720732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Non-exhaustive, Overlapping Clustering via Low-Rank Semidefinite Programming 基于低秩半定规划的非穷举重叠聚类
Yangyang Hou, Joyce Jiyoung Whang, D. Gleich, I. Dhillon
{"title":"Non-exhaustive, Overlapping Clustering via Low-Rank Semidefinite Programming","authors":"Yangyang Hou, Joyce Jiyoung Whang, D. Gleich, I. Dhillon","doi":"10.1145/2783258.2783398","DOIUrl":"https://doi.org/10.1145/2783258.2783398","url":null,"abstract":"Clustering is one of the most fundamental tasks in data mining. To analyze complex real-world data emerging in many data-centric applications, the problem of non-exhaustive, overlapping clustering has been studied where the goal is to find overlapping clusters and also detect outliers simultaneously. We propose a novel convex semidefinite program (SDP) as a relaxation of the non-exhaustive, overlapping clustering problem. Although the SDP formulation enjoys attractive theoretical properties with respect to global optimization, it is computationally intractable for large problem sizes. As an alternative, we optimize a low-rank factorization of the solution. The resulting problem is non-convex, but has a smaller number of solution variables. We construct an optimization solver using an augmented Lagrangian methodology that enables us to deal with problems with tens of thousands of data points. The new solver provides more accurate and reliable answers than other approaches. By exploiting the connection between graph clustering objective functions and a kernel k-means objective, our new low-rank solver can also compute overlapping communities of social networks with state-of-the-art accuracy.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122066580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
An Architecture for Agile Machine Learning in Real-Time Applications 实时应用中敏捷机器学习的体系结构
Johann Schleier-Smith
{"title":"An Architecture for Agile Machine Learning in Real-Time Applications","authors":"Johann Schleier-Smith","doi":"10.1145/2783258.2788628","DOIUrl":"https://doi.org/10.1145/2783258.2788628","url":null,"abstract":"Machine learning techniques have proved effective in recommender systems and other applications, yet teams working to deploy them lack many of the advantages that those in more established software disciplines today take for granted. The well-known Agile methodology advances projects in a chain of rapid development cycles, with subsequent steps often informed by production experiments. Support for such workflow in machine learning applications remains primitive. The platform developed at if(we) embodies a specific machine learning approach and a rigorous data architecture constraint, so allowing teams to work in rapid iterative cycles. We require models to consume data from a time-ordered event history, and we focus on facilitating creative feature engineering. We make it practical for data scientists to use the same model code in development and in production deployment, and make it practical for them to collaborate on complex models. We deliver real-time recommendations at scale, returning top results from among 10,000,000 candidates with sub-second response times and incorporating new updates in just a few seconds. Using the approach and architecture described here, our team can routinely go from ideas for new models to production-validated results within two weeks.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125021868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
Who Supported Obama in 2012?: Ecological Inference through Distribution Regression 谁在2012年支持奥巴马?:分布回归的生态推断
S. Flaxman, Yu-Xiang Wang, Alex Smola
{"title":"Who Supported Obama in 2012?: Ecological Inference through Distribution Regression","authors":"S. Flaxman, Yu-Xiang Wang, Alex Smola","doi":"10.1145/2783258.2783300","DOIUrl":"https://doi.org/10.1145/2783258.2783300","url":null,"abstract":"We present a new solution to the ``ecological inference'' problem, of learning individual-level associations from aggregate data. This problem has a long history and has attracted much attention, debate, claims that it is unsolvable, and purported solutions. Unlike other ecological inference techniques, our method makes use of unlabeled individual-level data by embedding the distribution over these predictors into a vector in Hilbert space. Our approach relies on recent learning theory results for distribution regression, using kernel embeddings of distributions. Our novel approach to distribution regression exploits the connection between Gaussian process regression and kernel ridge regression, giving us a coherent, Bayesian approach to learning and inference and a convenient way to include prior information in the form of a spatial covariance function. Our approach is highly scalable as it relies on FastFood, a randomized explicit feature representation for kernel embeddings. We apply our approach to the challenging political science problem of modeling the voting behavior of demographic groups based on aggregate voting data. We consider the 2012 US Presidential election, and ask: what was the probability that members of various demographic groups supported Barack Obama, and how did this vary spatially across the country? Our results match standard survey-based exit polling data for the small number of states for which it is available, and serve to fill in the large gaps in this data, at a much higher degree of granularity.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125902076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams Dirichlet-Hawkes过程在连续时间文档流聚类中的应用
Nan Du, Mehrdad Farajtabar, Amr Ahmed, Alex Smola, Le Song
{"title":"Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams","authors":"Nan Du, Mehrdad Farajtabar, Amr Ahmed, Alex Smola, Le Song","doi":"10.1145/2783258.2783411","DOIUrl":"https://doi.org/10.1145/2783258.2783411","url":null,"abstract":"Clusters in document streams, such as online news articles, can be induced by their textual contents, as well as by the temporal dynamics of their arriving patterns. Can we leverage both sources of information to obtain a better clustering of the documents, and distill information that is not possible to extract using contents only? In this paper, we propose a novel random process, referred to as the Dirichlet-Hawkes process, to take into account both information in a unified framework. A distinctive feature of the proposed model is that the preferential attachment of items to clusters according to cluster sizes, present in Dirichlet processes, is now driven according to the intensities of cluster-wise self-exciting temporal point processes, the Hawkes processes. This new model establishes a previously unexplored connection between Bayesian Nonparametrics and temporal Point Processes, which makes the number of clusters grow to accommodate the increasing complexity of online streaming contents, while at the same time adapts to the ever changing dynamics of the respective continuous arrival time. We conducted large-scale experiments on both synthetic and real world news articles, and show that Dirichlet-Hawkes processes can recover both meaningful topics and temporal dynamics, which leads to better predictive performance in terms of content perplexity and arrival time of future documents.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127095986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 172
Online Outlier Exploration Over Large Datasets 大型数据集的在线离群值探索
Lei Cao, Mingrui Wei, Di Yang, Elke A. Rundensteiner
{"title":"Online Outlier Exploration Over Large Datasets","authors":"Lei Cao, Mingrui Wei, Di Yang, Elke A. Rundensteiner","doi":"10.1145/2783258.2783387","DOIUrl":"https://doi.org/10.1145/2783258.2783387","url":null,"abstract":"Traditional outlier detection systems process each individual outlier detection request instantiated with a particular parameter setting one at a time. This is not only prohibitively time-consuming for large datasets, but also tedious for analysts as they explore the data to hone in on the appropriate parameter setting and desired results. In this work, we present the first online outlier exploration platform, called ONION, that enables analysts to effectively explore anomalies even in large datasets. First, ONION features an innovative interactive anomaly exploration model that offers an \"outlier-centric panorama\" into big datasets along with rich classes of exploration operations. Second, to achieve this model ONION employs an online processing framework composed of a one time offline preprocessing phase followed by an online exploration phase that enables users to interactively explore the data. The preprocessing phase compresses raw big data into a knowledge-rich ONION abstraction that encodes critical interrelationships of outlier candidates so to support subsequent interactive outlier exploration. For the interactive exploration phase, our ONION framework provides several processing strategies that efficiently support the outlier exploration operations. Our user study with real data confirms the effectiveness of ONION in recognizing \"true\" outliers. Furthermore as demonstrated by our extensive experiments with large datasets, ONION supports all exploration operations within milliseconds response time.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127380227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
Edge-Weighted Personalized PageRank: Breaking A Decade-Old Performance Barrier 边缘加权个性化网页排名:打破十年之久的性能障碍
Wenlei Xie, D. Bindel, A. Demers, J. Gehrke
{"title":"Edge-Weighted Personalized PageRank: Breaking A Decade-Old Performance Barrier","authors":"Wenlei Xie, D. Bindel, A. Demers, J. Gehrke","doi":"10.1145/2783258.2783278","DOIUrl":"https://doi.org/10.1145/2783258.2783278","url":null,"abstract":"Personalized PageRank is a standard tool for finding vertices in a graph that are most relevant to a query or user. To personalize PageRank, one adjusts node weights or edge weights that determine teleport probabilities and transition probabilities in a random surfer model. There are many fast methods to approximate PageRank when the node weights are personalized; however, personalization based on edge weights has been an open problem since the dawn of personalized PageRank over a decade ago. In this paper, we describe the first fast algorithm for computing PageRank on general graphs when the edge weights are personalized. Our method, which is based on model reduction, outperforms existing methods by nearly five orders of magnitude. This huge performance gain over previous work allows us --- for the very first time --- to solve learning-to-rank problems for edge weight personalization at interactive speeds, a goal that had not previously been achievable for this class of problems.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129552310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 38
Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering 实时Top-R主题检测与主题劫持过滤
K. Hayashi, Takanori Maehara, Masashi Toyoda, K. Kawarabayashi
{"title":"Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering","authors":"K. Hayashi, Takanori Maehara, Masashi Toyoda, K. Kawarabayashi","doi":"10.1145/2783258.2783402","DOIUrl":"https://doi.org/10.1145/2783258.2783402","url":null,"abstract":"Twitter is a \"what's-happening-right-now\" tool that enables interested parties to follow thoughts and commentary of individual users in nearly real-time. While it is a valuable source of information for real-time topic detection and tracking, Twitter data are not clean because of noisy messages and users, which significantly diminish the reliability of obtained results. In this paper, we integrate both the extraction of meaningful topics and the filtering of messages over the Twitter stream. We develop a streaming algorithm for a sequence of document-frequency tables; our algorithm enables real-time monitoring of the top-10 topics from approximately 25% of all Twitter messages, while automatically filtering noisy and meaningless topics. We apply our proposed streaming algorithm to the Japanese Twitter stream and successfully demonstrate that, compared with other online nonnegative matrix factorization methods, our framework both tracks real-world events with high accuracy in terms of the perplexity and simultaneously eliminates irrelevant topics.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128903831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
Data, Knowledge and Discovery: Machine Learning meets Natural Science 数据、知识和发现:机器学习与自然科学的结合
H. Durrant-Whyte
{"title":"Data, Knowledge and Discovery: Machine Learning meets Natural Science","authors":"H. Durrant-Whyte","doi":"10.1145/2783258.2785467","DOIUrl":"https://doi.org/10.1145/2783258.2785467","url":null,"abstract":"Increasingly it is data, vast amounts of data, that drives scientific discovery. At the heart of this so-called \"fourth paradigm of science\" is the rapid development of large scale statistical data fusion and machine learning methods. While these developments in \"big data\" methods are largely driven by commercial applications such as internet search or customer modelling, the opportunity for applying these to scientific discovery is huge. This talk will describe a number of applied machine learning projects addressing real-world inference problems in physical, life and social science areas. In particular, I will describe a major Science and Industry Endowment Fund (SIEF) project, in collaboration with the NICTA and Macquarie University, looking to apply machine learning techniques to discovery in the natural sciences. This talk will look at the key methods in machine learning that are being applied to the discovery process, especially in areas like geology, ecology and biological discovery.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117218751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
CoupledLP: Link Prediction in Coupled Networks 耦合网络中的链路预测
Yuxiao Dong, Jing Zhang, Jie Tang, N. Chawla, Bai Wang
{"title":"CoupledLP: Link Prediction in Coupled Networks","authors":"Yuxiao Dong, Jing Zhang, Jie Tang, N. Chawla, Bai Wang","doi":"10.1145/2783258.2783329","DOIUrl":"https://doi.org/10.1145/2783258.2783329","url":null,"abstract":"We study the problem of link prediction in coupled networks, where we have the structure information of one (source) network and the interactions between this network and another (target) network. The goal is to predict the missing links in the target network. The problem is extremely challenging as we do not have any information of the target network. Moreover, the source and target networks are usually heterogeneous and have different types of nodes and links. How to utilize the structure information in the source network for predicting links in the target network? How to leverage the heterogeneous interactions between the two networks for the prediction task? We propose a unified framework, CoupledLP, to solve the problem. Given two coupled networks, we first leverage atomic propagation rules to automatically construct implicit links in the target network for addressing the challenge of target network incompleteness, and then propose a coupled factor graph model to incorporate the meta-paths extracted from the coupled part of the two networks for transferring heterogeneous knowledge. We evaluate the proposed framework on two different genres of datasets: disease-gene (DG) and mobile social networks. In the DG networks, we aim to use the disease network to predict the associations between genes. In the mobile networks, we aim to use the mobile communication network of one mobile operator to infer the network structure of its competitors. On both datasets, the proposed CoupledLP framework outperforms several alternative methods. The proposed problem of coupled link prediction and the corresponding framework demonstrate both the scientific and business applications in biology and social networks.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122662131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 82
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信