{"title":"Accelerated Alternating Direction Method of Multipliers","authors":"Mojtaba Kadkhodaie, Konstantina Christakopoulou, Maziar Sanjabi, A. Banerjee","doi":"10.1145/2783258.2783400","DOIUrl":"https://doi.org/10.1145/2783258.2783400","url":null,"abstract":"Recent years have seen a revival of interest in the Alternating Direction Method of Multipliers (ADMM), due to its simplicity, versatility, and scalability. As a first order method for general convex problems, the rate of convergence of ADMM is O(1=k) [4, 25]. Given the scale of modern data mining problems, an algorithm with similar properties as ADMM but faster convergence rate can make a big difference in real world applications. In this paper, we introduce the Accelerated Alternating Direction Method of Multipliers (A2DM2) which solves problems with the same structure as ADMM. When the objective function is strongly convex, we show that A2DM2 has a O(1=k2) convergence rate. Unlike related existing literature on trying to accelerate ADMM, our analysis does not need any additional restricting assumptions. Through experiments, we show that A2DM2 converges faster than ADMM on a variety of problems. Further, we illustrate the versatility of the general A2DM2 on the problem of learning to rank, where it is shown to be competitive with the state-of-the-art specialized algorithms for the problem on both scalability and accuracy.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129875053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Collective Bayesian Poisson Factorization Model for Cold-start Local Event Recommendation","authors":"Wei Zhang, Jianyong Wang","doi":"10.1145/2783258.2783336","DOIUrl":"https://doi.org/10.1145/2783258.2783336","url":null,"abstract":"Event-based social networks (EBSNs), in which organizers publish events to attract other users in local city to attend offline, emerge in recent years and grow rapidly. Due to the large volume of events in EBSNs, event recommendation is essential. A few recent works focus on this task, while almost all the methods need that each event to be recommended should have been registered by some users to attend. Thus they ignore two essential characteristics of events in EBSNs: (1) a large number of new events will be published every day which means many events have few participants in the beginning, (2) events have life cycles which means outdated events should not be recommended. Overall, event recommendation in EBSNs inevitably faces the cold-start problem. In this work, we address the new problem of cold-start local event recommendation in EBSNs. We propose a collective Bayesian Poisson factorization (CBPF) model for handling this problem. CBPF takes recently proposed Bayesian Poisson factorization as its basic unit to model user response to events, social relation, and content text separately. Then it further jointly connects these units by the idea of standard collective matrix factorization model. Moreover, in our model event textual content, organizer, and location information are utilized to learn representation of cold-start events for predicting user response to them. Besides, an efficient coordinate ascent algorithm is adopted to learn the model. We conducted comprehensive experiments on real datasets crawled from EBSNs and the results demonstrate our proposed model is effective and outperforms several alternative methods.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134504718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Message Update for Fast Affinity Propagation","authors":"Y. Fujiwara, M. Nakatsuji, Hiroaki Shiokawa, Yasutoshi Ida, Machiko Toyoda","doi":"10.1145/2783258.2783280","DOIUrl":"https://doi.org/10.1145/2783258.2783280","url":null,"abstract":"Affinity Propagation is a clustering algorithm used in many applications. It iteratively updates messages between data points until convergence. The message updating process enables Affinity Propagation to have higher clustering quality compared with other approaches. However, its computation cost is high; it is quadratic in the number of data points. This is because it updates the messages of all data point pairs. This paper proposes an efficient algorithm that guarantees the same clustering results as the original algorithm. Our approach, F-AP, is based on two ideas: (1) it computes upper and lower estimates to limit the messages to be updated in each iteration, and (2) it dynamically detects converged messages to efficiently skip unneeded updates. Experiments show that F-AP is much faster than previous approaches with no loss in clustering performance.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133495701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big Data Analytics: Optimization and Randomization","authors":"Tianbao Yang, Qihang Lin, Rong Jin","doi":"10.1145/2783258.2789989","DOIUrl":"https://doi.org/10.1145/2783258.2789989","url":null,"abstract":"As the scale and dimensionality of data continue to grow in many applications of data analytics (e.g., bioinformatics, finance, computer vision, medical informatics), it becomes critical to develop efficient and effective algorithms to solve numerous machine learning and data mining problems. This tutorial will focus on simple yet practically effective techniques and algorithms for big data analytics. In the first part, we plan to present the state-of-the-art large-scale optimization algorithms, including various stochastic gradient descent methods, stochastic coordinate descent methods and distributed optimization algorithms, for solving various machine learning problems. In the second part, we will focus on randomized approximation algorithms for learning from large-scale data. We will discuss i) randomized algorithms for low-rank matrix approximation; ii) approximation techniques for solving kernel learning problems; iii) randomized reduction methods for addressing the high-dimensional challenge. Along with the description of algorithms, we will also present some empirical results to facilitate understanding of different algorithms and comparison between them.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"251 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132187018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voltage Correlations in Smart Meter Data","authors":"R. Mitra, Ramachandra Kota, S. Bandyopadhyay, V. Arya, B. Sullivan, Richard Mueller, H. Storey, Gerard Labut","doi":"10.1145/2783258.2788594","DOIUrl":"https://doi.org/10.1145/2783258.2788594","url":null,"abstract":"The connectivity model of a power distribution network can easily become outdated due to system changes occurring in the field. Maintaining and sustaining an accurate connectivity model is a key challenge for distribution utilities worldwide. This work shows that voltage time series measurements collected from customer smart meters exhibit correlations that are consistent with the hierarchical structure of the distribution network. These correlations may be leveraged to cluster customers based on common ancestry and help verify and correct an existing connectivity model. Additionally, customers may be clustered in combination with voltage data from circuit metering points, spatial data from the geographical information system, and any existing but partially accurate connectivity model to infer customer to transformer and phase connectivity relationships with high accuracy. We report analysis and validation results based on data collected from multiple feeders of a large electric distribution network in North America. To the best of our knowledge, this is the first large scale measurement study of customer voltage data and its use in inferring network connectivity information.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130252552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible and Robust Multi-Network Clustering","authors":"Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang","doi":"10.1145/2783258.2783262","DOIUrl":"https://doi.org/10.1145/2783258.2783262","url":null,"abstract":"Integrating multiple graphs (or networks) has been shown to be a promising approach to improve the graph clustering accuracy. Various multi-view and multi-domain graph clustering methods have recently been developed to integrate multiple networks. In these methods, a network is treated as a view or domain.The key assumption is that there is a common clustering structure shared across all views (domains), and different views (domains) provide compatible and complementary information on this underlying clustering structure. However, in many emerging real-life applications, different networks have different data distributions, where the assumption that all networks share a single common clustering structure does not hold. In this paper, we propose a flexible and robust framework that allows multiple underlying clustering structures across different networks. Our method models the domain similarity as a network, which can be utilized to regularize the clustering structures in different networks. We refer to such a data model as a network of networks (NoN). We develop NoNClus, a novel method based on non-negative matrix factorization (NMF), to cluster an NoN. We provide rigorous theoretical analysis of NoNClus in terms of its correctness, convergence and complexity. Extensive experimental results on synthetic and real-life datasets show the effectiveness of our method.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131850996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Artificial Intelligence and Big Data Created Rocket Fuel: A Case Study","authors":"George H. John","doi":"10.1145/2783258.2790458","DOIUrl":"https://doi.org/10.1145/2783258.2790458","url":null,"abstract":"In 2008, Rocket Fuel's founders saw a gap in the digital advertising market. None of the existing players were building autonomous systems based on big data and artificial intelligence, but instead they were offering fairly simple technology and relying on human campaign managers to drive success. Five years later in 2013, Rocket Fuel had the best technology IPO of the year on NASDAQ, reported $240 million in revenue, and was ranked by accounting firm Deloitte as the #1 fastest-growing technology company in North America. Along the way we learned that it's okay to be bold in our expectations of what is possible with fully autonomous systems, we learned that mainstream customers will buy advanced technology if it's delivered in a familiar way, and we also learned that it's incredibly difficult to debug the complex \"robot psychology\" when a number of complex autonomous systems interact. We also had excellent luck and timing: as we were building the company, real-time ad impression-level auctions with machine-to-machine buying and selling became commonplace, and marketers became increasingly focused on delivering better results for their company and delivering better personalized and relevant digital experiences for their customers. The case study presentation will present a fast-paced overview of the business and technology context for Rocket Fuel at inception and at present, key learnings and decisions, and the road ahead.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131934442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web Personalization and Recommender Systems","authors":"S. Berkovsky, J. Freyne","doi":"10.1145/2783258.2789995","DOIUrl":"https://doi.org/10.1145/2783258.2789995","url":null,"abstract":"The quantity of accessible information has been growing rapidly and far exceeded human processing capabilities. The sheer abundance of information often prevents users from discovering the desired information, or aggravates making informed and correct choices. This highlights the pressing need for intelligent personalized applications that simplify information access and discovery by taking into account users' preferences and needs. One type of personalized application that has recently become tremendously popular in research and industry is recommender systems. These provide to users personalized recommendations about information and products they may be interested to examine or purchase. Extensive research into recommender systems has yielded a variety of techniques, which have been published at a variety of conferences and adopted by numerous Web-sites. This tutorial will provide the participants with broad overview and thorough understanding of algorithms and practically deployed Web and mobile applications of personalized technologies.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"48 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132204494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Data Mining for Authenticity Assessment and Protection of High-Quality Italian Wines from Piedmont","authors":"M. Arlorio, J. Coïsson, G. Leonardi, M. Locatelli, L. Portinale","doi":"10.1145/2783258.2788596","DOIUrl":"https://doi.org/10.1145/2783258.2788596","url":null,"abstract":"This paper discusses the data mining approach followed in a project called TRAQUASwine, aimed at the definition of methods for data analytical assessment of the authenticity and protection, against fake versions, of some of the highest value Nebbiolo-based wines from Piedmont region in Italy. This is a big issue in the wine market, where commercial frauds related to such a kind of products are estimated to be worth millions of Euros. The objective is twofold: to show that the problem can be addressed without expensive and hyper-specialized wine analyses, and to demonstrate the actual usefulness of classification algorithms for data mining on the resulting chemical profiles. Following Wagstaff's proposal for practical exploitation of machine learning (and data mining) approaches, we describe how data have been collected and prepared for the production of different datasets, how suitable classification models have been identified and how the interpretation of the results suggests the emergence of an active role of classification techniques, based on standard chemical profiling, for the assesment of the authenticity of the wines target of the study.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134326692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Discovery of Evolving Truth","authors":"Yaliang Li, Qi Li, Jing Gao, Lu Su, Bo Zhao, Wei Fan, Jiawei Han","doi":"10.1145/2783258.2783277","DOIUrl":"https://doi.org/10.1145/2783258.2783277","url":null,"abstract":"In the era of big data, information regarding the same objects can be collected from increasingly more sources. Unfortunately, there usually exist conflicts among the information coming from different sources. To tackle this challenge, truth discovery, i.e., to integrate multi-source noisy information by estimating the reliability of each source, has emerged as a hot topic. In many real world applications, however, the information may come sequentially, and as a consequence, the truth of objects as well as the reliability of sources may be dynamically evolving. Existing truth discovery methods, unfortunately, cannot handle such scenarios. To address this problem, we investigate the temporal relations among both object truths and source reliability, and propose an incremental truth discovery framework that can dynamically update object truths and source weights upon the arrival of new data. Theoretical analysis is provided to show that the proposed method is guaranteed to converge at a fast rate. The experiments on three real world applications and a set of synthetic data demonstrate the advantages of the proposed method over state-of-the-art truth discovery methods.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130579659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}