Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献_第2页

Adaptive collective routing using gaussian process dynamic congestion models 基于高斯过程动态拥塞模型的自适应集体路由

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487598

Siyuan Liu, Yisong Yue, R. Krishnan

{"title":"Adaptive collective routing using gaussian process dynamic congestion models","authors":"Siyuan Liu, Yisong Yue, R. Krishnan","doi":"10.1145/2487575.2487598","DOIUrl":"https://doi.org/10.1145/2487575.2487598","url":null,"abstract":"We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network in the presence of uncertain and dynamic congestion conditions. To tackle this problem, we first propose a Gaussian Process Dynamic Congestion Model that can effectively characterize both the dynamics and the uncertainty of congestion conditions. Our model is efficient and thus facilitates real-time adaptive routing in the face of uncertainty. Using this congestion model, we develop an efficient algorithm for non-myopic adaptive routing to minimize the collective travel time of all vehicles in the system. A key property of our approach is the ability to efficiently reason about the long-term value of exploration, which enables collectively balancing the exploration/exploitation trade-off for entire fleets of vehicles. We validate our approach based on traffic data from two large Asian cities. We show that our congestion model is effective in modeling dynamic congestion conditions. We also show that our routing algorithm generates significantly faster routes compared to standard baselines, and achieves near-optimal performance compared to an omniscient routing algorithm. We also present the results from a preliminary field study, which showcases the efficacy of our approach.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82291052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 47

A tool for collecting provenance data in social media 在社交媒体中收集来源数据的工具

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487713

Pritam Gundecha, Suhas Ranganath, Zhuo Feng, Huan Liu

引用次数: 30

Learning geographical preferences for point-of-interest recommendation 学习地理偏好以推荐兴趣点

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487673

B. Liu, Yanjie Fu, Zijun Yao, Hui Xiong

{"title":"Learning geographical preferences for point-of-interest recommendation","authors":"B. Liu, Yanjie Fu, Zijun Yao, Hui Xiong","doi":"10.1145/2487575.2487673","DOIUrl":"https://doi.org/10.1145/2487575.2487673","url":null,"abstract":"The problem of point of interest (POI) recommendation is to provide personalized recommendations of places of interests, such as restaurants, for mobile users. Due to its complexity and its connection to location based social networks (LBSNs), the decision process of a user choose a POI is complex and can be influenced by various factors, such as user preferences, geographical influences, and user mobility behaviors. While there are some studies on POI recommendations, it lacks of integrated analysis of the joint effect of multiple factors. To this end, in this paper, we propose a novel geographical probabilistic factor analysis framework which strategically takes various factors into consideration. Specifically, this framework allows to capture the geographical influences on a user's check-in behavior. Also, the user mobility behaviors can be effectively exploited in the recommendation model. Moreover, the recommendation model can effectively make use of user check-in count data as implicity user feedback for modeling user preferences. Finally, experimental results on real-world LBSNs data show that the proposed recommendation method outperforms state-of-the-art latent factor models with a significant margin.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80550347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 430

Information cascade at group scale 群体规模的信息级联

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487683

Milad Eftekhar, Y. Ganjali, Nick Koudas

{"title":"Information cascade at group scale","authors":"Milad Eftekhar, Y. Ganjali, Nick Koudas","doi":"10.1145/2487575.2487683","DOIUrl":"https://doi.org/10.1145/2487575.2487683","url":null,"abstract":"Identifying the k most influential individuals in a social network is a well-studied problem. The objective is to detect k individuals in a (social) network who will influence the maximum number of people, if they are independently convinced of adopting a new strategy (product, idea, etc). There are cases in real life, however, where we aim to instigate groups instead of individuals to trigger network diffusion. Such cases abound, e.g., billboards, TV commercials and newspaper ads are utilized extensively to boost the popularity and raise awareness. In this paper, we generalize the \"influential nodes\" problem. Namely we are interested to locate the most \"influential groups\" in a network. As the first paper to address this problem: we (1) propose a fine-grained model of information diffusion for the group-based problem, (2) show that the process is submodular and present an algorithm to determine the influential groups under this model (with a precise approximation bound), (3) propose a coarse-grained model that inspects the network at group level (not individuals) significantly speeding up calculations for large networks, (4) show that the diffusion function we design here is submodular in general case, and propose an approximation algorithm for this coarse-grained model, and finally by conducting experiments on real datasets, (5) demonstrate that seeding members of selected groups to be the first adopters can broaden diffusion (when compared to the influential individuals case). Moreover, we can identify these influential groups much faster (up to 12 million times speedup), delivering a practical solution to this problem.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"93 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83826684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 38

LCARS: a location-content-aware recommender system LCARS:一个位置内容感知推荐系统

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487608

Hongzhi Yin, Yizhou Sun, B. Cui, Zhiting Hu, Ling Chen

{"title":"LCARS: a location-content-aware recommender system","authors":"Hongzhi Yin, Yizhou Sun, B. Cui, Zhiting Hu, Ling Chen","doi":"10.1145/2487575.2487608","DOIUrl":"https://doi.org/10.1145/2487575.2487608","url":null,"abstract":"Newly emerging location-based and event-based social network services provide us with a new platform to understand users' preferences based on their activity history. A user can only visit a limited number of venues/events and most of them are within a limited distance range, so the user-item matrix is very sparse, which creates a big challenge for traditional collaborative filtering-based recommender systems. The problem becomes more challenging when people travel to a new city where they have no activity history. In this paper, we propose LCARS, a location-content-aware recommender system that offers a particular user a set of venues (e.g., restaurants) or events (e.g., concerts and exhibitions) by giving consideration to both personal interest and local preference. This recommender system can facilitate people's travel not only near the area in which they live, but also in a city that is new to them. Specifically, LCARS consists of two components: offline modeling and online recommendation. The offline modeling part, called LCA-LDA, is designed to learn the interest of each individual user and the local preference of each individual city by capturing item co-occurrence patterns and exploiting item contents. The online recommendation part automatically combines the learnt interest of the querying user and the local preference of the querying city to produce the top-k recommendations. To speed up this online process, a scalable query processing technique is developed by extending the classic Threshold Algorithm (TA). We evaluate the performance of our recommender system on two large-scale real data sets, DoubanEvent and Foursquare. The results show the superiority of LCARS in recommending spatial items for users, especially when traveling to new cities, in terms of both effectiveness and efficiency.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79055830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 354

Algorithmic techniques for modeling and mining large graphs (AMAzING) 建模和挖掘大型图的算法技术(AMAzING)

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2506176

A. Frieze, A. Gionis, Charalampos E. Tsourakakis

引用次数: 4

The bang for the buck: fair competitive viral marketing from the host perspective 物有所值:从主机的角度来看，公平竞争的病毒式营销

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487649

Wei Lu, F. Bonchi, Amit Goyal, L. Lakshmanan

{"title":"The bang for the buck: fair competitive viral marketing from the host perspective","authors":"Wei Lu, F. Bonchi, Amit Goyal, L. Lakshmanan","doi":"10.1145/2487575.2487649","DOIUrl":"https://doi.org/10.1145/2487575.2487649","url":null,"abstract":"The key algorithmic problem in viral marketing is to identify a set of influential users (called seeds) in a social network, who, when convinced to adopt a product, shall influence other users in the network, leading to a large number of adoptions. When two or more players compete with similar products on the same network we talk about competitive viral marketing, which so far has been studied exclusively from the perspective of one of the competing players. In this paper we propose and study the novel problem of competitive viral marketing from the perspective of the host, i.e., the owner of the social network platform. The host sells viral marketing campaigns as a service to its customers, keeping control of the selection of seeds. Each company specifies its budget and the host allocates the seeds accordingly. From the host's perspective, it is important not only to choose the seeds to maximize the collective expected spread, but also to assign seeds to companies so that it guarantees the \"bang for the buck\" for all companies is nearly identical, which we formalize as the fair seed allocation problem. We propose a new propagation model capturing the competitive nature of viral marketing. Our model is intuitive and retains the desired properties of monotonicity and submodularity. We show that the fair seed allocation problem is NP-hard, and develop an efficient algorithm called Needy Greedy. We run experiments on three real-world social networks, showing that our algorithm is effective and scalable.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81180903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 80

Understanding Twitter data with TweetXplorer 使用TweetXplorer理解Twitter数据

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487703

Fred Morstatter, Shamanth Kumar, Huan Liu, Ross Maciejewski

引用次数: 76

Multi-source learning with block-wise missing data for Alzheimer's disease prediction 基于块缺失数据的多源学习用于阿尔茨海默病预测

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487594

Shuo Xiang, Lei Yuan, Wei Fan, Yalin Wang, P. Thompson, Jieping Ye

{"title":"Multi-source learning with block-wise missing data for Alzheimer's disease prediction","authors":"Shuo Xiang, Lei Yuan, Wei Fan, Yalin Wang, P. Thompson, Jieping Ye","doi":"10.1145/2487575.2487594","DOIUrl":"https://doi.org/10.1145/2487575.2487594","url":null,"abstract":"With the advances and increasing sophistication in data collection techniques, we are facing with large amounts of data collected from multiple heterogeneous sources in many applications. For example, in the study of Alzheimer's Disease (AD), different types of measurements such as neuroimages, gene/protein expression data, genetic data etc. are often collected and analyzed together for improved predictive power. It is believed that a joint learning of multiple data sources is beneficial as different data sources may contain complementary information, and feature-pruning and data source selection are critical for learning interpretable models from high-dimensional data. Very often the collected data comes with block-wise missing entries; for example, a patient without the MRI scan will have no information in the MRI data block, making his/her overall record incomplete. There has been a growing interest in the data mining community on expanding traditional techniques for single-source complete data analysis to the study of multi-source incomplete data. The key challenge is how to effectively integrate information from multiple heterogeneous sources in the presence of block-wise missing data. In this paper we first investigate the situation of complete data and present a unified ``bi-level\" learning model for multi-source data. Then we give a natural extension of this model to the more challenging case with incomplete data. Our major contributions are threefold: (1) the proposed models handle both feature-level and source-level analysis in a unified formulation and include several existing feature learning approaches as special cases; (2) the model for incomplete data avoids direct imputation of the missing elements and thus provides superior performances. Moreover, it can be easily generalized to other applications with block-wise missing data sources; (3) efficient optimization algorithms are presented for both the complete and incomplete models. We have performed comprehensive evaluations of the proposed models on the application of AD diagnosis. Our proposed models compare favorably against existing approaches.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90576930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 75

Indexed block coordinate descent for large-scale linear classification with limited memory 有限内存下大规模线性分类的索引块坐标下降

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487626

I. E. Yen, Chun-Fu Chang, Ting-Wei Lin, Shan-Wei Lin, Shou-De Lin

{"title":"Indexed block coordinate descent for large-scale linear classification with limited memory","authors":"I. E. Yen, Chun-Fu Chang, Ting-Wei Lin, Shan-Wei Lin, Shou-De Lin","doi":"10.1145/2487575.2487626","DOIUrl":"https://doi.org/10.1145/2487575.2487626","url":null,"abstract":"Linear Classification has achieved complexity linear to the data size. However, in many applications, data contain large amount of samples that does not help improve the quality of model, but still cost much I/O and memory to process. In this paper, we show how a Block Coordinate Descent method based on Nearest-Neighbor Index can significantly reduce such cost when learning a dual-sparse model. In particular, we employ truncated loss function to induce a series of convex programs with superior dual sparsity, and solve each dual using Indexed Block Coordinate Descent, which makes use of Approximate Nearest Neighbor (ANN) search to select active dual variables without I/O cost on irrelevant samples. We prove that, despite the bias and weak guarantee from ANN query, the proposed algorithm has global convergence to the solution defined on entire dataset, with sublinear complexity each iteration. Experiments in both sufficient and limited memory conditions show that the proposed approach learns many times faster than other state-of-the-art solvers without sacrificing accuracy.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86863601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5