Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining: Latest Articles, Page 2

Node Similarity with q-Grams for Real-World Labeled Networks

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3220085

A. Conte, Gaspare Ferraro, R. Grossi, Andrea Marino, K. Sadakane, T. Uno

Citations: 11

Metric Learning from Probabilistic Labels

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219976

Mengdi Huai, Chenglin Miao, Yaliang Li, Qiuling Suo, Lu Su, Aidong Zhang

Abstract: Metric learning aims to learn a good distance metric that can capture the relationships among instances, and its importance has long been recognized in many fields. In the traditional settings of metric learning, an implicit assumption is that the associated labels of the instances are deterministic. However, in many real-world applications, the associated labels come naturally with probabilities instead of deterministic values. Thus, the existing metric learning methods cannot work well in these applications. To tackle this challenge, in this paper, we study how to effectively learn the distance metric from datasets that contain probabilistic information, and then propose two novel metric learning mechanisms for two types of probabilistic labels, i.e., the instance-wise probabilistic label and the group-wise probabilistic label. Compared with the existing metric learning methods, our proposed mechanisms are capable of learning distance metrics directly from the probabilistic labels with high accuracy. We also theoretically analyze the two proposed mechanisms and provide theoretical bounds on the sample complexity for both of them. Additionally, extensive experiments based on real-world datasets are conducted to verify the desirable properties of the proposed mechanisms.

Citations: 15

Route Recommendations for Idle Taxi Drivers: Find Me the Shortest Route to a Customer!

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3220055

Nandani Garg, Sayan Ranu

Citations: 51

Predicting Estimated Time of Arrival for Commercial Flights

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219874

S. Ayhan, P. Costas, H. Samet

Abstract: Unprecedented growth is expected globally in commercial air traffic over the next ten years. To accommodate this increase in volume, a new concept of operations has been implemented in the context of the Next Generation Air Transportation System (NextGen) in the USA and the Single European Sky ATM Research (SESAR) in Europe. However, both of the systems approach airspace capacity and efficiency deterministically, failing to account for external operational circumstances which can directly affect the aircraft's actual flight profile. A major factor in increased airspace efficiency and capacity is accurate prediction of Estimated Time of Arrival (ETA) for commercial flights, which can be a challenging task due to a non-deterministic nature of environmental factors, and air traffic. Inaccurate prediction of ETA can cause potential safety risks and loss of resources for Air Navigation Service Providers (ANSP), airlines and passengers. In this paper, we present a novel ETA Prediction System for commercial flights. The system learns from historical trajectories and uses their pertinent 3D grid points to collect key features such as weather parameters, air traffic, and airport data along the potential flight path. The features are fed into various regression models and a Recurrent Neural Network (RNN) and the best performing models with the most accurate ETA predictions are compared with the ETAs currently operational by the European ANSP, EUROCONTROL. Evaluations on an extensive set of real trajectory, weather, and airport data in Europe verify that our prediction system generates more accurate ETAs with a far smaller standard deviation than those of EUROCONTROL. This translates to smaller prediction windows of flight arrival times, thereby enabling airlines to make more cost-effective ground resource allocation and ANSPs to make more efficient flight schedules.

Citations: 32

Q&R: A Two-Stage Approach toward Interactive Recommendation

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219894

Konstantina Christakopoulou, Alex Beutel, Rui Li, Sagar Jain, Ed H. Chi

Abstract: Recommendation systems, prevalent in many applications, aim to surface to users the right content at the right time. Recently, researchers have aspired to develop conversational systems that offer seamless interactions with users, more effectively eliciting user preferences and offering better recommendations. Taking a step towards this goal, this paper explores the two stages of a single round of conversation with a user: which question to ask the user, and how to use their feedback to respond with a more accurate recommendation. Following these two stages, first, we detail an RNN-based model for generating topics a user might be interested in, and then extend a state-of-the-art RNN-based video recommender to incorporate the user's selected topic. We describe our proposed system Q&R, i.e., Question & Recommendation, and the surrogate tasks we utilize to bootstrap data for training our models. We evaluate different components of Q&R on live traffic in various applications within YouTube: User Onboarding, Homepage Recommendation, and Notifications. Our results demonstrate that our approach improves upon state-of-the-art recommendation models, including RNNs, and makes these applications more useful, such as a >1% increase in video notifications opened. Further, our design choices can be useful to practitioners wanting to transition to more conversational recommendation systems.

Citations: 107

Learning to Estimate the Travel Time

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219900

Zheng Wang, Kun Fu, Jieping Ye

Abstract: Vehicle travel time estimation or estimated time of arrival (ETA) is one of the most important location-based services (LBS). It is becoming increasingly important and has been widely used as a basic service in navigation systems and intelligent transportation systems. This paper presents a novel machine learning solution to predict the vehicle travel time based on floating-car data. First, we formulate ETA as a pure spatial-temporal regression problem based on a large set of effective features. Second, we adapt different existing machine learning models to solve the regression problem. Furthermore, we propose a Wide-Deep-Recurrent (WDR) learning model to accurately predict the travel time along a given route at a given departure time. We then jointly train wide linear models, deep neural networks and recurrent neural networks together to take full advantages of all three models. We evaluate our solution offline with millions of historical vehicle travel data. We also deploy the proposed solution on Didi Chuxing's platform, which services billions of ETA requests and benefits millions of customers per day. Our extensive evaluations show that our proposed deep learning algorithm significantly outperforms the state-of-the-art learning algorithms, as well as the solutions provided by leading industry LBS providers.

Citations: 184

Rotation-blended CNNs on a New Open Dataset for Tropical Cyclone Image-to-intensity Regression

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219926

Boyo Chen, Buo‐Fu Chen, Hsuan-Tien Lin

Abstract: Tropical cyclone (TC) is a type of severe weather systems that occur in tropical regions. Accurate estimation of TC intensity is crucial for disaster management. Moreover, the intensity estimation task is the key to understand and forecast the behavior of TCs better. Recently, the task has begun to attract attention from not only meteorologists but also data scientists. Nevertheless, it is hard to stimulate joint research between both types of scholars without a benchmark dataset to work on together. In this work, we release such a benchmark dataset, which is a new open dataset collected from satellite remote sensing, for the TC-image-to-intensity estimation task. We also propose a novel model to solve this task based on the convolutional neural network (CNN). We discover that the usual CNN, which is mature for object recognition, requires several modifications when being used for the intensity estimation task. Furthermore, we combine the domain knowledge of meteorologists, such as the rotation-invariance of TCs, into our model design to reach better performance. Experimental results on the released benchmark dataset verify that the proposed model is among the most accurate models that can be used for TC intensity estimation, while being relatively more stable across all situations. The results demonstrate the potential of applying data science for meteorology study.

Citations: 41

Where Will Dockless Shared Bikes be Stacked?: --- Parking Hotspots Detection in a New City

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219920

Zhaoyang Liu, Yanyan Shen, Yanmin Zhu

Citations: 26

TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3220064

Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian M. Sadler, M. Vanni, Jiawei Han

Abstract: Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.

Citations: 80

Dynamic Bike Reposition: A Spatio-Temporal Reinforcement Learning Approach

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3220110

Yexin Li, Yu Zheng, Qiang Yang

Abstract: Bike-sharing systems are widely deployed in many major cities, while the jammed and empty stations in them lead to severe customer loss. Currently, operators try to constantly reposition bikes among stations when the system is operating. However, how to efficiently reposition to minimize the customer loss in a long period remains unsolved. We propose a spatio-temporal reinforcement learning based bike reposition model to deal with this problem. Firstly, an inter-independent inner-balance clustering algorithm is proposed to cluster stations into groups. Clusters obtained have two properties, i.e. each cluster is inner-balanced and independent from the others. As there are many trikes repositioning in a very large system simultaneously, clustering is necessary to reduce the problem complexity. Secondly, we allocate multiple trikes to each cluster to conduct inner-cluster bike reposition. A spatio-temporal reinforcement learning model is designed for each cluster to learn a reposition policy in it, targeting at minimizing its customer loss in a long period. To learn each model, we design a deep neural network to estimate its optimal long-term value function, from which the optimal policy can be easily inferred. Besides formulating the model in a multi-agent way, we further reduce its training complexity by two spatio-temporal pruning rules. Thirdly, we design a system simulator based on two predictors to train and evaluate the reposition model. Experiments on real-world datasets from Citi Bike are conducted to confirm the effectiveness of our model.

Citations: 93