{"title":"A Nonparametric Model for Event Discovery in the Geospatial-Temporal Space","authors":"Jinjin Guo, Zhiguo Gong","doi":"10.1145/2983323.2983790","DOIUrl":"https://doi.org/10.1145/2983323.2983790","url":null,"abstract":"The availability of geographical and temporal tagged documents enables many location and time based mining tasks. Event discovery is one of such tasks, which is to identify interesting happenings in the geographical and temporal space. In recent years, several techniques have been proposed. However, no existing work has provided a nonparametric algorithm for detecting events in the joint space crossing geographical and temporal dimensions. Furthermore, though some prior works proposed to capture the periodicities of topics in their solutions, some restrictions on the temporal patterns are often placed and they usually ignore the spatial patterns of the topics. To break through such limitations, in this paper we propose a novel nonparametric model to identify events in the geographical and temporal space, where any recurrent patterns of events can be automatically captured. In our approach, parameters are automatically determined by exploiting a Dirichlet Process. To reduce the influence from noisy terms in the detection, we distinguish its event role from its background role using a Bernoulli model in the solution. Experimental results on three real world datasets show the proposed algorithm outperforms previous state-of-the-art approaches.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115687573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully Dynamic Shortest-Path Distance Query Acceleration on Massive Networks","authors":"Takanori Hayashi, Takuya Akiba, K. Kawarabayashi","doi":"10.1145/2983323.2983731","DOIUrl":"https://doi.org/10.1145/2983323.2983731","url":null,"abstract":"The distance between vertices is one of the most fundamental measures for representing relations between them, and it is the basis of other classic measures of vertices, such as similarity, centrality, and influence. The 2-hop labeling methods are known as the fastest exact point-to-point distance algorithms on million-scale networks. However, they cannot handle billion-scale networks because of the large space requirement and long preprocessing time. In this paper, we present the first algorithm that can process exact distance queries on fully dynamic billion-scale networks besides trivial non-indexing algorithms, which combines an online bidirectional breadth-first search (BFS) and an offline indexing method for handling billion-scale networks in memory. First, we accelerate bidirectional BFSs by using heuristics that exploit the small-world property of complex networks. Then, we construct bit-parallel shortest-path trees to maintain sets of shortest paths passing through high-degree vertices of networks in compact form, the information of which enables us to avoid visiting vertices with high degrees during bidirectional BFSs. Thus, the searches achieve considerable speedup. In addition, our index size reduction technique enables us to handle billion-scale networks in memory. Furthermore, we introduce dynamic update procedures of our data structure to handle fully dynamic networks. We evaluated the performance of the proposed method on real-world networks. In particular, on large-scale social networks with over 1B edges, the proposed method enables us to answer distance queries in around 1 ms, on average.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114267569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Hidden Trajectory Reconstruction from Sparse Data","authors":"Ning Yang, Philip S. Yu","doi":"10.1145/2983323.2983796","DOIUrl":"https://doi.org/10.1145/2983323.2983796","url":null,"abstract":"In this paper, we investigate the problem of reconstructing hidden trajectories from a collective of separate spatial-temporal points without ID information, given the number of hidden trajectories. The challenge is three-fold: lack of meaningful features, data sparsity, and missing trajectory links. We propose a novel approach called Hidden Trajectory Reconstruction (HTR). From an information-theoretic perspective, we devise five novel temporal features and combine them into an Latent Spatial-Temporal Feature Vector (LSTFV) to characterize the dynamics of a single spatial-temporal point. The proposed features have the potential of distinguishing spatial-temporal points between trajectories. To overcome the data sparsity, we assemble the LSTFVs to a sparse Temporal Feature Tensor (TF-Tensor) and propose an algorithm called Parallel Iterative Collaborative Approximation of Sparse Tensor (PICAST). PICAST approximates the TF-Tensor by decomposing it into a tensor product of a low-rank core identity tensor and three dense factor matrices with a divide-and-conquer strategy. To achieve a dense approximate tensor with good accuracy and efficiency, PICAST minimizes a sparsity-measure and fuses an additional matrix of static geographical region features. To recover the missing trajectory links, we propose a mapping, Cross-Temporal Connectivity Preserving Transformation (CTCPT), to map the LSTFVs of the separate spatial-temporal points to an intrinsic space called Cross-Temporal Connectivity Preserving Space (CTCPS). CTCPT uses Cross-Temporal Connectivity (CTC) to evaluate whether two spatial-temporal points belong to the same trajectory and if they do, how strong the connectivity between them is. Due to the CTCPT, the hidden trajectories can be reconstructed from clusters generated in CTCPS by a clustering algorithm. At last, the extensive experiments conducted on synthetic datasets and real datasets verify the effectiveness and efficiency of our algorithms.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116971423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Effectiveness of Query Weighting for Adapting Rank Learners to New Unlabelled Collections","authors":"Pengfei Li, M. Sanderson, Mark James Carman, Falk Scholer","doi":"10.1145/2983323.2983852","DOIUrl":"https://doi.org/10.1145/2983323.2983852","url":null,"abstract":"Query-level instance weighting is a technique for unsupervised transfer ranking, which aims to train a ranker on a source collection so that it also performs effectively on a target collection, even if no judgement information exists for the latter. Past work has shown that this approach can be used to significantly improve effectiveness; in this work, the approach is re-examined on a wide set of publicly available L2R test collections with more advanced learning to rank algorithms. Different query-level weighting strategies are examined against two transfer ranking frameworks: AdaRank and a new weighted LambdaMART algorithm. Our experimental results show that the effectiveness of different weighting strategies, including those shown in past work, vary under different transferring environments. In particular, (i) Kullback-Leibler based density-ratio estimation tends to outperform a classification-based approach and (ii) aggregating document-level weights into query-level weights is likely superior to direct estimation using a query-level representation. The Nemenyi statistical test, applied across multiple datasets, indicates that most weighting transfer learning methods do not significantly outperform baselines, although there is potential for the further development of such techniques.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122114682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Nugget Annotations with Active Learning","authors":"G. Baruah, Haotian Zhang, Rakesh Guttikonda, Jimmy J. Lin, Mark D. Smucker, Olga Vechtomova","doi":"10.1145/2983323.2983694","DOIUrl":"https://doi.org/10.1145/2983323.2983694","url":null,"abstract":"Nugget-based evaluations, such as those deployed in the TREC Temporal Summarization and Question Answering tracks, require human assessors to determine whether a nugget is present in a given piece of text. This process, known as nugget annotation, is labor-intensive. In this paper, we present two active learning techniques that prioritize the sequence in which candidate nugget/sentence pairs are presented to an assessor, based on the likelihood that the sentence contains a nugget. Our approach builds on the recognition that nugget annotation is similar to high-recall retrieval, and we adapt proven existing solutions. Simulation experiments with four existing TREC test collections show that our techniques yield far more matches for a given level of effort than baselines that are typically deployed in previous nugget-based evaluations.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125151078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KB-Enabled Query Recommendation for Long-Tail Queries","authors":"Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng","doi":"10.1145/2983323.2983650","DOIUrl":"https://doi.org/10.1145/2983323.2983650","url":null,"abstract":"In recent years, query recommendation algorithms have been designed to provide related queries for search engine users. Most of these solutions, which perform extensive analysis of users' search history (or query logs), are largely insufficient for long-tail queries that rarely appear in query logs. To handle such queries, we study a new solution, which makes use of a knowledge base (or KB), such as YAGO and Freebase. A KB is a rich information source that describes how real-world entities are connected. We extract entities from a query, and use these entities to explore new ones in the KB. Those discovered entities are then used to suggest new queries to the user. As shown in our experiments, our approach provides better recommendation results for long-tail queries than existing solutions.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125170778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourcing-based Urban Anomaly Prediction System for Smart Cities","authors":"Chao Huang, X. Wu, Dong Wang","doi":"10.1145/2983323.2983886","DOIUrl":"https://doi.org/10.1145/2983323.2983886","url":null,"abstract":"Crowdsourcing has become an emerging data collection paradigm for smart city applications. A new category of crowdsourcing-based urban anomaly reporting systems have been developed to enable pervasive and real-time reporting of anomalies in cities (e.g., noise, illegal use of public facilities, urban infrastructure malfunctions). An interesting challenge in these applications is how to accurately predict an anomaly in a given region of the city before it happens. Prior works have made significant progress in anomaly detection. However, they can only detect anomalies after they happen, which may lead to significant information delay and lack of preparedness to handle the anomalies in an efficient way. In this paper, we develop a Crowdsourcing-based Urban Anomaly Prediction Scheme (CUAPS) to accurately predict the anomalies of a city by exploring both spatial and temporal information embedded in the crowdsourcing data. We evaluated the performance of our scheme and compared it to the state-of-the-art baselines using four real-world datasets collected from 311 service in the city of New York. The results showed that our scheme can predict different categories of anomalies in a city more accurately than the baselines.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126172406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Urban Traffic Prediction through the Second Use of Inexpensive Big Data from Buildings","authors":"Zimu Zheng, Dan Wang, J. Pei, Yi Yuan, C. Fan, Linda Fu Xiao","doi":"10.1145/2983323.2983357","DOIUrl":"https://doi.org/10.1145/2983323.2983357","url":null,"abstract":"Traffic prediction, particularly in urban regions, is an important application of tremendous practical value. In this paper, we report a novel and interesting case study of urban traffic prediction in Central, Hong Kong, one of the densest urban areas in the world. The novelty of our study is that we make good second use of inexpensive big data collected from the Hong Kong International Commerce Centre (ICC), a 118-story building in Hong Kong where more than 10,000 people work. As building environment data are much cheaper to obtain than traffic data, we demonstrate that it is highly effective to estimate building occupancy information using building environment data, and then to further use the information on occupancy to provide traffic predictions in the proximate area. Scientifically, we investigate how and to what extent building data can complement traffic data in predicting traffic. In general, this study sheds new light on the development of accurate data mining applications through the second use of inexpensive big data.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129323076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ZEST: A Hybrid Model on Predicting Passenger Demand for Chauffeured Car Service","authors":"Hua Wei, Yuandong Wang, Tianyu Wo, Yaxiao Liu, Jie Xu","doi":"10.1145/2983323.2983667","DOIUrl":"https://doi.org/10.1145/2983323.2983667","url":null,"abstract":"Chauffeured car service based on mobile applications like Uber or Didi suffers from supply-demand disequilibrium, which can be alleviated by proper prediction on the distribution of passenger demand. In this paper, we propose a Zero-Grid Ensemble Spatio Temporal model (ZEST) to predict passenger demand with four predictors: a temporal predictor and a spatial predictor to model the influences of local and spatial factors separately, an ensemble predictor to combine the results of former two predictors comprehensively and a Zero-Grid predictor to predict zero demand areas specifically since any cruising within these areas costs extra waste on energy and time of driver. We demonstrate the performance of ZEST on actual operational data from ride-hailing applications with more than 6 million order records and 500 million GPS points. Experimental results indicate our model outperforms 5 other baseline models by over 10% both in MAE and sMAPE on the three-month datasets.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129363379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Routing an Autonomous Taxi with Reinforcement Learning","authors":"Miyoung Han, P. Senellart, S. Bressan, Huayu Wu","doi":"10.1145/2983323.2983379","DOIUrl":"https://doi.org/10.1145/2983323.2983379","url":null,"abstract":"Singapore's vision of a Smart Nation encompasses the development of effective and efficient means of transportation. The government's target is to leverage new technologies to create services for a demand-driven intelligent transportation model including personal vehicles, public transport, and taxis. Singapore's government is strongly encouraging and supporting research and development of technologies for autonomous vehicles in general and autonomous taxis in particular. The design and implementation of intelligent routing algorithms is one of the keys to the deployment of autonomous taxis. In this paper we demonstrate that a reinforcement learning algorithm of the Q-learning family, based on a customized exploration and exploitation strategy, is able to learn optimal actions for the routing autonomous taxis in a real scenario at the scale of the city of Singapore with pick-up and drop-off events for a fleet of one thousand taxis.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128554957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}