{"title":"Tensor Rank Estimation and Completion via CP-based Nuclear Norm","authors":"Qiquan Shi, Haiping Lu, Yiu-ming Cheung","doi":"10.1145/3132847.3132945","DOIUrl":"https://doi.org/10.1145/3132847.3132945","url":null,"abstract":"Tensor completion (TC) is a challenging problem of recovering missing entries of a tensor from its partial observation. One main TC approach is based on CP/Tucker decomposition. However, this approach often requires the determination of a tensor rank a priori. This rank estimation problem is difficult in practice. Several Bayesian solutions have been proposed but they often under/over-estimate the tensor rank while being quite slow. To address this problem of rank estimation with missing entries, we view the weight vector of the orthogonal CP decomposition of a tensor to be analogous to the vector of singular values of a matrix. Subsequently, we define a new CP-based tensor nuclear norm as the $L_1$-norm of this weight vector. We then propose Tensor Rank Estimation based on $L_1$-regularized orthogonal CP decomposition (TREL1) for both CP-rank and Tucker-rank. Specifically, we incorporate a regularization with CP-based tensor nuclear norm when minimizing the reconstruction error in TC to automatically determine the rank of an incomplete tensor. Experimental results on both synthetic and real data show that: 1) Given sufficient observed entries, TREL1 can estimate the true rank (both CP-rank and Tucker-rank) of incomplete tensors well; 2) The rank estimated by TREL1 can consistently improve recovery accuracy of decomposition-based TC methods; 3) TREL1 is not sensitive to its parameters in general and more efficient than existing rank estimation methods.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75262250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coupled Sparse Matrix Factorization for Response Time Prediction in Logistics Services","authors":"Yuqi Wang, Jiannong Cao, Lifang He, Wengen Li, Lichao Sun, Philip S. Yu","doi":"10.1145/3132847.3132948","DOIUrl":"https://doi.org/10.1145/3132847.3132948","url":null,"abstract":"Nowadays, there is an emerging way of connecting logistics orders and van drivers, where it is crucial to predict the order response time. Accurate prediction of order response time would not only facilitate decision making on order dispatching, but also pave ways for applications such as supply-demand analysis and driver scheduling, leading to high system efficiency. In this work, we forecast order response time on current day by fusing data from order history and driver historical locations. Specifically, we propose Coupled Sparse Matrix Factorization (CSMF) to deal with the heterogeneous fusion and data sparsity challenges raised in this problem. CSMF jointly learns from multiple heterogeneous sparse data through the proposed weight setting mechanism therein. Experiments on real-world datasets demonstrate the effectiveness of our approach, compared to various baseline methods. The performances of many variants of the proposed method are also presented to show the effectiveness of each component.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75269657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Social Bots by Jointly Modeling Deep Behavior and Content Information","authors":"C. Cai, Linjing Li, D. Zeng","doi":"10.1145/3132847.3133050","DOIUrl":"https://doi.org/10.1145/3132847.3133050","url":null,"abstract":"Bots are regarded as the most common kind of malware in the era of Web 2.0. In recent years, the Internet has been populated by hundreds of millions of bots, especially on social media. Thus, the demand for effective and efficient bot detection algorithms is more urgent than ever. Existing works have partly satisfied this requirement by way of laborious feature engineering. In this paper, we propose a deep bot detection model aiming to learn an effective representation of social users and then detect social bots by jointly modeling social behavior and content information. The proposed model learns the representation of social behavior by encoding both endogenous and exogenous factors which affect user behavior. As to the representation of content, we regard the user content as temporal text data, instead of just plain text as in other existing works, to extract semantic information and latent temporal patterns. To the best of our knowledge, this is the first attempt to apply deep learning to modeling social users and accomplishing social bot detection. Experiments on a real-world dataset collected from Twitter demonstrate the effectiveness of the proposed model.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73428419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing Feature-Level Irregularity in Disease Progression Modeling","authors":"Kaiping Zheng, Wei Wang, Jinyang Gao, K. Ngiam, B. Ooi, J. Yip","doi":"10.1145/3132847.3132944","DOIUrl":"https://doi.org/10.1145/3132847.3132944","url":null,"abstract":"Disease progression modeling (DPM) analyzes patients' electronic medical records (EMR) to predict the health state of patients, which facilitates accurate prognosis, early detection and treatment of chronic diseases. However, EMR are irregular because patients visit hospital irregularly based on the need of treatment. For each visit, they are typically given different diagnoses, prescribed various medications and lab tests. Consequently, EMR exhibit irregularity at the feature level. To handle this issue, we propose a model based on the Gated Recurrent Unit by decaying the effect of previous records using fine-grained feature-level time span information, and learn the decaying parameters for different features to take into account their different behaviours like decaying speeds under irregularity. Extensive experimental results in both an Alzheimer's disease dataset and a chronic kidney disease dataset demonstrate that our proposed model of capturing feature-level irregularity can effectively improve the accuracy of DPM.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"180 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75365561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Navbox Generation by Interpretable Clustering over Linked Entities","authors":"Chenhao Xie, Lihan Chen, Jiaqing Liang, Kezun Zhang, Yanghua Xiao, Hanghang Tong, Haixun Wang, Wei Wang","doi":"10.1145/3132847.3132899","DOIUrl":"https://doi.org/10.1145/3132847.3132899","url":null,"abstract":"Little effort has been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table on a Wikipedia article page that provides a consistent navigation system for related entities. Navbox is critical for the readership and editing efficiency of Wikipedia. In this paper, we target the automatic generation of Navbox for Wikipedia articles. Instead of performing information extraction over unstructured natural language text directly, an alternative avenue is explored by focusing on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: If we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize the idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new Navboxes of high quality.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"178 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75377581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deception Detection: When Computers Become Better than Humans","authors":"Rada Mihalcea","doi":"10.1145/3132847.3137174","DOIUrl":"https://doi.org/10.1145/3132847.3137174","url":null,"abstract":"Whether we like it or not, deception happens every day and everywhere: thousands of trials taking place daily around the world; little white lies: \"I'm busy that day!\" even if your calendar is blank; news \"with a twist\" (a.k.a. fake news) meant to attract the readers' attention, and get some advertisement clicks on the side; portrayed identities, on dating sites and elsewhere. Can a computer automatically detect deception in written accounts or in video recordings? In this talk, I will describe our work in building linguistic and multimodal algorithms for deception detection, targeting deceptive statements, trial videos, fake news, identity deceptions, and also going after deception in multiple cultures. I will also show how these algorithms can provide insights into what makes a good lie - and thus teach us how we can spot a liar. As it turns out, computers can be trained to identify lies in many different contexts, and they can do it much better than humans do!","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75576058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FM-Hawkes: A Hawkes Process Based Approach for Modeling Online Activity Correlations","authors":"Sha Li, Xiaofeng Gao, Weiming Bao, Guihai Chen","doi":"10.1145/3132847.3132883","DOIUrl":"https://doi.org/10.1145/3132847.3132883","url":null,"abstract":"Understanding and predicting user behavior on online platforms has proved to be of significant value, with applications spanning from targeted advertising, political campaigning, anomaly detection to user self-monitoring. With the growing functionality and flexibility of online platforms, users can now accomplish a variety of tasks online. This advancement has rendered many previous works that focus on modeling a single type of activity obsolete. In this work, we target this new problem by modeling the interplay between the time series of different types of activities and apply our model to predict future user behavior. Our model, FM-Hawkes, stands for Fourier-based kernel multi-dimensional Hawkes process. Specifically, we model the multiple activity time series as a multi-dimensional Hawkes process. The correlations between different types of activities are then captured by the influence factor. As for the temporal triggering kernel, we observe that the intensity function consists of numerous kernel functions with time shift. Thus, we employ a Fourier transformation based non-parametric estimation. Our model is not bound to any particular platform and explicitly interprets the causal relationship between actions. By applying our model to real-life datasets, we confirm that the mutual excitation effect between different activities prevails among users. Prediction results show our superiority over models that do not consider action types and flexible kernels.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76157235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query and Animate Multi-attribute Trajectory Data","authors":"Jianqiu Xu, R. H. Güting","doi":"10.1145/3132847.3133178","DOIUrl":"https://doi.org/10.1145/3132847.3133178","url":null,"abstract":"The widespread use of GPS-enabled devices has led to huge amounts of trajectory data. In addition to location and time, trajectories are associated with descriptive attributes representing different aspects of real entities, called multi-attribute trajectories. This comes from the combination of several data sources and enables a range of new applications in which users can find interesting trajectories and discover potential relationships that cannot be determined solely based on GPS data. In this demo, we provide the motivating scenario and introduce a system that is developed to integrate standard trajectories (a sequence of timestamped locations) and attributes into one unified framework. The system is able to answer a range of interesting queries on multi-attribute trajectories that are not handled by standard trajectories. The system supports both standard trajectories and multi-attribute trajectories. We demonstrate how to form queries and animate multi-attribute trajectories in the system. To our knowledge, existing moving objects prototype systems do not support multi-attribute trajectories.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"11 1-2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72624418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Source Retrieval for Web-Scale Text Reuse Detection","authors":"Matthias Hagen, Martin Potthast, Payam Adineh, Ehsan Fatehifar, Benno Stein","doi":"10.1145/3132847.3133097","DOIUrl":"https://doi.org/10.1145/3132847.3133097","url":null,"abstract":"The first step of text reuse detection addresses the source retrieval problem: given a suspicious document, a set of candidate sources from which text might have been reused have to be retrieved by querying a search engine. Afterwards, in a second step, the retrieved candidates run through a text alignment with the suspicious document in order to identify reused passages. Obviously, any true source of text reuse that is not retrieved during the source retrieval step reduces the overall recall of a reuse detector. Hence, source retrieval is a recall-oriented task, a fact ignored even by experts: Only 3 of 20 teams participating in a respective task at PAN 2012-2016 managed to find more than half of the sources, the best one achieving a recall of only 0.59. We propose a new approach that reaches a recall of 0.89, a performance gain of 51%.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"197 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77651220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Additional Workshops Co-located with CIKM 2017","authors":"M. Winslett","doi":"10.1145/3132847.3152359","DOIUrl":"https://doi.org/10.1145/3132847.3152359","url":null,"abstract":"Summary of three workshops co-located with CIKM 2017.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77694724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}