{"title":"Query Answering Efficiency in Expert Networks Under Decentralized Search","authors":"Liang Ma, M. Srivatsa, D. Cansever, Xifeng Yan, S. Kase, M. Vanni","doi":"10.1145/2983323.2983652","DOIUrl":"https://doi.org/10.1145/2983323.2983652","url":null,"abstract":"Expert networks are formed by a group of expert professionals with different specialties to collaboratively resolve specific queries. In such networks, when a query reaches an expert who does not have sufficient expertise, this query needs to be routed to other experts for further processing until it is completely solved; therefore, query answering efficiency is sensitive to the underlying query routing mechanism being used. Among all possible query routing mechanisms, decentralized search, operating purely on each expert's local information without any knowledge of network global structure, represents the most basic and scalable routing mechanism. However, there is still a lack of fundamental understanding of the efficiency of decentralized search in expert networks. In this regard, we investigate decentralized search by quantifying its performance under a variety of network settings. Our key findings reveal the existence of network conditions under which decentralized search can achieve significantly short query routing paths (i.e., between O(log n) and O(log^2 n) hops, n: total number of experts in the network). Based on this theoretical foundation, we then study how the unique properties of decentralized search in expert networks are related to the anecdotal small-world phenomenon. To the best of our knowledge, this is the first work studying fundamental behaviors of decentralized search in expert networks. The developed performance bounds, confirmed by real datasets, can assist in predicting network performance and designing complex expert networks.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127338272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobile App Retrieval for Social Media Users via Inference of Implicit Intent in Social Media Text","authors":"Dae Hoon Park, Yi Fang, Mengwen Liu, ChengXiang Zhai","doi":"10.1145/2983323.2983843","DOIUrl":"https://doi.org/10.1145/2983323.2983843","url":null,"abstract":"People often implicitly or explicitly express their needs in social media in the form of \"user status text\". Such text can be very useful for service providers and product manufacturers to proactively provide relevant services or products that satisfy people's immediate needs. In this paper, we study how to infer a user's intent based on the user's \"status text\" and retrieve relevant mobile apps that may satisfy the user's needs. We address this problem by framing it as a new entity retrieval task where the query is a user's status text and the entities to be retrieved are mobile apps. We first propose a novel approach that generates a new representation for each query. Our key idea is to leverage social media to build parallel corpora that contain implicit intention text and the corresponding explicit intention text. Specifically, we model various user intentions in social media text using topic models, and we predict user intention in a query that contains implicit intention. Then, we retrieve relevant mobile apps with the predicted user intention. We evaluate the mobile app retrieval task using a new data set that we created. Experimental results indicate that the proposed model is effective and outperforms the state-of-the-art retrieval models.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131064916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Local-Recoding Anonymization using Locality Sensitive Hashing for Big Data Privacy Preservation","authors":"Xuyun Zhang, C. Leckie, Wanchun Dou, Jinjun Chen, K. Ramamohanarao, Z. Salcic","doi":"10.1145/2983323.2983841","DOIUrl":"https://doi.org/10.1145/2983323.2983841","url":null,"abstract":"While cloud computing has become an attractive platform for supporting data-intensive applications, a major obstacle to the adoption of cloud computing in sectors such as health and defense is the privacy risk associated with releasing datasets to third parties in the cloud for analysis. A widely adopted technique for data privacy preservation is to anonymize data via local recoding. However, most existing local-recoding techniques are either serial or distributed without directly optimizing scalability, thus rendering them unsuitable for big data applications. In this paper, we propose a highly scalable approach to local-recoding anonymization in cloud computing, based on Locality Sensitive Hashing (LSH). Specifically, a novel semantic distance metric is presented for use with LSH to measure the similarity between two data records. Then, LSH with the MinHash function family can be employed to divide datasets into multiple partitions for use with MapReduce to parallelize computation while preserving similarity. By using our efficient LSH-based scheme, we can anonymize each partition through the use of a recursive agglomerative $k$-member clustering algorithm. Extensive experiments on real-life datasets show that our approach significantly improves the scalability and time-efficiency of local-recoding anonymization by orders of magnitude over existing approaches.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130923420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digesting News Reader Comments via Fine-Grained Associations with Event Facets and News Contents","authors":"Bei Shi, Wai Lam","doi":"10.1145/2983323.2983684","DOIUrl":"https://doi.org/10.1145/2983323.2983684","url":null,"abstract":"News articles from different sources reporting the same event are often associated with an enormous amount of reader comments, which makes it difficult to digest the comments manually. Some of these comments, despite coming from different sources, discuss a certain facet of the event. On the other hand, some comments discuss the specific topic of the corresponding news article. We propose a framework that can digest reader comments automatically via fine-grained associations with event facets and news. We propose an unsupervised model called DRC, based on collective matrix factorization, and develop a multiplicative-update method to infer the parameters. Experimental results show that our proposed DRC model can provide an effective way to digest news reader comments.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132395715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Extract Conditional Knowledge for Question Answering using Dialogue","authors":"Pengwei Wang, Lei Ji, Jun Yan, Lianwen Jin, Wei-Ying Ma","doi":"10.1145/2983323.2983777","DOIUrl":"https://doi.org/10.1145/2983323.2983777","url":null,"abstract":"Knowledge-based question answering (KBQA) has attracted much attention from both academia and industry in the field of Artificial Intelligence. However, many existing knowledge bases (KBs) are built from static triples. This makes it hard to answer user questions with different conditions, which leads to significant answer variance among questions with similar intent. In this work, we propose to extract a conditional knowledge base (CKB) from user question-answer pairs for answering user questions with different conditions through dialogue. Given a subject, we first learn user question patterns and conditions. Then we propose an embedding-based co-clustering algorithm to simultaneously group the patterns and conditions by leveraging the answers as supervision information. After that, we extract the answers to questions conditioned on both question pattern clusters and condition clusters as a CKB. As a result, when users ask a question without clearly specifying the conditions, we use natural-language dialogue to chat with users for question specification and answer retrieval. Experiments on real question answering (QA) data show that the dialogue model using the automatically extracted CKB can more accurately answer user questions and significantly improve user satisfaction for questions with missing conditions.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"98 4 Pt 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132490917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explaining Sentiment Spikes in Twitter","authors":"Anastasia Giahanou, I. Mele, F. Crestani","doi":"10.1145/2983323.2983678","DOIUrl":"https://doi.org/10.1145/2983323.2983678","url":null,"abstract":"Tracking public opinion in social media provides important information to enterprises or governments during a decision-making process. In addition, identifying and extracting the causes of sentiment spikes allows interested parties to redesign and adjust strategies with the aim of attracting more positive sentiment. In this paper, we focus on the problems of tracking sentiment towards different entities, detecting sentiment spikes, and extracting and ranking the causes of a sentiment spike. Our approach combines the LDA topic model with Relative Entropy. The former is used to extract the topics discussed in the time window before the sentiment spike. The latter is used to rank the detected topics based on their contribution to the sentiment spike.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131964283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Estimation of Triangles in Very Large Graphs","authors":"Roohollah Etemadi, Jianguo Lu, Yung H. Tsin","doi":"10.1145/2983323.2983849","DOIUrl":"https://doi.org/10.1145/2983323.2983849","url":null,"abstract":"The number of triangles in a graph is an important metric for understanding the graph. It is also directly related to the clustering coefficient of a graph, which is one of the most important indicators for social networks. Counting the number of triangles is computationally expensive for very large graphs. Hence, estimation is necessary for large graphs, particularly for graphs that are hidden behind searchable interfaces where the graphs in their entirety are not available. For instance, user networks in Twitter and Facebook are not available for third parties to explore their properties directly. This paper proposes a new method to estimate the number of triangles based on random edge sampling. It improves the traditional random edge sampling by probing the edges that have a higher probability of forming triangles. The method consistently outperforms the traditional method, and can be better by orders of magnitude when the graph is very large. The result is demonstrated on 20 graphs, including the largest graphs we could find. More importantly, we proved the improvement ratio, and verified our result on all the datasets. The analytical results are achieved by simplifying the variances of the estimators based on the assumption that the graph is very large. We believe that such a big data assumption can lead to interesting results not only in triangle estimation, but also in other sampling problems.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130876032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Collaborative Ranking with Social Relationships in Top-N Recommendation","authors":"Dimitrios Rafailidis, F. Crestani","doi":"10.1145/2983323.2983839","DOIUrl":"https://doi.org/10.1145/2983323.2983839","url":null,"abstract":"With the advent of learning to rank methods, relevant studies showed that Collaborative Ranking (CR) models can produce accurate ranked lists in the top-N recommendation problem. However, in practice several real-world problems decrease their ranking performance, such as the sparsity and cold-start problems, which often occur in recommendation systems for inactive or new users. In this study, to account for the fact that the selections of social friends can improve the recommendation accuracy, we propose a joint CR model based on the users' social relationships. We propose two different CR strategies based on the notions of Social Reverse Height and Social Height, which consider how well the relevant and irrelevant items of users and their social friends have been ranked at the top of the list, respectively. We focus on the top of the list mainly because users see the top-N recommendations in real-world applications, and not the whole ranked list. Furthermore, we formulate a joint objective function to consider both CR strategies, and propose an alternating minimization algorithm to learn our joint CR model. Our experiments on benchmark datasets show that our proposed joint CR model outperforms other state-of-the-art models that either consider social relationships or focus on the ranking performance at the top of the list.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131328362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query Variations and their Effect on Comparing Information Retrieval Systems","authors":"G. Zuccon, João Palotti, A. Hanbury","doi":"10.1145/2983323.2983723","DOIUrl":"https://doi.org/10.1145/2983323.2983723","url":null,"abstract":"We explore the implications of using query variations for evaluating information retrieval systems and how these variations should be exploited to compare system effectiveness. Current evaluation approaches consider the availability of a set of topics (information needs), and only one expression of each topic in the form of a query is used for evaluation and system comparison. While there is strong evidence that considering query variations better models the usage of retrieval systems and accounts for the important aspect of user variability, it is unclear how to best exploit query variations for evaluating and comparing information retrieval systems. We propose a framework for evaluating retrieval systems that explicitly takes into account query variations. The framework considers both the system mean effectiveness and its variance over query variations and topics, as opposed to current approaches that only consider the mean across topics or perform a topic-focused analysis of variance across systems. Furthermore, the framework extends current evaluation practice by encoding: (1) user tolerance to effectiveness variations, (2) the popularity of different query variations, and (3) the relative importance of individual topics. These extensions and our findings make information retrieval comparisons more aligned with user behaviour.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131346875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Hidden Features for Contextual Bandits","authors":"Huazheng Wang, Qingyun Wu, Hongning Wang","doi":"10.1145/2983323.2983847","DOIUrl":"https://doi.org/10.1145/2983323.2983847","url":null,"abstract":"Contextual bandit algorithms provide principled online learning solutions to find optimal trade-offs between exploration and exploitation with companion side-information. Most contextual bandit algorithms simply assume the learner would have access to the entire set of features, which govern the generation of payoffs from a user to an item. However, in practice it is challenging to exhaust all relevant features ahead of time, and oftentimes due to privacy or sampling constraints many factors are unobservable to the algorithm. Failing to model such hidden factors leads a system to make consistently suboptimal predictions. In this paper, we propose to learn the hidden features for contextual bandit algorithms. Hidden features are explicitly introduced in our reward generation assumption, in addition to the observable contextual features. A scalable bandit algorithm is achieved via coordinate descent, in which closed-form solutions exist at each iteration for both hidden features and bandit parameters. Most importantly, we rigorously prove that the developed contextual bandit algorithm achieves a sublinear upper regret bound with high probability, and that linear regret is inevitable if one fails to model such hidden features. Extensive experimentation on both simulations and large-scale real-world datasets verified the advantages of the proposed algorithm compared with several state-of-the-art contextual bandit algorithms and existing ad-hoc combinations between bandit algorithms and matrix factorization methods.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131711488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}