{"title":"User Modeling for a Personal Assistant","authors":"R. Guha, Vineet Gupta, V. Raghunathan, R. Srikant","doi":"10.1145/2684822.2685309","DOIUrl":"https://doi.org/10.1145/2684822.2685309","url":null,"abstract":"We present a user modeling system that serves as the foundation of a personal assistant. The system ingests web search history for signed-in users, and identifies coherent contexts that correspond to tasks, interests, and habits. Unlike past work which focused on either in-session tasks or tasks over a few days, we look at several months of history in order to identify not just short-term tasks, but also long-term interests and habits. The features we use for identifying coherent contexts yield substantially higher precision and recall than past work. We also present an algorithm for identifying contexts that is 8 to 30 times faster than previous algorithms. The user modeling system has been deployed in production. It runs over hundreds of millions of users, and updates the models with a 10-minute latency. The contexts identified by the system serve as the foundation for generating recommendations in Google Now.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116002273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Listwise Approach for Rank Aggregation in Crowdsourcing","authors":"Shuzi Niu, Yanyan Lan, J. Guo, Xueqi Cheng, Lei-Ping Yu, Guoping Long","doi":"10.1145/2684822.2685308","DOIUrl":"https://doi.org/10.1145/2684822.2685308","url":null,"abstract":"Inferring a gold-standard ranking over a set of objects, such as documents or images, is a key task to build test collections for various applications like Web search and recommender systems. Crowdsourcing services provide an efficient and inexpensive way to collect judgments via labeling by sets of annotators. We thus study the problem of finding a consensus ranking from crowdsourced judgments. In contrast to conventional rank aggregation methods which minimize the distance between predicted ranking and input judgments from either pointwise or pairwise perspective, we argue that it is critical to consider the distance in a listwise way to emphasize the position importance in ranking. Therefore, we introduce a new listwise approach in this paper, where ranking measure based objective functions are utilized for optimization. In addition, we also incorporate the annotator quality into our model since the reliability of annotators can vary significantly in crowdsourcing. For optimization, we transform the optimization problem to the Linear Sum Assignment Problem, and then solve it by a very efficient algorithm named CrowdAgg guaranteeing the optimal solution. Experimental results on two benchmark data sets from different crowdsourcing tasks show that our algorithm is much more effective, efficient and robust than traditional methods.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127371353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Matching in APP Search","authors":"Juchao Zhuo, Zeqian Huang, Yunfeng Liu, Zhanhui Kang, Xun Cao, Mingzhi Li, Long Jin","doi":"10.1145/2684822.2697046","DOIUrl":"https://doi.org/10.1145/2684822.2697046","url":null,"abstract":"In recent years, with the growth of smartphones and applications, the APP market has become an important mobile internet portal. As an important function of an application market, APP search has gained a lot of attention. However, mismatch between queries and APPs is the most critical problem in APP search, because a search engine based on term matching has little text to work with. In this talk, we describe a semantic matching architecture for APP search that mines topics and tags from big data. It enriches query and APP representations with topics and tags to achieve semantic matching in search. Several challenges must be considered: 1) how to extract tag-APP relationships from large web text; 2) how to use machine learning technologies for de-noising and computing confidence; 3) how to combine the rankings of apps retrieved by different matching methods. These will be introduced through some of our related work, used as examples to describe how semantic matching is applied in Tencent MyApp, an application market serving hundreds of millions of users.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127408193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chronological Scientific Information Recommendation via Supervised Dynamic Topic Modeling","authors":"Zhuoren Jiang","doi":"10.1145/2684822.2697036","DOIUrl":"https://doi.org/10.1145/2684822.2697036","url":null,"abstract":"Scientific information recommendation is crucial to assist scholars in their research. Citation recommendation is an important field of scientific recommendation. Traditional approaches ignore the chronological nature of the citation recommendation task. In this study, I propose the \"Chronological Citation Recommendation,\" which assumes that a user's initial information need could shift while they are looking for papers in different time slices. Specifically, I employed a supervised dynamic topic model to characterize the content \"time-varying\" dynamics and constructed a novel heterogeneous graph that contains dynamic topic-based information, time-decay citation information and word-based information. I applied different meta-paths for different ranking hypotheses, which carried different types of information for citation recommendation in different time slices along with information need shifting. I plan to generate the final \"Chronological Citation Recommendation\" rankings by feature integration using Learning to Rank. \"Chronological Citation Recommendation\" will recommend time-series ranking lists based on initial user textual information need. Preliminary experiments on the ACM corpus show that chronological citation recommendation will significantly improve the citation recommendation performance.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"580 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132413735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topics, Tasks & Beyond: Learning Representations for Personalization","authors":"Rishabh Mehrotra","doi":"10.1145/2684822.2697037","DOIUrl":"https://doi.org/10.1145/2684822.2697037","url":null,"abstract":"Accurate understanding of a user's interests, preferences and behaviours is possibly one of the most critical research challenges faced while developing personalized systems for behavior targeting and information access. We intend to develop comprehensive latent variable models for web search personalization which jointly model users' topical interests along with users' click-based relevance preferences while at the same time taking into account users' intended search tasks along with information about other similar users. We further augment this model by incorporating topic-level relevance parameters, which, to the best of our knowledge, is the first attempt at modeling result ranking preferences at the topic level. Additionally, we intend to explore the possibility of modeling users in terms of the search tasks they perform thereby coupling users' topical interests with their search task behavior to learn user representations. Finally, we wish to evaluate the proposition of extending user representations to hierarchical structures as an alternative to existing flat representations. The evaluation of these alternative approaches for user modeling is based on their performance on a variety of tasks such as collaborative query recommendations, user cohort modeling and search result personalization. This proposal provides the motivation to pursue these research directions, summarizes key research problems being targeted, glances through potential ways of tackling these research challenges and highlights some initial results obtained.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"243 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132315349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Subgraphs with Maximum Total Density and Limited Overlap","authors":"Oana Balalau, F. Bonchi, T-H. Hubert Chan, Francesco Gullo, Mauro Sozio","doi":"10.1145/2684822.2685298","DOIUrl":"https://doi.org/10.1145/2684822.2685298","url":null,"abstract":"Finding dense subgraphs in large graphs is a key primitive in a variety of real-world application domains, encompassing social network analytics, event detection, biology, and finance. In most such applications, one typically aims at finding several (possibly overlapping) dense subgraphs which might correspond to communities in social networks or interesting events. While a large amount of work is devoted to finding a single densest subgraph, perhaps surprisingly, the problem of finding several dense subgraphs with limited overlap has not been studied in a principled way, to the best of our knowledge. In this work we define and study a natural generalization of the densest subgraph problem, where the main goal is to find at most $k$ subgraphs with maximum total aggregate density, while satisfying an upper bound on the pairwise Jaccard coefficient between the sets of nodes of the subgraphs. After showing that such a problem is NP-Hard, we devise an efficient algorithm that comes with provable guarantees in some cases of interest, as well as an efficient practical heuristic. Our extensive evaluation on large real-world graphs confirms the efficiency and effectiveness of our algorithms.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128933750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Groups Stability in Ubiquitous and Social Environments: Communities, Classes and Clusters","authors":"Mark Kibanov","doi":"10.1145/2684822.2697034","DOIUrl":"https://doi.org/10.1145/2684822.2697034","url":null,"abstract":"Ubiquitous Computing is an emerging research area of computer science. Similarly, social network analysis and mining has become very important in recent years. We aim to combine these two research areas to explore the nature of processes happening around users. The presented research focuses on exploring and analyzing different groups of persons or entities (communities, clusters and classes), their stability and semantics. An example of ubiquitous social data are social networks captured during scientific conferences using face-to-face RFID proximity tags. Another example of ubiquitous data is crowd-generated environmental sensor data. In this paper we generalize various problems connected to these and further datasets and consider them as a task for measuring group stability. Group stability can be used to improve state-of-the-art methods to analyze data. We also aim to improve the performance of different data mining algorithms, e.g., by better handling of data with a skewed density distribution. We describe significant results of some experiments that show how the presented approach can be applied and discuss the planned experiments.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115154705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and Space-Efficient Entity Linking for Queries","authors":"Roi Blanco, G. Ottaviano, E. Meij","doi":"10.1145/2684822.2685317","DOIUrl":"https://doi.org/10.1145/2684822.2685317","url":null,"abstract":"Entity linking deals with identifying entities from a knowledge base in a given piece of text and has become a fundamental building block for web search engines, enabling numerous downstream improvements from better document ranking to enhanced search results pages. A key problem in the context of web search queries is that this process needs to run under severe time constraints as it has to be performed before any actual retrieval takes place, typically within milliseconds. In this paper we propose a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base. There are three key ingredients that make the algorithm fast and space-efficient. First, the linking process ignores any dependencies between the different entity candidates, which allows for an O(k^2) implementation in the number of query terms. Second, we leverage hashing and compression techniques to reduce the memory footprint. Finally, to equip the algorithm with contextual knowledge without sacrificing speed, we factor the distance between distributional semantics of the query words and entities into the model. We show that our solution significantly outperforms several state-of-the-art baselines by more than 14% while being able to process queries in sub-millisecond times---at least two orders of magnitude faster than existing systems.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114919690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Power of Random Neighbors in Social Networks","authors":"Silvio Lattanzi, Yaron Singer","doi":"10.1145/2684822.2685293","DOIUrl":"https://doi.org/10.1145/2684822.2685293","url":null,"abstract":"The friendship paradox is a sociological phenomenon first discovered by Feld which states that individuals are likely to have fewer friends than their friends do, on average. This phenomenon has become common knowledge, has several interesting applications, and has also been observed in various data sets. In his seminal paper Feld provides an intuitive explanation by showing that in any graph the average degree of edges in the graph is an upper bound on the average degree of nodes. Despite the appeal of this argument, it does not prove the existence of the friendship paradox. In fact, it is easy to construct networks -- even with power law degree distributions -- where the ratio between the average degree of neighbors and the average degree of nodes is high, but all nodes have the exact same degree as their neighbors. Which models, then, explain the friendship paradox? In this paper we give a strong characterization that provides a formal understanding of the friendship paradox. We show that for any power law graph with exponential parameter in (1,3), when every edge is rewired with constant probability, the friendship paradox holds, i.e. there is an asymptotic gap between the average degree of the sample of polylogarithmic size and the average degree of a random set of its neighbors of equal size. To examine this characterization on real data, we performed several experiments on social network data sets that complement our theoretical analysis. We also discuss the applications of our result to influence maximization.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128183550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FLAME: A Probabilistic Model Combining Aspect Based Opinion Mining and Collaborative Filtering","authors":"Yao Wu, M. Ester","doi":"10.1145/2684822.2685291","DOIUrl":"https://doi.org/10.1145/2684822.2685291","url":null,"abstract":"Aspect-based opinion mining from online reviews has attracted a lot of attention recently. Given a set of reviews, the main task of aspect-based opinion mining is to extract major aspects of the items and to infer the latent aspect ratings from each review. However, users may have different preferences which might lead to different opinions on the same aspect of an item. Even if fine-grained aspect rating analysis is provided for each review, it is still difficult for a user to judge whether a specific aspect of an item meets his own expectation. In this paper, we study the problem of estimating personalized sentiment polarities on different aspects of the items. We propose a unified probabilistic model called Factorized Latent Aspect ModEl (FLAME), which combines the advantages of collaborative filtering and aspect based opinion mining. FLAME learns users' personalized preferences on different aspects from their past reviews, and predicts users' aspect ratings on new items by collective intelligence. Experiments on two online review datasets show that FLAME outperforms state-of-the-art methods on the tasks of aspect identification and aspect rating prediction.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131605812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}