Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining: Latest Papers, Page 3

Index Compression Using Byte-Aligned ANS Coding and Two-Dimensional Contexts

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3159663

Alistair Moffat, M. Petri

Abstract: We examine approaches used for block-based inverted index compression, such as the OptPFOR mechanism, in which fixed-length blocks of postings data are compressed independently of each other. Building on previous work in which asymmetric numeral systems (ANS) entropy coding is used to represent each block, we explore a number of enhancements: (i) the use of two-dimensional conditioning contexts, with two aggregate parameters used in each block to categorize the distribution of symbol values that underlies the ANS approach, rather than just one; (ii) the use of a byte-friendly strategic mapping from symbols to ANS codeword buckets; and (iii) the use of a context merging process to combine similar probability distributions. Collectively, these improvements yield superior compression for index data, outperforming the reference point set by the Interp mechanism, and hence representing a significant step forward. We describe experiments using the 426 GiB gov2 collection and a new large collection of publicly available news articles to demonstrate that claim, and provide query evaluation throughput rates compared to other block-based mechanisms.

Citations: 22
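The block-and-context idea in the index compression abstract above can be illustrated with a small sketch. It converts a postings list to d-gaps, cuts the gaps into fixed-length blocks, and assigns each block a two-dimensional context from two aggregate statistics. The choice of statistics here (bit lengths of the block maximum and block median), the tiny block size, and all function names are assumptions made for illustration; the abstract does not say which two parameters the authors use, and no ANS coding is performed.

```python
from collections import defaultdict
from statistics import median

BLOCK_SIZE = 4  # assumption: tiny for illustration; real blocks are much larger


def to_gaps(doc_ids):
    """Convert a strictly increasing list of document IDs to d-gaps."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def split_blocks(values, size=BLOCK_SIZE):
    """Cut a list into consecutive fixed-length blocks (last may be short)."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def context_of(block):
    """Hypothetical 2-D context: bit lengths of the block max and median."""
    return (max(block).bit_length(), int(median(block)).bit_length())


def group_by_context(doc_ids):
    """Group blocks by context; each group would share one symbol model."""
    groups = defaultdict(list)
    for block in split_blocks(to_gaps(doc_ids)):
        groups[context_of(block)].append(block)
    return dict(groups)
```

In a real coder, every block that maps to the same context would be entropy-coded against the same probability distribution, so a more selective context can give a sharper model at the cost of storing more distributions.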

Customer Purchase Behavior Prediction from Payment Datasets

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3159707

Y. Wen, Pei-Wen Yeh, Tzu-Hao Tsai, Wen-Chih Peng, Hong-Han Shuai

Abstract: With the advances in the development of mobile payments, a huge amount of payment data is collected by banks. User payment data offer a good dataset to depict customer behavior patterns. A comprehensive understanding of customers' purchase behavior is crucial to developing good marketing strategies, which may trigger much greater purchase amounts. For example, by exploring customer behavior patterns, given a target store, a set of potential customers can be identified. In other words, personalized campaigns at the right time and in the right place can be treated as the last stage of consumption. Here we propose a probabilistic graphical model, named STPC-PGM, that exploits the payment data to discover customer purchase behavior in the spatial, temporal, payment amount and product category aspects. As a result, the mobility behavior of an individual user could be predicted with a probabilistic graphical model that accounts for all aspects of each customer's relationship with the payment platform. To achieve real-time advertising, we then develop an online framework that efficiently computes the prediction results. Our experimental results show that STPC-PGM is effective in discovering customers' profiling features, and outperforms the state-of-the-art methods in purchase behavior prediction. In addition, the prediction results are being deployed in the marketing of real-world credit card users, and have shown significant growth in the advertising conversion rate.

Citations: 29

Network Science of Teams: Characterization, Prediction, and Optimization

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3162008

Liangyue Li, Hanghang Tong

Abstract: Teams are increasingly indispensable to achievements in any organization. Despite the organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the social, cognitive and information level in relation to team performance and network dynamics. Generally speaking, team performance can be viewed as the composite of its users, the tasks that the team performs and the networks that the team is embedded in or operates on. The goal of this tutorial is to (1) provide a comprehensive review of the recent advances in optimizing teams' performance in the context of networks; and (2) identify the open challenges and future trends. We believe this is an emerging and high-impact topic in computational social science, which will attract both researchers and practitioners in the data mining as well as social science research communities. Our emphasis will be on (1) recently emerging techniques for addressing the team performance optimization problem; and (2) the open challenges and future trends, with a careful balance between the theories, algorithms and applications.

Citations: 2

Collabot: Personalized Group Chat Summarization

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3160588

N. Tepper, Anat Hashavit, Maya Barnea, Inbal Ronen, L. Leiba

Abstract: In recent years, enterprise group chat collaboration tools, such as Slack, IBM's Watson Workspace and Microsoft Teams, have presented unprecedented growth. With all the potential benefits of these tools - productivity increase and improved group communication - come significant challenges. Specifically, the 'always on' feature makes it hard for users to cope with the load of conversational content and get up to speed after logging off for a while. In this demo, we present Collabot - a chat assistant service that implicitly learns users' interests and social ties within a chat group and provides a personalized digest of missed content. Collabot assists users in coping with chat information overload by helping them understand the main topics discussed, collaborators, links and resources. This demo has two main contributions. First, we present a novel personalized group chat summarization algorithm; second, the demonstration depicts a working implementation applied on different chat groups from different domains within IBM. A video describing the demo can be found at https://www.youtube.com/watch?v=6cVsstiJ9vk.

Citations: 13

Learning to Discover Domain-Specific Web Content

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3159724

Kien Pham, Aécio Santos, J. Freire

Abstract: The ability to discover all content relevant to an information domain has many applications, from helping in the understanding of humanitarian crises to countering human and arms trafficking. In such applications, time is of the essence: it is crucial to both maximize coverage and identify new content as soon as it becomes available, so that appropriate actions can be taken. In this paper, we propose new methods for efficient domain-specific re-crawling that maximize the yield for new content. By learning patterns of pages that have a high yield, our methods select a small set of pages that can be re-crawled frequently, increasing the coverage and freshness while conserving resources. Unlike previous approaches to this problem, our methods combine different factors to optimize the re-crawling strategy, do not require full snapshots for the learning step, and dynamically adapt the strategy as the crawl progresses. In an empirical evaluation, we have simulated the framework over 600 partial crawl snapshots in three different domains. The results show that our approach can achieve 150% higher coverage compared to existing, state-of-the-art techniques. In addition, it is also able to capture 80% of new relevant content within less than 4 hours of publication.

Citations: 8

Sequential Recommendation with User Memory Networks

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3159668

Xu Chen, Hongteng Xu, Yongfeng Zhang, Jiaxi Tang, Yixin Cao, Zheng Qin, H. Zha

Abstract: User preferences are usually dynamic in real-world recommender systems, and a user's historical behavior records may not be equally important when predicting his/her future interests. Existing recommendation algorithms -- including both shallow and deep approaches -- usually embed a user's historical records into a single latent vector/representation, which may have lost the per item- or feature-level correlations between a user's historical records and future interests. In this paper, we aim to express, store, and manipulate users' historical records in a more explicit, dynamic, and effective manner. To do so, we introduce the memory mechanism to recommender systems. Specifically, we design a memory-augmented neural network (MANN) integrated with the insights of collaborative filtering for recommendation. By leveraging the external memory matrix in MANN, we store and update users' historical records explicitly, which enhances the expressiveness of the model. We further adapt our framework to both item- and feature-level versions, and design the corresponding memory reading/writing operations according to the nature of personalized recommendation scenarios. Compared with state-of-the-art methods that consider users' sequential behavior for recommendation, e.g., sequential recommenders with recurrent neural networks (RNN) or Markov chains, our method achieves significantly and consistently better performance on four real-world datasets. Moreover, experimental analyses show that our method is able to extract the intuitive patterns of how users' future actions are affected by previous behaviors.

Citations: 421
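The memory-reading operation mentioned in the sequential recommendation abstract above can be pictured with a generic attention read over a memory matrix. The sketch below is an assumption about the general shape of such an operation (dot-product scores followed by a softmax), not the paper's actual read or write rule; `read_memory` and its arguments are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(memory, query):
    """Attention read: weight each memory slot by its similarity to the query.

    memory: list of slots, each a list of floats (one slot per past item)
    query:  list of floats with the same dimension as a slot
    Returns the attention weights and the weighted sum of the slots.
    """
    scores = [sum(m * q for m, q in zip(slot, query)) for slot in memory]
    weights = softmax(scores)
    dim = len(query)
    output = [sum(w * slot[j] for w, slot in zip(weights, memory))
              for j in range(dim)]
    return weights, output
```

Because the weights are computed per slot, the read can emphasize the few past items most related to the current candidate instead of compressing the whole history into one vector.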

WSDM Cup 2018: Music Recommendation and Churn Prediction

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3160605

Yian Chen, Xing Xie, Shou-de Lin, Arden Chiu

Abstract: An excellent recommendation system helps users retrieve content they like and, more importantly, content they might like but are not yet aware of. It will further increase the satisfaction of users and indirectly increase the retention rate and conversion rate. While the public is now listening to all kinds of music, recommendation algorithms still struggle in key areas. Without enough historical data, how would an algorithm know if listeners will like a new song or a new artist? And how would it know what songs to recommend to brand new users? In WSDM Cup 2018, the first task is to solve the abovementioned challenges to build a better music recommendation system. The second task in the Cup focuses on churn prediction. For a subscription business, accurately predicting churn is critical to long-term success. Even slight variations in churn can drastically affect profits. In this task, participants are asked to build an algorithm that predicts whether a user will churn after their subscription expires. The competition data and award are provided by KKBOX, a leading music streaming service in Taiwan.

Citations: 15

Orienteering Algorithms for Generating Travel Itineraries

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3159697

Zachary Friggstad, Sreenivas Gollapudi, Kostas Kollias, Tamás Sarlós, Chaitanya Swamy, A. Tomkins

Abstract: We study the problem of automatically and efficiently generating itineraries for users who are on vacation. We focus on the common case, wherein the trip duration is more than a single day. Previous efficient algorithms based on greedy heuristics suffer from two problems. First, the itineraries are often unbalanced, with excellent days visiting top attractions followed by days of exclusively lower-quality alternatives. Second, the trips often re-visit neighborhoods repeatedly in order to cover increasingly low-tier points of interest. Our primary technical contribution is an algorithm that addresses both these problems by maximizing the quality of the worst day. We give theoretical results showing that this algorithm's competitive factor is within a factor two of the guarantee of the best available algorithm for a single day, across many variations of the problem. We also give detailed empirical evaluations using two distinct datasets: (a) anonymized Google historical visit data and (b) Foursquare public check-in data. We show first that the overall utility of our itineraries is almost identical to that of algorithms specifically designed to maximize total utility, while the utility of the worst day of our itineraries is roughly twice that obtained from other approaches. We then turn to evaluation based on human raters who score our itineraries only slightly below the itineraries created by human travel experts with deep knowledge of the area.

Citations: 33
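The contrast drawn in the itinerary abstract above, between front-loaded greedy plans and plans that protect the worst day, can be shown with a toy example. This sketch ignores travel time and geography entirely and is not the paper's orienteering algorithm; it only compares a fill-each-day-in-turn baseline with a simple heuristic that hands each attraction to the currently weakest day. The function names and the per-day capacity are assumptions for illustration.

```python
def front_loaded(utilities, days, per_day):
    """Baseline: fill each day in turn with the best remaining attractions."""
    ranked = sorted(utilities, reverse=True)
    return [ranked[d * per_day:(d + 1) * per_day] for d in range(days)]


def balanced(utilities, days, per_day):
    """Give each attraction, best first, to the weakest day that has room."""
    plan = [[] for _ in range(days)]
    for utility in sorted(utilities, reverse=True):
        open_days = [day for day in plan if len(day) < per_day]
        if not open_days:
            break  # every day is full; remaining attractions are skipped
        min(open_days, key=sum).append(utility)
    return plan


def worst_day(plan):
    """Total utility of the least rewarding day in the plan."""
    return min(sum(day) for day in plan)
```

With utilities `[9, 8, 7, 3, 2, 1]` over two days of three stops each, both plans have the same total utility, but the baseline leaves a worst day worth 6 while the balanced plan's worst day is worth 14.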

Tracing Fake-News Footprints: Characterizing Social Media Messages by How They Propagate

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3159677

Liang Wu, Huan Liu

Abstract: When a message, such as a piece of news, spreads in social networks, how can we classify it into categories of interest, such as genuine or fake news? Classification of social media content is a fundamental task for social media mining, and most existing methods regard it as a text categorization problem and mainly focus on using content features, such as words and hashtags. However, for many emerging applications like fake news and rumor detection, it is very challenging, if not impossible, to identify useful features from content. For example, intentional spreaders of fake news may manipulate the content to make it look like real news. To address this problem, this paper concentrates on modeling the propagation of messages in a social network. Specifically, we propose a novel approach, TraceMiner, to (1) infer embeddings of social media users with social network structures; and (2) utilize an LSTM-RNN to represent and classify propagation pathways of a message. Since content information is sparse and noisy on social media, TraceMiner can provide a high degree of classification accuracy even in the absence of content information. Experimental results on real-world datasets show its superiority over state-of-the-art approaches on the tasks of fake news detection and news categorization.

Citations: 264

Differential Privacy for Information Retrieval

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining Pub Date : 2018-02-02 DOI: 10.1145/3159652.3162006

G. Yang, Sicong Zhang

Abstract: The concern for privacy is real for any research that uses user data. Information Retrieval (IR) is not an exception. Many IR algorithms and applications require the use of users' personal information, contextual information and other sensitive and private information. The extensive use of personalization in IR has become a double-edged sword. Sometimes, the concern becomes so overwhelming that IR research has to stop to avoid privacy leaks. The good news is that recently there has been increasing attention paid to the joint field of privacy and IR -- privacy-preserving IR. As part of the effort, this tutorial offers an introduction to differential privacy (DP), one of the most advanced techniques in privacy research, and provides the necessary set of theoretical knowledge for applying privacy techniques in IR. Differential privacy is a technique that provides strong privacy guarantees for data protection. Theoretically, it aims to maximize the data utility in statistical datasets while minimizing the risk of exposing individual data entries to any adversary. Differential privacy has been successfully applied to a wide range of applications in database (DB) and data mining (DM). The research in privacy-preserving IR is relatively new; however, research has shown that DP is also effective in supporting multiple IR tasks. This tutorial aims to lay a theoretical foundation of DP and explains how it can be applied to IR. It highlights the differences between IR tasks and DB and DM tasks, and how DP connects to IR. We hope the attendees of this tutorial will have a good understanding of DP and other necessary knowledge to work on the newly minted joint research field of privacy and IR.

Citations: 4
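The utility-versus-exposure trade-off described in the differential privacy abstract above is commonly illustrated with the Laplace mechanism, a standard DP building block that the abstract itself does not name. The sketch below releases a count (for example, how many users issued a given query) with noise scaled to sensitivity divided by epsilon. The function names are hypothetical, and the sensitivity of 1 assumes that adding or removing one user changes the count by at most one.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Smaller epsilon means more noise: stronger privacy, lower utility.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Example: the same true count released under two privacy budgets.
rng = random.Random(0)
loose = private_count(120, epsilon=2.0, rng=rng)   # less noise on average
strict = private_count(120, epsilon=0.1, rng=rng)  # more noise on average
```

The noise has zero mean, so aggregate statistics stay useful on average, while any single release reveals little about whether one particular user is in the data.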