{"title":"Geometric Inductive Matrix Completion: A Hyperbolic Approach with Unified Message Passing","authors":"Chengkun Zhang, Hongxu Chen, Sixiao Zhang, Guandong Xu, Junbin Gao","doi":"10.1145/3488560.3498402","DOIUrl":"https://doi.org/10.1145/3488560.3498402","url":null,"abstract":"Collaborative filtering is a central task in a broad range of recommender systems. Because traditional methods train latent variables for individual users and items under a transductive setting, they require re-training for out-of-sample inference. Inductive matrix completion (IMC) solves this problem by learning transformation functions upon engineered features, but it sacrifices model expressiveness and depends heavily on feature quality. In this paper, we propose Geometric Inductive Matrix Completion (GIMC) by introducing hyperbolic geometry and a unified message passing scheme into this generic task. The proposed method is the earliest attempt to use capacious hyperbolic space to enhance the capacity of IMC. It is the first work defining continuous explicit feedback prediction within non-Euclidean space by introducing hyperbolic regression for vertex interactions. This is also the first to provide comprehensive evidence that edge semantics, which previous works have ignored, can significantly improve recommendations. The proposed method outperforms the state-of-the-art algorithms with fewer than 1% of the parameters of its transductive counterparts. 
Extensive analysis and ablation studies are conducted to reveal the design considerations and practicability for a positive impact to the research community.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123292910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable Relation Learning on Heterogeneous Graphs","authors":"Qiang Yang, Qiannan Zhang, Chuxu Zhang, Xiangliang Zhang","doi":"10.1145/3488560.3498508","DOIUrl":"https://doi.org/10.1145/3488560.3498508","url":null,"abstract":"Relation learning, widely used in recommendation systems or relevant entity search over knowledge graphs, has attracted increasing attention in recent years. Existing methods like network embedding and graph neural networks (GNNs) learn the node representations from neighbors and calculate the similarity score for relation prediction. Despite effective prediction performance, they lack explanations for the predicted results. We propose a novel interpretable relation learning model named IRL, which can not only predict whether relations exist between node pairs, but also make the inference more transparent and convincing. Specifically, we introduce a meta-path based path encoder to model sequential dependency between nodes through a recurrent neural network. We also apply the self-supervised GNN on the extracted sub-graph to capture the graph structure by aggregating information from neighbors, which is fed into the meta-path encoder. In addition, we propose a meta-path walk pruning strategy for positive path generation and an adaptive negative sampling method for negative path generation to improve the quality of paths, both of which consider the semantics of nodes in the heterogeneous graph. 
We conduct extensive experiments on two public heterogeneous graph data, AMiner and Delve, for different relation prediction tasks, which demonstrate significant improvements of our model over the existing embedding-based and sequential modeling-based methods.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121074005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A GNN-based Multi-task Learning Framework for Personalized Video Search","authors":"Li Zhang, Lei Shi, Jiashu Zhao, Juan Yang, Tianshu Lyu, Dawei Yin, Haiping Lu","doi":"10.1145/3488560.3498507","DOIUrl":"https://doi.org/10.1145/3488560.3498507","url":null,"abstract":"Watching online videos has become more and more popular and users tend to watch videos based on their personal tastes and preferences. Providing a customized ranking list to maximize the user's satisfaction has become increasingly important for online video platforms. Existing personalized search methods (PSMs) train their models with user feedback information (e.g. clicks). However, we identified that such feedback signals may indicate attractiveness but not necessarily indicate relevance in video search. Besides, the click data and user historical information are usually too sparse to train a good PSM, which is different from the conventional Web search containing users' rich historical information. To address these concerns, in this paper we propose a multi-task graph neural network architecture for personalized video search (MGNN-PVS) that can jointly model user's click behaviour and the relevance between queries and videos. To relieve the sparsity problem and learn better representation for users, queries and videos, we develop an efficient and novel GNN architecture based on neighborhood sampling and hierarchical aggregation strategy by leveraging their different hops of neighbors in the user-query and query-document click graph. 
Extensive experiments on a major commercial video search engine show that our model significantly outperforms state-of-the-art PSMs, which illustrates the effectiveness of our proposed framework.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127578935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Cross-domain Semantically Interoperable Ecosystem","authors":"M. Tosic, Fábio Coelho, B. Nouwt, David Emanuel Rua, Aleksandar Tomcic, Saša Pesic","doi":"10.1145/3488560.3508496","DOIUrl":"https://doi.org/10.1145/3488560.3508496","url":null,"abstract":"The increasing number of IoT devices and digital services offers cross-domain sensing and control opportunities to a growing set of stakeholders. The provision of cross-domain digital services requires interoperability as a key enabler to bridge domain specifics, while inferring knowledge and allowing new data-driven services. This work addresses H2020 InterConnect project's Interoperability Framework, highlighting the use of semantic web technologies. The interoperability framework layering is presented, particularly addressing the Semantic Interoperability layer as its cornerstone to build an interoperable ecosystem of cross-domain digital services via a federation of distributed knowledge bases. Departing from a generic, ontology-agnostic approach that can fit any cross-domain use case, it validates the approach by considering the SAREF family of ontologies, showcasing an IoT and energy cross-domain use case.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126402009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variational User Modeling with Slow and Fast Features","authors":"G. Fazelnia, Eric Simon, Ian Anderson, Ben Carterette, M. Lalmas","doi":"10.1145/3488560.3498477","DOIUrl":"https://doi.org/10.1145/3488560.3498477","url":null,"abstract":"Recommender systems play a key role in helping users find their favorite music to play among an often extremely large catalog of items on online streaming services. To correctly identify users' interests, recommendation algorithms rely on past user behavior and feedback to learn users' preferences through the logged interactions. User modeling is a fundamental part of this large-scale system as it enables the model to learn an optimal representation for each user. For instance, in music recommendation, the focus of this paper, users' interests at any time are shaped by their general preferences for music as well as their recent or momentary interests in a particular type of music. In this paper, we present a novel approach for learning user representation based on general and slow-changing user interests as well as fast-moving current preferences. We propose a variational autoencoder-based model that takes fast and slow-moving features and learns an optimal user representation. Our model, which we call FS-VAE, consists of sequential and non-sequential encoders to capture patterns in user-item interactions and learn users' representations. We evaluate FS-VAE on a real-world music streaming dataset. Our experimental results show a clear improvement in learning optimal representations compared to state-of-the-art baselines on the next item recommendation task. 
We also demonstrate how each of the model components, slow input feature, and fast ones play a role in achieving the best results in next item prediction and learning users' representations.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127251039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiments with Predictive Long Term Guardrail Metrics","authors":"Sri Sri Perangur","doi":"10.1145/3488560.3510014","DOIUrl":"https://doi.org/10.1145/3488560.3510014","url":null,"abstract":"Product experiments today need a long term view of impact to make shipping decisions truly effective. Here we will discuss the challenges in the traditional metrics used in experiment analysis and how long term forecast metrics enable better decisions. Most tech companies such as Google, Amazon, Netflix, etc. run thousands of experiments (also known as A/B tests) a year [1]. The aim is to measure the impact new features have on core Key Predictive Indicators (KPIs) before deciding to launch them to production. Traditional A/B testing metrics will usually measure the impact of the feature on core KPIs in the short-term. However, for many lines of business (such as loyalty and memberships), this is not enough, as we want to understand the impact of the features in the mid/long term. This reality can force companies to run experiments for 6+ months, or use a correlated leading metric (such as user activity, engagement level) with estimated impact in the long term. Neither situation is ideal: the first slows down the rate of innovation, while the second does not account for multiple factors that define the future results. At Lyft, this reality is shared, and it becomes a challenge for innovation as we need to know the long term impact before we decide to ship new features. As a solution we design forecasted metrics for retention and revenue at a user level that can be used to measure the impact of experiments in the long term. 
In this talk we will discuss challenges and learnings from this approach, when applied in practice.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121583514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Users' Gender and Age based on Mobile Tasks","authors":"Yuan Tian","doi":"10.1145/3488560.3508494","DOIUrl":"https://doi.org/10.1145/3488560.3508494","url":null,"abstract":"Demographic attributes are a key factor in marketing products and services, which enable a business owner to find the ideal customer. Users' app usage behaviors could reveal rich clues regarding their personal attributes since they always determine what apps to use depending on their personal needs and interests. Prior studies [1, 2] have tried to predict users' gender and age through their app usage behavior. However, most of the existing methods for users' demographic prediction are straightforward, simply using popularly used apps or app usage frequency as features, without considering the internal semantic relationship of app usage. Recently, mobile tasks [3] have been identified from mobile app usage logs, representing a more accurate unit for capturing users' goals and behavioral insights, where a \"mobile task\" can be thought of as a group of related apps used to accomplish a single discrete task. For example, to plan dinner with friends, multiple apps (e.g., WhatsApp, Yelp, Uber and Google Maps) might be accessed for completing the task. In this talk, I will introduce how we leverage the fine-grained task units for generating user representations aimed at predicting users' gender and age. We analyzed the effectiveness of using tasks to infer users' demographics, especially when compared to only treating apps independently. We explored different approaches for constructing users' representation and models with both mobile apps and tasks. Finally, we validated that the two-level hierarchical structure of \"apps to tasks\" and \"tasks to users\" is an important factor that should be taken into consideration for improving mobile user modelling. This work sheds light on whether and how the extracted mobile tasks could be effectively applied. 
We believe that the task-based representations could be further explored for improving many other applications.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121689371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PrivacyCheck v3: Empowering Users with Higher-Level Understanding of Privacy Policies","authors":"Razieh Nokhbeh Zaeem, Ahmad Ahbab, Josh Bestor, Hussam H. Djadi, Sunny Kharel, Victor Lai, Nick Wang, K. S. Barber","doi":"10.1145/3488560.3502184","DOIUrl":"https://doi.org/10.1145/3488560.3502184","url":null,"abstract":"Online privacy policies are lengthy and hard to read, yet are profoundly important as they communicate the practices of an organization pertaining to user data privacy. Privacy Enhancing Technologies, or PETs, seek to inform users by summarizing these privacy policies. Efforts in the research and development of such PETs, however, have largely been limited to tools that recap the policy or visualize it. We present the next generation of our research and publicly available tool, PrivacyCheck v3, that utilizes machine learning to inform and empower users with respect to privacy policies. PrivacyCheck v3 adds capabilities that are commonly absent from similar PETs on the web. In particular, it adds the ability to (1) find the competitors of an organization with Alexa traffic analysis and compare policies across them, (2) follow privacy policies to which the user has agreed and notify the user when policies change, (3) track policies over time and report how often policies change and their trends, (4) automatically find privacy policies in domains, and (5) provide a bird's-eye view of privacy policies. 
The new features of PrivacyCheck not only inform users about details of privacy policies, but also empower them to understand privacy policies at a higher level, make informed decisions, and even select competitors with better privacy policies.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124048198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scope-aware Re-ranking with Gated Attention in Feed","authors":"Hao Qian, Qintong Wu, Kai Zhang, Zhiqiang Zhang, Lihong Gu, Xiaodong Zeng, Jun Zhou, Jinjie Gu","doi":"10.1145/3488560.3498403","DOIUrl":"https://doi.org/10.1145/3488560.3498403","url":null,"abstract":"Modern recommendation systems introduce the re-ranking stage to optimize the entire list directly. This paper focuses on the design of a re-ranking framework in feed to optimally model the mutual influence between items and further promote user engagement. On mobile devices, users browse the feed almost in a top-down manner and rarely compare items back and forth. Besides, users often compare an item with its adjacent items based on their partial observations. Given the distinct user behavior patterns, the modeling of mutual influence between items should be carefully designed. Existing re-ranking models encode the mutual influence between items with sequential encoding methods. However, previous works may be unsatisfactory because they ignore connections between items on different scopes. In this paper, we first discuss Unidirectivity and Locality on the impacts and consequences, then report corresponding solutions in industrial applications. We propose a novel framework based on the empirical evidence from user analysis. To address the above problems, we design a Scope-aware Re-ranking with Gated Attention model (SRGA) to emulate the user behavior patterns from two aspects: 1) we emphasize the influence along the user's common browsing direction; 2) we strengthen the impact of pivotal adjacent items within the user visual window. Specifically, we design a global scope attention to encode inter-item patterns unidirectionally from top to bottom. Besides, we devise a local scope attention sliding over the recommendation list to underline interactions among neighboring items. 
Furthermore, we design a learned gate mechanism to aggregate the information dynamically from local and global scope attention. Extensive offline experiments and online A/B testing demonstrate the benefits of our novel framework. The proposed SRGA model achieves the best performance in offline metrics compared with the state-of-the-art re-ranking methods. Further, empirical results on live traffic validate that our recommender system, equipped with SRGA in the re-ranking stage, improves significantly in user engagement.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127901160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Rise of Data Observability: Architecting the Future of Data Trust","authors":"Barr Moses","doi":"10.1145/3488560.3510007","DOIUrl":"https://doi.org/10.1145/3488560.3510007","url":null,"abstract":"As companies become increasingly data driven, the technologies underlying these rich insights have grown more and more nuanced and complex. While our ability to collect, store, aggregate, and visualize this data has largely kept up with the needs of modern data teams (think: domain-oriented data meshes, cloud warehouses, data visualization tools, and data modeling solutions), the mechanics behind data quality and integrity has lagged. To keep pace with data's clock speed of innovation, data engineers need to invest not only in the latest modeling and analytics tools, but also technologies that can increase data accuracy and prevent broken pipelines. The solution? Data observability, the next frontier of data engineering. I'll discuss why data observability matters to building a better data quality strategy and tactics best-in-class organizations use to address it -- including org structure, culture, and technology.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130113554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}