{"title":"Favorite+: Favorite Tuples Extraction via Regret Minimization","authors":"M. Xie, Yang Liu","doi":"10.1145/3511808.3557188","DOIUrl":"https://doi.org/10.1145/3511808.3557188","url":null,"abstract":"When faced with a database containing millions of tuples, a user might be interested in only some of them. In this paper, we study how to help an end user to find the favorite tuples based on the recent advancements in regret minimization queries, which guarantee that the tuples returned are not far from the user's favorite tuple in the database, without asking the user to scan the entire database. We consider three types of regret minimization queries: (1) End-to-end query: Given an output size k, we directly return a subset of at most k tuples from the database; (2) Interactive query: We identify the user's favorite tuple via user interaction, where a user might be presented with a few pairs of tuples, and the user is asked to indicate the one s/he favors more from each pair; and (3) Incremental query: Analogous to how we use search engines, if the user is not satisfied with the current tuples, we continually return more. We developed a demonstration system, called Favorite+, that supports the above queries. 
We demonstrate that the system could help the users to find their favorite tuples in the database efficiently and effectively.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117329157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Representation for Multi-view Clustering: From Intra-sample to Intra-view to Inter-view","authors":"Jing-Hua Yang, Chuan Chen, Hongning Dai, Meng Ding, Lele Fu, Zibin Zheng","doi":"10.1145/3511808.3557349","DOIUrl":"https://doi.org/10.1145/3511808.3557349","url":null,"abstract":"Multi-view clustering (MVC) aims at exploiting the consistent features within different views to divide samples into different clusters. Existing subspace-based MVC algorithms usually assume linear subspace structures and two-stage similarity matrix construction strategies, thereby posing challenges in imprecise low-dimensional subspace representation and inadequacy of exploring consistency. This paper presents a novel hierarchical representation for MVC method via the integration of intra-sample, intra-view, and inter-view representation learning models. In particular, we first adopt the deep autoencoder to adaptively map the original high-dimensional data into the latent low-dimensional representation of each sample. Second, we use the self-expression of the latent representation to explore the global similarity between samples of each view and obtain the subspace representation coefficients. Third, we construct the third-order tensor by arranging multiple subspace representation matrices and impose the tensor low-rank constraint to sufficiently explore the consistency among views. Being incorporated into a unified framework, these three models boost each other to achieve a satisfactory clustering result. Moreover, an alternating direction method of multipliers algorithm is developed to solve the challenging optimization problem. 
Extensive experiments on both simulated and real-world multi-view datasets show the superiority of the proposed method over eight state-of-the-art baselines.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123515604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Relevant and Diverse Retrieval-enhanced Data Augmentation Framework for Sequential Recommendation","authors":"Shuqing Bian, Wayne Xin Zhao, Jinpeng Wang, Ji-rong Wen","doi":"10.1145/3511808.3557071","DOIUrl":"https://doi.org/10.1145/3511808.3557071","url":null,"abstract":"Within online platforms, it is critical to capture the semantics of sequential user behaviors for accurately predicting user interests. Recently, significant progress has been made in sequential recommendation with deep learning. However, existing neural sequential recommendation models may not perform well in practice due to the sparsity of the real-world data, especially in cold-start scenarios. To tackle this problem, we propose the model ReDA, which stands for Retrieval-enhanced Data Augmentation for modeling sequential user behaviors. The main idea of our approach is to leverage the related information from similar users for generating both relevant and diverse augmentation. First, we train a neural retriever to retrieve the augmentation users according to the semantic similarity between user representations, and then conduct two types of data augmentation to generate augmented user representations. Furthermore, these augmented data are incorporated in a contrastive learning framework for learning more capable representations. 
Extensive experiments conducted on both public and industry datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods, especially when only limited training data is available.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121901254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bootstrap-based Causal Structure Learning","authors":"Xianjie Guo, Yujie Wang, Xiaoling Huang, Shuai Yang, Kui Yu","doi":"10.1145/3511808.3557249","DOIUrl":"https://doi.org/10.1145/3511808.3557249","url":null,"abstract":"Learning a causal structure from observational data is crucial for data scientists. Recent advances in causal structure learning (CSL) have focused on local-to-global learning, since the local-to-global CSL can be scaled to high-dimensional data. The local-to-global CSL algorithms first learn the local skeletons, then construct the global skeleton, and finally orient edges. In practice, the performance of local-to-global CSL mainly depends on the accuracy of the global skeleton. However, in many real-world settings, owing to inevitable data quality issues (e.g., noise and small samples), existing local-to-global CSL methods often yield many asymmetric edges (e.g., given an asymmetric edge containing variables A and B, the learned skeleton of A contains B, but the learned skeleton of B does not contain A), which make it difficult to construct a high-quality global skeleton. To tackle this problem, this paper proposes a Bootstrap sampling based Causal Structure Learning (BCSL) algorithm. The novel contribution of BCSL is that it proposes an integrated global skeleton learning strategy that can construct more accurate global skeletons. Specifically, this strategy first utilizes the Bootstrap method to generate multiple sub-datasets, then learns the local skeleton of variables on each asymmetric edge on those sub-datasets, and finally designs a novel scoring function to estimate the learning results on all sub-datasets for correcting the asymmetric edge. 
Extensive experiments on both benchmark and real datasets verify the effectiveness of the proposed method.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116882744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fairness of Machine Learning in Search Engines","authors":"Yi Fang, Hongfu Liu, Zhiqiang Tao, Mikhail Yurochkin","doi":"10.1145/3511808.3557501","DOIUrl":"https://doi.org/10.1145/3511808.3557501","url":null,"abstract":"Fairness has gained increasing importance in a variety of AI and machine learning contexts. As one of the most ubiquitous applications of machine learning, search engines mediate much of the information experiences of members of society. Consequently, understanding and mitigating potential algorithmic unfairness in search have become crucial for both users and systems. In this tutorial, we will introduce the fundamentals of fairness in machine learning, for both supervised learning such as classification and ranking, and unsupervised learning such as clustering. We will then present the existing work on fairness in search engines, including the fairness definitions, evaluation metrics, and taxonomies of methodologies. This tutorial will help orient information retrieval researchers to algorithmic fairness, provide an introduction to the growing literature on this topic, and gather researchers and practitioners interested in this research direction.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123937711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IntTower: The Next Generation of Two-Tower Model for Pre-Ranking System","authors":"Xiangyang Li, Bo Chen, Huifeng Guo, Jingjie Li, Chenxu Zhu, Xiang Long, Sujian Li, Yichao Wang, Wei Guo, Longxia Mao, Jinxing Liu, Zhenhua Dong, Ruiming Tang","doi":"10.1145/3511808.3557072","DOIUrl":"https://doi.org/10.1145/3511808.3557072","url":null,"abstract":"Scoring a large number of candidates precisely in several milliseconds is vital for industrial pre-ranking systems. Existing pre-ranking systems primarily adopt the two-tower model since the \"user-item decoupling architecture\" paradigm is able to balance the efficiency and effectiveness. However, the cost of high efficiency is the neglect of the potential information interaction between user and item towers, hindering the prediction accuracy critically. In this paper, we show it is possible to design a two-tower model that emphasizes both information interactions and inference efficiency. The proposed model, IntTower (short for Interaction enhanced Two-Tower), consists of Light-SE, FE-Block and CIR modules. Specifically, lightweight Light-SE module is used to identify the importance of different features and obtain refined feature representations in each tower. FE-Block module performs fine-grained and early feature interactions to capture the interactive signals between user and item towers explicitly and CIR module leverages a contrastive interaction regularization to further enhance the interactions implicitly. Experimental results on three public datasets show that IntTower outperforms the SOTA pre-ranking models significantly and even achieves comparable performance in comparison with the ranking models. Moreover, we further verify the effectiveness of IntTower on a large-scale advertisement pre-ranking system. 
The code of IntTower is publicly available at https://gitee.com/mindspore/models/tree/master/research/recommend/IntTower.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124479557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective Tensorized Multi-layer LSTM for Orbit Prediction","authors":"Youjin Shin, Eun-Ju Park, Simon S. Woo, Ok-Cheol Jung, D. Chung","doi":"10.1145/3511808.3557138","DOIUrl":"https://doi.org/10.1145/3511808.3557138","url":null,"abstract":"Although the collision of space objects not only incurs a high cost but also threatens human life, the risk of collision between satellites has increased, as the number of satellites has rapidly grown due to the significant interests in many space applications. However, it is not trivial to monitor the behavior of the satellite in real-time since the communication between the ground station and spacecraft is dynamic and sparse, and there is an increased latency due to the long distance. Accordingly, it is strongly required to predict the orbit of a satellite to prevent unexpected contingencies such as a collision. Therefore, the real-time monitoring and accurate orbit prediction are required. Furthermore, it is necessary to compress the prediction model, while achieving a high prediction performance in order to be deployable in the real systems. Although several machine learning and deep learning-based prediction approaches have been studied to address such issues, most of them have applied only basic machine learning models for orbit prediction without considering the size, running time, and complexity of the prediction model. In this research, we propose Selective Tensorized multi-layer LSTM (ST-LSTM) for orbit prediction, which not only improves the orbit prediction performance but also compresses the size of the model that can be applied in practical deployable scenarios. To evaluate our model, we use the real orbit dataset collected from the Korea Multi-Purpose Satellites (KOMPSAT-3 and KOMPSAT-3A) of the Korea Aerospace Research Institute (KARI) for 5 years. 
In addition, we compare our ST-LSTM to other machine learning-based regression models, LSTM, and basic tensorized LSTM models with regard to the prediction performance, model compression rate, and running time.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125882495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PyDHNet: A Python Library for Dynamic Heterogeneous Network Representation Learning and Evaluation","authors":"Hoang Nguyen, Radin Hamidi Rad, E. Bagheri","doi":"10.1145/3511808.3557181","DOIUrl":"https://doi.org/10.1145/3511808.3557181","url":null,"abstract":"Network representation learning and its applications have received increasing attention. Due to their various application areas, many research groups have developed a diverse range of software tools and techniques to learn representation for different types of networks. However, to the best of our knowledge, there are limited works that support representation learning for dynamic heterogeneous networks. The work presented in this demonstration paper attempts to fill the gap in this space by developing and publicly releasing an open-source Python library known as PyDHNet, a Python Library for Dynamic Heterogeneous Network Representation Learning and Evaluation. PyDHNet consists of two main components: dynamic heterogeneous network representation learning and task-specific evaluation. In our paper, we demonstrate that PyDHNet has an extensible architecture, is easy to install (through PIP) and use, and integrates quite seamlessly with other Python libraries. We also show that the implementation for PyDHNet is efficient and enjoys a competitive execution time.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128205345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Graph Representation Learning via Locality-Sensitive Hashing","authors":"Xiusi Chen, Jyun-Yu Jiang, Wei Wang","doi":"10.1145/3511808.3557689","DOIUrl":"https://doi.org/10.1145/3511808.3557689","url":null,"abstract":"A massive amount of research on graph representation learning has been carried out to learn dense features as graph embedding for information networks, thereby capturing the semantics in complex networks and benefiting a variety of downstream tasks. Most of the existing studies focus on structural properties, such as distances and neighborhood proximity between nodes. However, real-world information networks are dominated by the low-degree nodes because they are not only sparse but also subject to the power-law form. Due to the sparsity, proximity-based methods are incapable of deriving satisfactory representations for these tail nodes. To address this challenge, we propose a novel approach, Content-Preserving Locality-Sensitive Hashing (CP-LSH), by incorporating the content information for representation learning. Specifically, we aim at preserving LSH-based content similarity between nodes to leverage the knowledge from popular nodes to long-tail nodes. We also propose a novel hashing trick to reduce the redundant space consumption so that CP-LSH is capable of tackling industry-scale data. Extensive offline experiments have been conducted on three large-scale public datasets. We also deploy CP-LSH to real-world recommendation systems in one of the largest e-commerce platforms for online experiments. Experimental results demonstrate that CP-LSH outperforms competitive baseline methods in node classification and link prediction tasks. 
Besides, the results of online experiments also indicate that CP-LSH is practical and robust for real-world production systems.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127384091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WARNER: Weakly-Supervised Neural Network to Identify Eviction Filing Hotspots in the Absence of Court Records","authors":"Maryam Tabar, Wooyong Jung, A. Yadav, Owen Wilson Chavez, Ashley Flores, Dongwon Lee","doi":"10.1145/3511808.3557128","DOIUrl":"https://doi.org/10.1145/3511808.3557128","url":null,"abstract":"The widespread eviction of tenants across the United States has metamorphosed into a challenging public-policy problem. In particular, eviction exacerbates several income-based, educational, and health inequities in society, e.g., eviction disproportionately affects low-income renting families, many of whom belong to underrepresented minority groups. Despite growing interest in understanding and mitigating the eviction crisis, there are several legal and infrastructural obstacles to data acquisition at scale that limit our understanding of the distribution of eviction across the United States. To circumvent existing challenges in data acquisition, we propose WARNER, a novel Machine Learning (ML) framework that predicts eviction filing hotspots in US counties from an unlabeled satellite imagery dataset. We account for the lack of labeled training data in this domain by leveraging sociological insights to propose a novel approach to generate probabilistic labels for a subset of an unlabeled dataset of satellite imagery, which is then used to train a neural network model to identify eviction filing hotspots. Our experimental results show that WARNER achieves a higher predictive performance than several strong baselines. Further, the superiority of WARNER can be generalized to different counties across the United States. Our proposed framework has the potential to assist NGOs and policymakers in designing well-informed (data-driven) resource allocation plans to improve the nationwide housing stability. 
This work is conducted in collaboration with The Child Poverty Action Lab (a leading non-profit leveraging data-driven approaches to inform actions for relieving poverty and relevant problems in Dallas County, TX). The code can be accessed via https://github.com/maryam-tabar/WARNER.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130006888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}