Proceedings of the 31st ACM International Conference on Information & Knowledge Management: Latest Papers (Page 6)

Favorite+: Favorite Tuples Extraction via Regret Minimization

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557188

M. Xie, Yang Liu

Abstract: When faced with a database containing millions of tuples, a user might be interested in only some of them. In this paper, we study how to help an end user find their favorite tuples based on recent advances in regret minimization queries, which guarantee that the tuples returned are not far from the user's favorite tuple in the database, without asking the user to scan the entire database. We consider three types of regret minimization queries: (1) End-to-end query: given an output size k, we directly return a subset of at most k tuples from the database; (2) Interactive query: we identify the user's favorite tuple via user interaction, where the user might be presented with a few pairs of tuples and is asked to indicate which tuple of each pair they favor more; and (3) Incremental query: analogous to how we use search engines, if the user is not satisfied with the current tuples, we continually return more. We developed a demonstration system, called Favorite+, that supports the above queries. We demonstrate that the system can help users find their favorite tuples in the database efficiently and effectively.
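
Illustrative sketch (not taken from the paper): the end-to-end query can be made concrete with the regret ratio commonly used in the regret minimization literature, i.e. the fraction of utility a user loses by seeing only a subset instead of the whole database. The code below assumes linear utility functions with a positive best utility, approximates the worst case over a small hand-picked sample of utility vectors, and uses brute-force search; the function names and the toy data are hypothetical and say nothing about how Favorite+ itself is implemented.

```python
from itertools import combinations


def utility(weights, tup):
    """Linear utility (an assumption): weighted sum of the tuple's attributes."""
    return sum(w * x for w, x in zip(weights, tup))


def regret_ratio(database, subset, weights):
    """Fraction of the best achievable utility lost by seeing only `subset`.

    Assumes the best utility over the database is strictly positive.
    """
    best_all = max(utility(weights, t) for t in database)
    best_sub = max(utility(weights, t) for t in subset)
    return (best_all - best_sub) / best_all


def max_regret_ratio(database, subset, weight_samples):
    """Worst regret ratio over a finite sample of utility vectors."""
    return max(regret_ratio(database, subset, w) for w in weight_samples)


def best_subset(database, k, weight_samples):
    """Brute-force search for the size-k subset with the smallest worst-case regret."""
    return min(
        combinations(database, k),
        key=lambda s: max_regret_ratio(database, s, weight_samples),
    )


# Toy database of 2-attribute tuples and three sampled users.
database = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.3)]
weight_samples = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
chosen = best_subset(database, 2, weight_samples)
print(chosen, max_regret_ratio(database, chosen, weight_samples))
```

Brute force is exponential in k and is used here only to keep the definition visible; practical systems rely on approximation algorithms instead.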

Citations: 0

Hierarchical Representation for Multi-view Clustering: From Intra-sample to Intra-view to Inter-view

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557349

Jing-Hua Yang, Chuan Chen, Hongning Dai, Meng Ding, Lele Fu, Zibin Zheng

Abstract: Multi-view clustering (MVC) aims at exploiting the consistent features within different views to divide samples into different clusters. Existing subspace-based MVC algorithms usually assume linear subspace structures and adopt two-stage similarity matrix construction strategies, which leads to imprecise low-dimensional subspace representations and inadequate exploration of consistency. This paper presents a novel hierarchical representation method for MVC that integrates intra-sample, intra-view, and inter-view representation learning models. In particular, we first adopt a deep autoencoder to adaptively map the original high-dimensional data into a latent low-dimensional representation of each sample. Second, we use the self-expression of the latent representation to explore the global similarity between samples of each view and obtain the subspace representation coefficients. Third, we construct a third-order tensor by arranging multiple subspace representation matrices and impose a tensor low-rank constraint to sufficiently explore the consistency among views. Incorporated into a unified framework, these three models boost each other to achieve a satisfactory clustering result. Moreover, an alternating direction method of multipliers algorithm is developed to solve the challenging optimization problem. Extensive experiments on both simulated and real-world multi-view datasets show the superiority of the proposed method over eight state-of-the-art baselines.

Citations: 1

A Relevant and Diverse Retrieval-enhanced Data Augmentation Framework for Sequential Recommendation

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557071

Shuqing Bian, Wayne Xin Zhao, Jinpeng Wang, Ji-rong Wen

Abstract: Within online platforms, it is critical to capture the semantics of sequential user behaviors in order to predict user interests accurately. Recently, significant progress has been made in sequential recommendation with deep learning. However, existing neural sequential recommendation models may not perform well in practice due to the sparsity of real-world data, especially in cold-start scenarios. To tackle this problem, we propose the model ReDA, which stands for Retrieval-enhanced Data Augmentation for modeling sequential user behaviors. The main idea of our approach is to leverage related information from similar users to generate both relevant and diverse augmentations. First, we train a neural retriever to retrieve augmentation users according to the semantic similarity between user representations, and then conduct two types of data augmentation to generate augmented user representations. Furthermore, these augmented data are incorporated into a contrastive learning framework for learning more capable representations. Extensive experiments conducted on both public and industry datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods, especially when only limited training data is available.

Citations: 2

Bootstrap-based Causal Structure Learning

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557249

Xianjie Guo, Yujie Wang, Xiaoling Huang, Shuai Yang, Kui Yu

Abstract: Learning a causal structure from observational data is crucial for data scientists. Recent advances in causal structure learning (CSL) have focused on local-to-global learning, since local-to-global CSL can be scaled to high-dimensional data. Local-to-global CSL algorithms first learn the local skeletons, then construct the global skeleton, and finally orient edges. In practice, the performance of local-to-global CSL mainly depends on the accuracy of the global skeleton. However, in many real-world settings, owing to inevitable data quality issues (e.g., noise and small sample sizes), existing local-to-global CSL methods often yield many asymmetric edges (e.g., given an asymmetric edge containing variables A and B, the learned skeleton of A contains B, but the learned skeleton of B does not contain A), which make it difficult to construct a high-quality global skeleton. To tackle this problem, this paper proposes a Bootstrap sampling based Causal Structure Learning (BCSL) algorithm. The novel contribution of BCSL is an integrated global skeleton learning strategy that can construct more accurate global skeletons. Specifically, this strategy first uses the Bootstrap method to generate multiple sub-datasets, then learns the local skeleton of the variables on each asymmetric edge on those sub-datasets, and finally designs a novel scoring function that evaluates the learning results on all sub-datasets to correct the asymmetric edge. Extensive experiments on both benchmark and real datasets verify the effectiveness of the proposed method.
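
Illustrative sketch (not the BCSL algorithm itself): the snippet below shows the two mechanical ingredients the abstract mentions, namely detecting asymmetric edges in a set of local skeletons and re-learning the two endpoints on bootstrap resamples. The paper's scoring function is not described in the abstract, so a plain "fraction of resamples in which the edge is symmetric" stands in for it, and `agreement_learner` is a hypothetical toy replacement for a real local skeleton learner. All names and data are assumptions made for illustration.

```python
import random


def asymmetric_edges(local_skeletons):
    """Ordered pairs (a, b) where b is in a's skeleton but a is not in b's."""
    return sorted(
        (a, b)
        for a, neighbors in local_skeletons.items()
        for b in neighbors
        if a not in local_skeletons.get(b, set())
    )


def bootstrap_samples(rows, n_samples, rng):
    """Resample the rows with replacement, keeping the original size."""
    return [[rng.choice(rows) for _ in rows] for _ in range(n_samples)]


def agreement_learner(var, sample, threshold=0.7):
    """Toy stand-in for a local learner: neighbors agree with `var` in most rows."""
    others = [v for v in sample[0] if v != var]
    return {
        v
        for v in others
        if sum(row[v] == row[var] for row in sample) / len(sample) >= threshold
    }


def edge_support(a, b, rows, learner, n_samples=20, seed=0):
    """Fraction of bootstrap resamples in which the edge a-b is learned symmetrically."""
    rng = random.Random(seed)
    hits = 0
    for sample in bootstrap_samples(rows, n_samples, rng):
        if b in learner(a, sample) and a in learner(b, sample):
            hits += 1
    return hits / n_samples


# Hand-written local skeletons: A lists B, but B does not list A.
skeletons = {"A": {"B", "C"}, "B": {"C"}, "C": {"A", "B"}}
suspect = asymmetric_edges(skeletons)

# Toy binary data in which A and B always take the same value.
rows = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(40)]
support = {edge: edge_support(*edge, rows, agreement_learner) for edge in suspect}
print(suspect, support)
```

An edge with high support would be kept in the global skeleton and one with low support dropped; where to put that threshold is a design choice this sketch leaves open.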

Citations: 6

Fairness of Machine Learning in Search Engines

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557501

Yi Fang, Hongfu Liu, Zhiqiang Tao, Mikhail Yurochkin

Citations: 1

IntTower: The Next Generation of Two-Tower Model for Pre-Ranking System

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557072

Xiangyang Li, Bo Chen, Huifeng Guo, Jingjie Li, Chenxu Zhu, Xiang Long, Sujian Li, Yichao Wang, Wei Guo, Longxia Mao, Jinxing Liu, Zhenhua Dong, Ruiming Tang

Abstract: Scoring a large number of candidates precisely within several milliseconds is vital for industrial pre-ranking systems. Existing pre-ranking systems primarily adopt the two-tower model, since the "user-item decoupling architecture" paradigm is able to balance efficiency and effectiveness. However, the cost of high efficiency is the neglect of potential information interaction between the user and item towers, which critically hinders prediction accuracy. In this paper, we show that it is possible to design a two-tower model that emphasizes both information interactions and inference efficiency. The proposed model, IntTower (short for Interaction enhanced Two-Tower), consists of Light-SE, FE-Block, and CIR modules. Specifically, the lightweight Light-SE module is used to identify the importance of different features and obtain refined feature representations in each tower. The FE-Block module performs fine-grained and early feature interactions to capture the interactive signals between the user and item towers explicitly, and the CIR module leverages a contrastive interaction regularization to further enhance the interactions implicitly. Experimental results on three public datasets show that IntTower significantly outperforms the SOTA pre-ranking models and even achieves performance comparable to that of ranking models. Moreover, we further verify the effectiveness of IntTower on a large-scale advertisement pre-ranking system. The code of IntTower is publicly available at https://gitee.com/mindspore/models/tree/master/research/recommend/IntTower.
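
Illustrative sketch (not IntTower): the snippet below shows only the plain "user-item decoupling" baseline that the abstract starts from. Item embeddings are computed once offline, the user is embedded once per request, and each candidate is scored with a single dot product. The one-layer linear towers, the identity weights, and all names are hypothetical simplifications; none of the Light-SE, FE-Block, or CIR modules is reproduced here.

```python
import heapq


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_tower(weights, features):
    """Toy tower: a single linear layer mapping raw features to an embedding."""
    return [dot(row, features) for row in weights]


def build_item_index(item_weights, item_features):
    """Offline step: embed every item once, independently of any user."""
    return {
        item_id: linear_tower(item_weights, feats)
        for item_id, feats in item_features.items()
    }


def pre_rank(user_weights, user_features, item_index, k):
    """Online step: embed the user once, then score items by dot product."""
    user_vec = linear_tower(user_weights, user_features)
    return heapq.nlargest(
        k, item_index, key=lambda item_id: dot(user_vec, item_index[item_id])
    )


identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder tower weights
item_index = build_item_index(
    identity, {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.5, 0.5]}
)
top_items = pre_rank(identity, [1.0, 0.0], item_index, k=2)
print(top_items)
```

Because the two towers meet only at the final dot product, the online cost per candidate stays tiny, and this late meeting point is the missing user-item interaction the abstract sets out to address.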

Citations: 7

Selective Tensorized Multi-layer LSTM for Orbit Prediction

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557138

Youjin Shin, Eun-Ju Park, Simon S. Woo, Ok-Cheol Jung, D. Chung

Abstract: Collisions between space objects not only incur a high cost but also threaten human life, yet the risk of collision between satellites has increased as the number of satellites has grown rapidly, driven by significant interest in many space applications. However, it is not trivial to monitor the behavior of a satellite in real time, since the communication between the ground station and the spacecraft is dynamic and sparse, and latency increases with the long distance. Accordingly, it is strongly required to predict the orbit of a satellite to prevent unexpected contingencies such as a collision. Therefore, real-time monitoring and accurate orbit prediction are required. Furthermore, it is necessary to compress the prediction model while achieving a high prediction performance so that it can be deployed in real systems. Although several machine learning and deep learning-based prediction approaches have been studied to address such issues, most of them have applied only basic machine learning models for orbit prediction without considering the size, running time, and complexity of the prediction model. In this research, we propose Selective Tensorized multi-layer LSTM (ST-LSTM) for orbit prediction, which not only improves the orbit prediction performance but also compresses the size of the model so that it can be applied in practical deployment scenarios. To evaluate our model, we use a real orbit dataset collected over 5 years from the Korea Multi-Purpose Satellites (KOMPSAT-3 and KOMPSAT-3A) of the Korea Aerospace Research Institute (KARI). In addition, we compare our ST-LSTM to other machine learning-based regression models, LSTM, and basic tensorized LSTM models with regard to prediction performance, model compression rate, and running time.

Citations: 0

PyDHNet: A Python Library for Dynamic Heterogeneous Network Representation Learning and Evaluation

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557181

Hoang Nguyen, Radin Hamidi Rad, E. Bagheri

Citations: 3

Scalable Graph Representation Learning via Locality-Sensitive Hashing

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557689

Xiusi Chen, Jyun-Yu Jiang, Wei Wang

Abstract: A massive amount of research on graph representation learning has been carried out to learn dense features as graph embeddings for information networks, thereby capturing the semantics in complex networks and benefiting a variety of downstream tasks. Most of the existing studies focus on structural properties, such as distances and neighborhood proximity between nodes. However, real-world information networks are dominated by low-degree nodes because they are not only sparse but also follow a power law. Due to the sparsity, proximity-based methods are incapable of deriving satisfactory representations for these tail nodes. To address this challenge, we propose a novel approach, Content-Preserving Locality-Sensitive Hashing (CP-LSH), which incorporates content information into representation learning. Specifically, we aim at preserving LSH-based content similarity between nodes to transfer knowledge from popular nodes to long-tail nodes. We also propose a novel hashing trick to reduce redundant space consumption so that CP-LSH is capable of tackling industry-scale data. Extensive offline experiments have been conducted on three large-scale public datasets. We also deploy CP-LSH to real-world recommendation systems in one of the largest e-commerce platforms for online experiments. Experimental results demonstrate that CP-LSH outperforms competitive baseline methods in node classification and link prediction tasks. In addition, the results of online experiments indicate that CP-LSH is practical and robust for real-world production systems.
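
Illustrative sketch (not CP-LSH): the abstract does not say which LSH family is used, so the snippet below assumes random-hyperplane hashing (SimHash), a standard LSH scheme for cosine similarity. Each node's content vector is mapped to a short bit signature, and nodes with similar content tend to share signature bits and therefore buckets. The node names, content vectors, and function names are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the vector lies on."""
    return tuple(
        int(sum(h * x for h, x in zip(plane, vector)) >= 0) for plane in hyperplanes
    )


def hamming_similarity(sig_a, sig_b):
    """Fraction of matching bits between two signatures of equal length."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def bucket_nodes(content, hyperplanes):
    """Group nodes whose content vectors hash to the same signature."""
    buckets = defaultdict(list)
    for node, vec in content.items():
        buckets[signature(vec, hyperplanes)].append(node)
    return dict(buckets)


planes = make_hyperplanes(n_bits=16, dim=3)
content = {
    "popular": [1.0, 2.0, 0.5],    # high-degree node
    "tail": [2.0, 4.0, 1.0],       # low-degree node with the same content direction
    "other": [-1.0, -2.0, -0.5],   # node with opposite content
}
buckets = bucket_nodes(content, planes)
print(list(buckets.values()))
```

The signature depends only on the direction of the content vector, so a tail node with few edges can still land next to a popular node whose content points the same way.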

Citations: 1

WARNER: Weakly-Supervised Neural Network to Identify Eviction Filing Hotspots in the Absence of Court Records

Proceedings of the 31st ACM International Conference on Information & Knowledge Management Pub Date : 2022-10-17 DOI: 10.1145/3511808.3557128

Maryam Tabar, Wooyong Jung, A. Yadav, Owen Wilson Chavez, Ashley Flores, Dongwon Lee

Abstract: The widespread eviction of tenants across the United States has metamorphosed into a challenging public-policy problem. In particular, eviction exacerbates several income-based, educational, and health inequities in society, e.g., eviction disproportionately affects low-income renting families, many of whom belong to underrepresented minority groups. Despite growing interest in understanding and mitigating the eviction crisis, there are several legal and infrastructural obstacles to data acquisition at scale that limit our understanding of the distribution of eviction across the United States. To circumvent existing challenges in data acquisition, we propose WARNER, a novel Machine Learning (ML) framework that predicts eviction filing hotspots in US counties from an unlabeled satellite imagery dataset. We account for the lack of labeled training data in this domain by leveraging sociological insights to propose a novel approach to generate probabilistic labels for a subset of an unlabeled dataset of satellite imagery, which is then used to train a neural network model to identify eviction filing hotspots. Our experimental results show that WARNER achieves a higher predictive performance than several strong baselines. Further, the superiority of WARNER generalizes to different counties across the United States. Our proposed framework has the potential to assist NGOs and policymakers in designing well-informed (data-driven) resource allocation plans to improve nationwide housing stability. This work was conducted in collaboration with The Child Poverty Action Lab (a leading non-profit leveraging data-driven approaches to inform actions for relieving poverty and related problems in Dallas County, TX). The code can be accessed via https://github.com/maryam-tabar/WARNER.

Citations: 0