ACM Transactions on Knowledge Discovery from Data: Latest Articles, Page 3

LMACL: Improving Graph Collaborative Filtering with Learnable Model Augmentation Contrastive Learning

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-04-12 DOI: 10.1145/3657302

Xinru Liu, Yongjing Hao, Lei Zhao, Guanfeng Liu, Victor S. Sheng, Pengpeng Zhao

Abstract: Graph collaborative filtering (GCF) has achieved impressive recommendation performance thanks to its ability to aggregate high-order graph structure information. Recently, contrastive learning (CL) has been incorporated into GCF to alleviate data sparsity and noise. However, most existing methods use random or manual augmentation to produce contrastive views, which may destroy the original topology and amplify the effects of noise. We argue that such augmentation is insufficient to produce the optimal contrastive view, leading to suboptimal recommendation results. In this paper, we propose a Learnable Model Augmentation Contrastive Learning (LMACL) framework for recommendation, which effectively combines graph-level and node-level collaborative relations to enhance the expressiveness of the collaborative filtering (CF) paradigm. Specifically, we first use a graph convolutional network (GCN) as the backbone encoder, leveraging the high-order connectivity of user-item interaction graphs to incorporate multi-hop neighbors into graph-level original node representations. At the same time, we treat a multi-head graph attention network (GAT) as an augmentation view generator that adaptively produces high-quality node-level augmented views. Finally, joint learning enables end-to-end training, so that the mutual supervision and cooperation of the GCN and GAT achieve learnable model augmentation. Extensive experiments on several benchmark datasets demonstrate that LMACL improves on the strongest baseline by 2.5-3.8% in Recall and 1.6-4.0% in NDCG. The implementation code is available at https://github.com/LiuHsinx/LMACL.

Citations: 0
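
The LMACL abstract contrasts a GCN view of each node with a GAT-generated augmented view. Contrastive GCF methods commonly score such view pairs with an InfoNCE objective, so the sketch below assumes that objective; the abstract does not state LMACL's exact loss. The names `info_nce`, `gcn_view`, and `aligned_gat_view`, the temperature, and the tiny hand-written vectors are all illustrative stand-ins for learned embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def info_nce(view_a, view_b, temperature=0.2):
    """Average InfoNCE loss: view_a[i] and view_b[i] are two views of node i
    (the positive pair); view_b[j] for j != i act as negatives."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        top = max(logits)  # log-sum-exp with max subtraction for stability
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Hypothetical embeddings of three nodes from the two encoders.
gcn_view = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_gat_view = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
shuffled_gat_view = [aligned_gat_view[1], aligned_gat_view[2], aligned_gat_view[0]]

print(info_nce(gcn_view, aligned_gat_view))   # views agree: smaller loss
print(info_nce(gcn_view, shuffled_gat_view))  # positives mismatched: larger loss
```

Minimizing this loss pulls the two views of the same node together and pushes apart views of different nodes, which is the mechanism a learnable augmentation generator is trained against.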

Congestion-aware Spatio-Temporal Graph Convolutional Network Based A* Search Algorithm for Fastest Route Search

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-04-11 DOI: 10.1145/3657640

Hongjie Sui, Huan Yan, Tianyi Zheng, Wenzhen Huang, Yunlin Zhuang, Yong Li

Abstract: Fastest route search, which finds the path with the shortest travel time when a user issues a query, has become one of the most important services in many map applications. To enhance the user's travel experience, route search must be both accurate and real-time. However, traffic conditions change dynamically; in particular, frequent traffic congestion can greatly increase travel time, which makes this goal challenging. To address it, we present a congestion-aware spatio-temporal graph convolutional network based A* search algorithm for fastest route search. We first identify a sequence of consecutive congested traffic conditions as a traffic congestion event. Then, we propose a spatio-temporal graph convolutional network that jointly models congestion events and changing travel times to capture their complex spatio-temporal correlations; it predicts the future travel time of each road segment as the basis for route planning. Further, we design a path-aided neural network that achieves effective origin-destination (OD) shortest travel time estimation by encoding the complex relationships between OD pairs and their corresponding fastest paths. Finally, the cost function of the A* algorithm is set by fusing the outputs of the two components and is used to guide the route search. Experimental results on two real-world datasets show the superior performance of the proposed method.

Citations: 0
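
A* ranks frontier nodes by f(n) = g(n) + h(n), the travel time so far plus an estimate of the remaining travel time. A minimal sketch, assuming edge weights stand in for the predicted road-segment travel times and a hypothetical lookup table `estimated_remaining` stands in for the learned OD travel-time estimate; it is not the paper's implementation and ignores time-dependent costs.

```python
import heapq

def a_star(graph, source, target, heuristic):
    """A* search on a weighted directed graph.

    graph: dict mapping node -> list of (neighbor, travel_time) pairs.
    heuristic: function (node, target) -> estimated remaining travel time.
    Returns (total_time, path), or (inf, []) if target is unreachable.
    """
    frontier = [(heuristic(source, target), 0.0, source)]  # entries: (f = g + h, g, node)
    best_g = {source: 0.0}
    parent = {source: None}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g, path[::-1]
        if g > best_g[node]:
            continue  # stale entry: a cheaper route to node was already found
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                parent[neighbor] = node
                heapq.heappush(frontier, (new_g + heuristic(neighbor, target), new_g, neighbor))
    return float("inf"), []

# Hypothetical road graph: edge weights play the role of predicted travel times.
graph = {"A": [("B", 4.0), ("C", 2.0)], "C": [("B", 1.0), ("D", 7.0)], "B": [("D", 3.0)]}
# Hypothetical stand-in for a learned origin-destination travel-time estimate.
estimated_remaining = {"A": 5.0, "B": 3.0, "C": 4.0, "D": 0.0}
print(a_star(graph, "A", "D", lambda node, goal: estimated_remaining[node]))
```

A* is guaranteed to return a fastest path only when h never overestimates the remaining cost, as the table above does not; a learned estimator gives no such guarantee, trading exactness for a better-informed, faster search.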

FETILDA: An Evaluation Framework for Effective Representations of Long Financial Documents

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-04-10 DOI: 10.1145/3657299

Bolun (Namir) Xia, Vipula Rawte, Aparna Gupta, Mohammed Zaki

Abstract: In the financial sphere, there is a wealth of accumulated unstructured data, such as the textual disclosure documents that companies regularly submit to regulatory agencies like the Securities and Exchange Commission (SEC). These documents are typically very long and tend to contain valuable soft information about a company's performance that is absent from quantitative predictors. It is therefore of great interest to learn predictive models from these long textual documents, especially for forecasting numerical key performance indicators (KPIs). In recent years, there has been great progress in natural language processing via pre-trained language models (LMs) learned from large text corpora. This raises the important questions of whether they can be used effectively to produce representations of long documents, and how to evaluate the effectiveness of the representations produced by various LMs. Our work focuses on answering this question, namely evaluating how effectively various LMs extract useful soft information from long textual documents for prediction tasks. In this paper, we propose and implement a deep learning evaluation framework that uses a sequential chunking approach combined with an attention mechanism. We perform an extensive set of experiments on a collection of 10-K reports submitted annually by US banks, and on another dataset of reports submitted by US companies, to investigate thoroughly the performance of different types of language models. Overall, our framework using LMs outperforms strong baseline methods for textual modeling as well as for numerical regression. Our work provides insight into how pre-trained domain-specific and fine-tuned long-input LMs can improve the quality of long-document representations and, therefore, help improve predictive analyses.

Citations: 0
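
The FETILDA abstract names sequential chunking combined with an attention mechanism. A minimal sketch of that idea, assuming a hypothetical `toy_embed` function in place of a pre-trained LM encoder and a fixed query vector in place of learned attention parameters; the downstream KPI regression head is omitted.

```python
import math

def chunk(tokens, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def toy_embed(chunk_tokens):
    """Hypothetical stand-in for a pre-trained LM encoder: returns a 2-d vector
    (share of tokens containing 'loss', chunk length / 10)."""
    hits = sum(1 for t in chunk_tokens if "loss" in t)
    return [hits / len(chunk_tokens), len(chunk_tokens) / 10.0]

def attention_pool(vectors, query):
    """Softmax attention over chunk vectors, scored by dot product with `query`.
    Returns the weighted document vector and the attention weights."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

tokens = "revenue grew in the quarter while net loss widened due to charges".split()
chunks = chunk(tokens, 4)
chunk_vectors = [toy_embed(c) for c in chunks]
doc_vector, weights = attention_pool(chunk_vectors, query=[2.0, 0.0])
print(weights)  # the chunk containing "loss" receives the largest weight
```

Chunking keeps each piece within an encoder's input limit, and attention pooling lets the model weight informative passages more heavily than boilerplate when forming one document vector.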

Towards Few-Label Vertical Federated Learning

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-04-09 DOI: 10.1145/3656344

Lei Zhang, Lele Fu, Chen Liu, Zhao Yang, Jinghua Yang, Zibin Zheng, Chuan Chen

Abstract: Federated learning (FL) provides a novel paradigm for privacy-preserving machine learning, enabling multiple clients to collaborate on model training without sharing private data. To handle multi-source heterogeneous data, vertical federated learning (VFL) has been extensively investigated. However, in VFL the label information tends to be held by a single authoritative client and is very limited. This poses two challenges for model training in the VFL setting. On the one hand, a small number of labels cannot guarantee a well-trained VFL model with informative network parameters, resulting in unclear classification decision boundaries. On the other hand, unlabeled data make up the large majority and should not be discounted, so it is worth studying how to leverage them to improve representation learning. To address these two challenges, we first introduce a supervised contrastive loss that enhances intra-class aggregation and inter-class separation, in order to exploit label information more deeply and improve downstream classification. Second, for unlabeled data, we introduce a pseudo-label-guided consistency mechanism that encourages classification results to agree across clients; this lets the representations learned by local networks absorb knowledge from other clients and alleviates disagreement between clients on classification tasks. We conduct extensive experiments on four commonly used datasets, and the results show that our method outperforms state-of-the-art methods, with the improvement becoming more significant at low label rates.

Citations: 0
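
The abstract introduces a supervised contrastive loss to tighten classes and separate them. A minimal sketch of the standard supervised contrastive loss in the style of Khosla et al. (2020), assuming that form; the paper's exact loss may differ, and the sketch ignores the federated setting in which labels live on one client. The toy points and labels are illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def sup_con_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss in the style of Khosla et al. (2020).

    For each anchor, samples with the same label are positives and all other
    samples in the batch appear in the denominator. Anchors with no positive
    are skipped.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        top = max(logits.values())
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits.values()))
        # average negative log-probability of the positives for this anchor
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

points = [[1.0, 0.1], [1.0, -0.1], [-1.0, 0.1], [-1.0, -0.1]]
print(sup_con_loss(points, [0, 0, 1, 1]))  # labels agree with geometry: low loss
print(sup_con_loss(points, [0, 1, 0, 1]))  # labels cut across clusters: high loss
```

Unlike an unsupervised contrastive loss, every same-label sample counts as a positive, so the few available labels shape the whole embedding space rather than only the classifier head.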

Computing Random Forest-distances in the presence of missing data

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-04-08 DOI: 10.1145/3656345

Manuele Bicego, Ferdinando Cicalese

Citations: 0

Enhancing Unsupervised Outlier Model Selection: A Study on IREOS Algorithms

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-04-05 DOI: 10.1145/3653719

Philipp Schlieper, Hermann Luft, Kai Klede, Christoph Strohmeyer, Bjoern Eskofier, Dario Zanca

Abstract: Outlier detection is a cornerstone of data mining, with applications ranging from fraud detection to network security. However, real-world scenarios often lack labeled training data, necessitating unsupervised outlier detection methods. This study centers on Unsupervised Outlier Model Selection (UOMS), with a specific focus on the family of Internal, Relative Evaluation of Outlier Solutions (IREOS) algorithms. IREOS measures the separability of outlier candidates by evaluating multiple maximum-margin classifiers; while effective, it is constrained by high computational demands. We investigate the impact of several separation methods in UOMS in terms of ranking quality and runtime. Surprisingly, our findings indicate that different separability measures have minimal impact on the effectiveness of IREOS. However, using linear separation methods within IREOS significantly reduces its computation time. These insights are significant for real-world applications where efficient outlier detection is critical. We provide the code for the IREOS algorithm and our separability techniques.

Citations: 0
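
The abstract describes IREOS as scoring each outlier candidate by how easily a classifier separates it from the rest of the data, and reports that linear separation is much cheaper. A minimal sketch of that idea, assuming a single linear logistic-regression separator trained by gradient descent; this is a simplification, not the IREOS formulation or the specific separation methods studied in the paper, and the data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_separability(points, idx, epochs=500, lr=0.5):
    """Fit a logistic-regression line that separates points[idx] (label 1)
    from every other point (label 0) by batch gradient descent, then return
    the fitted probability for points[idx]. Higher means easier to separate."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    labels = [1.0 if i == idx else 0.0 for i in range(n)]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(points, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, points[idx])))

# Hypothetical data: a 3x3 grid around the origin plus one distant point.
grid = [[float(x), float(y)] for x in (-1, 0, 1) for y in (-1, 0, 1)]
data = grid + [[5.0, 5.0]]
scores = [linear_separability(data, i) for i in range(len(data))]
center, far = grid.index([0.0, 0.0]), len(data) - 1
print(f"center: {scores[center]:.3f}  distant point: {scores[far]:.3f}")
```

The grid's center lies inside the convex hull of the other points, so no line can isolate it, while the distant point is easily separated; a linear model makes each per-candidate fit cheap, which is the runtime trade-off the abstract highlights.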

Dual Homogeneity Hypergraph Motifs with Cross-view Contrastive Learning for Multiple Social Recommendations

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-03-26 DOI: 10.1145/3653976

Jiadi Han, Yufei Tang, Qian Tao, Yuhan Xia, LiMing Zhang

Abstract: Social relations are often used as auxiliary information to address data sparsity and cold-start issues in social recommendation. In the real world, social relations among users are complex and diverse. Widely used graph neural networks (GNNs) can only model pairwise node relationships and are ill-suited to exploring higher-order connectivity, whereas hypergraphs provide a natural way to model high-order relations between nodes. However, recent studies show that social recommendation still faces the following challenges: 1) most social recommendation methods ignore the impact of multifaceted social relationships on user preferences; 2) item homogeneity is often neglected, that is, items with similar static attributes tend to be similarly attractive when exposed to users, which indicates hidden links between items; and 3) directly combining representations learned from different independent views cannot fully exploit the potential connections between those views. To address these challenges, we propose a novel method, DH-HGCN++, for multiple social recommendations. Specifically, dual homogeneity (i.e., social homogeneity and item homogeneity) is introduced to mine the impact of diverse social relations on user preferences and to enrich item representations. Hypergraph convolutional networks with motifs are further exploited to model the high-order relations between nodes. Finally, cross-view contrastive learning is proposed as an auxiliary task to jointly optimize DH-HGCN++. Real-world datasets are used to validate the effectiveness of the proposed model; we use sentiment analysis to extract comment relations and employ the k-means clustering algorithm to construct the item-item correlation graph. Experimental results demonstrate that our method consistently outperforms state-of-the-art baselines on Top-N recommendation.

Citations: 0
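
The abstract does not give DH-HGCN++'s propagation rule. For reference, the widely used hypergraph convolution layer of HGNN (Feng et al., 2019), on which hypergraph convolutional networks are commonly built, is shown below; whether DH-HGCN++ uses exactly this normalization is an assumption.

```latex
X^{(l+1)} = \sigma\left( D_v^{-1/2} \, H \, W \, D_e^{-1} \, H^{\top} \, D_v^{-1/2} \, X^{(l)} \, \Theta^{(l)} \right),
\qquad
d(v) = \sum_{e \in E} W_{ee} \, H_{ve},
\qquad
\delta(e) = \sum_{v \in V} H_{ve}
```

Here H is the |V| x |E| incidence matrix (H_{ve} = 1 when node v belongs to hyperedge e), W is the diagonal matrix of hyperedge weights, D_v = diag(d(v)) and D_e = diag(δ(e)) are the node and hyperedge degree matrices, X^{(l)} holds the node features at layer l, Θ^{(l)} is a learnable weight matrix, and σ is a nonlinearity. Each layer gathers node features into their hyperedges and scatters them back, so a single hyperedge (for example, one built from a motif) connects all of its members at once rather than pairwise.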

Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-03-26 DOI: 10.1145/3653718

Cheng Wen, Yuandao Cai, Bin Zhang, Jie Su, Zhiwu Xu, Dugang Liu, Shengchao Qin, Zhong Ming, Cong Tian

Abstract: Static analysis tools for capturing bugs and vulnerabilities in software are widely used in practice because they offer high coverage and independence from the execution environment. However, when analyzing large codebases, existing tools often produce a large number of false warnings relative to genuine bug reports. As a result, developers must manually inspect and confirm each warning, a challenging and time-consuming task that calls for automation. This paper advocates a fast, general, and easily extensible approach called Llm4sa that automatically inspects a sheer volume of static warnings by harnessing (some of) the power of large language models (LLMs). Our key insight is that LLMs have advanced program understanding capabilities, enabling them to act effectively as human experts when manually inspecting bug warnings together with their relevant code snippets. In this spirit, we propose a static analysis that extracts the relevant code snippets via program dependence traversal guided by the bug warning reports themselves. Then, by querying LLMs with customized questions enriched with domain knowledge and representative cases, Llm4sa can remove a large share of false warnings and significantly facilitate bug discovery. Our experiments demonstrate that Llm4sa is practical for automatically inspecting thousands of static warnings from the Juliet benchmark programs and 11 real-world C/C++ projects, achieving a precision of 81.13% and a recall of 94.64% over a total of 9,547 bug warnings. Our research introduces new opportunities and methodologies for using LLMs to reduce human labor costs, improve the precision of static analyzers, and ensure software trustworthiness.

Citations: 0
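
The Llm4sa abstract reports precision (81.13%) and recall (94.64%) separately. Treating both as measured on the same classification, their harmonic mean gives a single summary figure, about 0.874; this F1 value is derived here and is not stated in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the Llm4sa abstract.
reported_precision = 0.8113
reported_recall = 0.9464
implied_f1 = f1_score(reported_precision, reported_recall)
print(f"implied F1: {implied_f1:.4f}")
```

The harmonic mean sits closer to the weaker of the two numbers, so the gap between precision and recall pulls the summary below their simple average of about 0.879.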

SA2E-AD: A Stacked Attention Autoencoder for Anomaly Detection in Multivariate Time Series

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-03-26 DOI: 10.1145/3653677

Mengyao Li, Zhiyong Li, Zhibang Yang, Xu Zhou, Yifan Li, Ziyan Wu, Lingzhao Kong, Ke Nai

Abstract: Anomaly detection for multivariate time series is an essential task in modern industry. Although several methods have been developed for anomaly detection, they usually fail to effectively exploit the metrical-temporal correlation and other dependencies among multiple variables. To address this problem, we propose a stacked attention autoencoder for anomaly detection in multivariate time series (SA2E-AD), which focuses on fully utilizing the metrical and temporal relationships in multivariate time series. We design a multiattention block that alternates temporal attention and metrical attention components in a hierarchical structure to better reconstruct normal time series, which helps distinguish anomalies from normal time series. Meanwhile, a two-stage training strategy is designed to further separate anomalies from normal data. Experiments on three publicly available datasets show that SA2E-AD outperforms advanced baseline methods in detection performance and demonstrate the effectiveness of each component of our method.

Citations: 0
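
The SA2E-AD abstract describes a multiattention block that alternates temporal and metrical attention. A minimal, untrained sketch of that alternation, assuming parameter-free scaled dot-product self-attention; learned query/key/value projections, stacking, the decoder, and the two-stage training are all omitted, and the function names and data are illustrative.

```python
import math

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Parameter-free scaled dot-product self-attention: queries, keys, and
    values are the tokens themselves (no learned projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def multiattention_block(window):
    """window: T time steps x M metrics.
    Temporal attention treats each time step (a row) as a token; metrical
    attention treats each metric's series (a column) as a token."""
    temporal = self_attention(window)                          # T x M
    metrical = transpose(self_attention(transpose(temporal)))  # M x T, back to T x M
    return metrical

window = [[1.0, 10.0], [2.0, 11.0], [3.0, 30.0]]  # hypothetical 3 steps x 2 metrics
print(multiattention_block(window))
```

Transposing the window is what switches the attention axis: the same operation mixes information across time steps in one pass and across variables in the next. In a reconstruction-based detector, the anomaly score would then come from the gap between the input window and a trained model's reconstruction; this untrained block only illustrates how the two axes are combined.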

Hierarchical Convolutional Neural Network with Knowledge Complementation for Long-Tailed Classification

IF 3.6 · Zone 3, Computer Science

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-03-22 DOI: 10.1145/3653717

Hong Zhao, Zhengyu Li, Wenwei He, Yan Zhao

Abstract: Existing methods based on transfer learning leverage auxiliary information to help tail-class generalization and improve the performance of tail classes. However, they cannot fully exploit the relationships between auxiliary information and tail classes, and they bring irrelevant knowledge to the tail classes. To solve this problem, we propose a hierarchical CNN with knowledge complementation, which treats hierarchical relationships as auxiliary information and transfers relevant knowledge to tail classes. First, we integrate semantic and clustering relationships into the CNN as hierarchical knowledge to guide feature learning. Then, we design a complementary strategy to jointly exploit the two types of knowledge, in which semantic knowledge acts as a prior dependence and clustering knowledge reduces the negative information caused by excessive semantic dependence (i.e., semantic gaps). In this way, the CNN makes use of the two complementary hierarchical relationships and transfers useful knowledge to tail data, improving long-tailed classification accuracy. Experimental results on public benchmarks show that the proposed model outperforms existing methods. In particular, our model improves accuracy by 3.46% over the second-best method on the long-tailed tieredImageNet dataset.

Citations: 0