ACM Transactions on Knowledge Discovery from Data最新文献

ProcessGAN: Generating Privacy-Preserving Time-Aware Process Data with Conditional Generative Adversarial Nets. 使用条件生成对抗网络生成隐私保护的时间感知过程数据。

IF 4.8 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-11-01 Epub Date: 2024-11-12 DOI: 10.1145/3687464

Keyi Li, Sen Yang, Travis M Sullivan, Randall S Burd, Ivan Marsic

{"title":"ProcessGAN: Generating Privacy-Preserving Time-Aware Process Data with Conditional Generative Adversarial Nets.","authors":"Keyi Li, Sen Yang, Travis M Sullivan, Randall S Burd, Ivan Marsic","doi":"10.1145/3687464","DOIUrl":"https://doi.org/10.1145/3687464","url":null,"abstract":"Process data constructed from event logs provides valuable insights into procedural dynamics over time. The confidential information in process data, together with the data's intricate nature, makes the datasets not sharable and challenging to collect. Consequently, research is limited using process data and analytics in the process mining domain. In this study, we introduced a synthetic process data generation task to address the limitation of sharable process data. We introduced a generative adversarial network, called ProcessGAN, to generate process data with activity sequences and corresponding timestamps. ProcessGAN consists of a transformer-based network as the generator, and a time-aware self-attention network as the discriminator. It can generate privacy-preserving process data from random noise. ProcessGAN considers the duration of the process and time intervals between activities to generate realistic activity sequences with timestamps. We evaluated ProcessGAN on five real-world datasets, two that are public and three collected in medical domains that are private. To evaluate the synthetic data, in addition to statistical metrics, we trained a supervised model to score the synthetic processes. We also used process mining to discover workflows for synthetic medical processes and had domain experts evaluate the clinical applicability of the synthetic workflows. ProcessGAN outperformed the existing generative models in generating complex processes with valid parallel pathways. The synthetic process data generated by ProcessGAN better represented the long-range dependencies between activities, a feature relevant to complicated medical and other processes. The timestamps generated by the ProcessGAN model showed similar distributions with the authentic timestamps. In addition, we trained a transformer-based network to generate synthetic contexts (e.g., patient demographics) that were associated with the synthetic processes. The synthetic contexts generated by our model outperformed the baseline models, with the distributions similar to the authentic contexts. We conclude that ProcessGAN can generate sharable synthetic process data indistinguishable from authentic data. Our source code is available in https://github.com/raaachli/ProcessGAN.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"18 9","pages":""},"PeriodicalIF":4.8,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12369952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144976147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Structural properties on scale-free tree network with an ultra-large diameter 超大直径无标度树网络的结构特性

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-06-20 DOI: 10.1145/3674146

Fei Ma, Ping Wang

{"title":"Structural properties on scale-free tree network with an ultra-large diameter","authors":"Fei Ma, Ping Wang","doi":"10.1145/3674146","DOIUrl":"https://doi.org/10.1145/3674146","url":null,"abstract":"Scale-free networks are prevalently observed in a great variety of complex systems, which triggers various researches relevant to networked models of such type. In this work, we propose a family of growth tree networks (mathcal{T}_{t}), which turn out to be scale-free, in an iterative manner. As opposed to most of published tree models with scale-free feature, our tree networks have the power-law exponent (gamma=1+ln 5/ln 2) that is obviously larger than (3). At the same time, ”small-world” property can not be found particularly because models (mathcal{T}_{t}) have an ultra-large diameter (D_{t}) (i.e., (D_{t}sim|mathcal{T}_{t}|^{ln 3/ln 5})) and a greater average shortest path length (langlemathcal{W}_{t}rangle) (namely, (langlemathcal{W}_{t}ranglesim|mathcal{T}_{t}|^{ln 3/ln 5})) where (|mathcal{T}_{t}|) represents vertex number. Next, we determine Pearson correlation coefficient and verify that networks (mathcal{T}_{t}) display disassortative mixing structure. In addition, we study random walks on tree networks (mathcal{T}_{t}) and derive exact solution to mean hitting time (langlemathcal{H}_{t}rangle). The results suggest that the analytic formula for quantity (langlemathcal{H}_{t}rangle) as a function of vertex number (|mathcal{T}_{t}|) shows a power-law form, i.e., (langlemathcal{H}_{t}ranglesim|mathcal{T}_{t}|^{1+ln 3/ln 5}). Accordingly, we execute extensive experimental simulations, and demonstrate that empirical analysis is in strong agreement with theoretical results. Lastly, we provide a guide to extend the proposed iterative manner in order to generate more general scale-free tree networks with large diameter.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"345 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Learning Individual Treatment Effects under Heterogeneous Interference in Networks 在网络异质干扰下学习个体治疗效果

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-06-18 DOI: 10.1145/3673761

Ziyu Zhao, Yuqi Bai, Ruoxuan Xiong, Qingyu Cao, Chao Ma, Ning Jiang, Fei Wu, Kun Kuang

{"title":"Learning Individual Treatment Effects under Heterogeneous Interference in Networks","authors":"Ziyu Zhao, Yuqi Bai, Ruoxuan Xiong, Qingyu Cao, Chao Ma, Ning Jiang, Fei Wu, Kun Kuang","doi":"10.1145/3673761","DOIUrl":"https://doi.org/10.1145/3673761","url":null,"abstract":"Estimating individual treatment effects in networked observational data is a crucial and increasingly recognized problem. One major challenge of this problem is violating the Stable Unit Treatment Value Assumption (SUTVA), which posits that a unit’s outcome is independent of others’ treatment assignments. However, in network data, a unit’s outcome is influenced not only by its treatment (i.e., direct effect) but also by the treatments of others (i.e., spillover effect) since the presence of interference. Moreover, the interference from other units is always heterogeneous (e.g., friends with similar interests have a different influence than those with different interests). In this paper, we focus on the problem of estimating individual treatment effects (including direct effect and spillover effect) under heterogeneous interference in networks. To address this problem, we propose a novel Dual Weighting Regression (DWR) algorithm by simultaneously learning attention weights to capture the heterogeneous interference from neighbors and sample weights to eliminate the complex confounding bias in networks. We formulate the learning process as a bi-level optimization problem. Theoretically, we give a generalization error bound for the expected estimation error of the individual treatment effects. Extensive experiments on four benchmark datasets demonstrate that the proposed DWR algorithm outperforms the state-of-the-art methods in estimating individual treatment effects under heterogeneous network interference.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"84 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Deconfounding User Preference in Recommendation Systems through Implicit and Explicit Feedback 通过隐性和显性反馈解构推荐系统中的用户偏好

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-06-18 DOI: 10.1145/3673762

Yuliang Liang, Enneng Yang, Guibing Guo, Wei Cai, Linying Jiang, Xingwei Wang

引用次数: 0

Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation 不平衡研究提案主题推断中的跨学科公平性：基于分层变换器的选择性插值法

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-06-08 DOI: 10.1145/3671149

Meng Xiao, Min Wu, Ziyue Qiao, Yanjie Fu, Zhiyuan Ning, Yi Du, Yuanchun Zhou

{"title":"Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation","authors":"Meng Xiao, Min Wu, Ziyue Qiao, Yanjie Fu, Zhiyuan Ning, Yi Du, Yuanchun Zhou","doi":"10.1145/3671149","DOIUrl":"https://doi.org/10.1145/3671149","url":null,"abstract":"The objective of topic inference in research proposals aims to obtain the most suitable disciplinary division from the discipline system defined by a funding agency. The agency will subsequently find appropriate peer review experts from their database based on this division. Automated topic inference can reduce human errors caused by manual topic filling, bridge the knowledge gap between funding agencies and project applicants, and improve system efficiency. Existing methods focus on modeling this as a hierarchical multi-label classification problem, using generative models to iteratively infer the most appropriate topic information. However, these methods overlook the gap in scale between interdisciplinary research proposals and non-interdisciplinary ones, leading to an unjust phenomenon where the automated inference system categorizes interdisciplinary proposals as non-interdisciplinary, causing unfairness during the expert assignment. How can we address this data imbalance issue under a complex discipline system and hence resolve this unfairness? In this paper, we implement a topic label inference system based on a Transformer encoder-decoder architecture. Furthermore, we utilize interpolation techniques to create a series of pseudo-interdisciplinary proposals from non-interdisciplinary ones during training based on non-parametric indicators such as cross-topic probabilities and topic occurrence probabilities. This approach aims to reduce the bias of the system during model training. Finally, we conduct extensive experiments on a real-world dataset to verify the effectiveness of the proposed method. The experimental results demonstrate that our training strategy can significantly mitigate the unfairness generated in the topic inference task. To improve the reproducibility of our research, we have released accompanying code by Dropbox.1.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"42 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141550260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Compact Vulnerability Knowledge Graph for Risk Assessment 用于风险评估的紧凑型漏洞知识图谱

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-06-05 DOI: 10.1145/3671005

Jiao Yin, Wei Hong, Hua Wang, Jinli Cao, Yuan Miao, Yanchun Zhang

{"title":"A Compact Vulnerability Knowledge Graph for Risk Assessment","authors":"Jiao Yin, Wei Hong, Hua Wang, Jinli Cao, Yuan Miao, Yanchun Zhang","doi":"10.1145/3671005","DOIUrl":"https://doi.org/10.1145/3671005","url":null,"abstract":"Software vulnerabilities, also known as flaws, bugs or weaknesses, are common in modern information systems, putting critical data of organizations and individuals at cyber risk. Due to the scarcity of resources, initial risk assessment is becoming a necessary step to prioritize vulnerabilities and make better decisions on remediation, mitigation, and patching. Datasets containing historical vulnerability information are crucial digital assets to enable AI-based risk assessments. However, existing datasets focus on collecting information on individual vulnerabilities while simply storing them in relational databases, disregarding their structural connections. This paper constructs a compact vulnerability knowledge graph, VulKG, containing over 276K nodes and 1M relationships to represent the connections between vulnerabilities, exploits, affected products, vendors, referred domain names, and more. We provide a detailed analysis of VulKG modeling and construction, demonstrating VulKG-based query and reasoning, and providing a use case of applying VulKG to a vulnerability risk assessment task, i.e., co-exploitation behavior discovery. Experimental results demonstrate the value of graph connections in vulnerability risk assessment tasks. VulKG offers exciting opportunities for more novel and significant research in areas related to vulnerability risk assessment. The data and codes of this paper are available at https://github.com/happyResearcher/VulKG.git.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"30 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141258961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Utility-oriented Reranking with Counterfactual Context 以效用为导向的重新排名与反事实背景

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-06-04 DOI: 10.1145/3671004

Yunjia Xi, Weiwen Liu, Xinyi Dai, Ruiming Tang, Qing Liu, Weinan Zhang, Yong Yu

{"title":"Utility-oriented Reranking with Counterfactual Context","authors":"Yunjia Xi, Weiwen Liu, Xinyi Dai, Ruiming Tang, Qing Liu, Weinan Zhang, Yong Yu","doi":"10.1145/3671004","DOIUrl":"https://doi.org/10.1145/3671004","url":null,"abstract":"As a critical task for large-scale commercial recommender systems, reranking rearranges items in the initial ranking lists from the previous ranking stage to better meet users’ demands. Foundational work in reranking has shown the potential of improving recommendation results by uncovering mutual influence among items. However, rather than considering the context of initial lists as most existing methods do, an ideal reranking algorithm should consider the counterfactual context – the position and the alignment of the items in the reranked lists. In this work, we propose a novel pairwise reranking framework, Utility-oriented Reranking with Counterfactual Context (URCC), which maximizes the overall utility after reranking efficiently. Specifically, we first design a utility-oriented evaluator, which applies Bi-LSTM and graph attention mechanism to estimate the listwise utility via the counterfactual context modeling. Then, under the guidance of the evaluator, we propose a pairwise reranker model to find the most suitable position for each item by swapping misplaced item pairs. Extensive experiments on two benchmark datasets and a proprietary real-world dataset demonstrate that URCC significantly outperforms the state-of-the-art models in terms of both relevance-based metrics and utility-based metrics.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"26 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141258837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Anomaly Detection in Dynamic Graphs: A Comprehensive Survey 动态图中的异常检测：全面调查

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-05-29 DOI: 10.1145/3669906

Ocheme Anthony Ekle, William Eberle

{"title":"Anomaly Detection in Dynamic Graphs: A Comprehensive Survey","authors":"Ocheme Anthony Ekle, William Eberle","doi":"10.1145/3669906","DOIUrl":"https://doi.org/10.1145/3669906","url":null,"abstract":"This survey paper presents a comprehensive and conceptual overview of anomaly detection using dynamic graphs. We focus on existing graph-based anomaly detection (AD) techniques and their applications to dynamic networks. The contributions of this survey paper include the following: i) a comparative study of existing surveys on anomaly detection; ii) a Dynamic Graph-based Anomaly Detection (DGAD) review framework in which approaches for detecting anomalies in dynamic graphs are grouped based on traditional machine-learning models, matrix transformations, probabilistic approaches, and deep-learning approaches; iii) a discussion of graphically representing both discrete and dynamic networks; and iv) a discussion of the advantages of graph-based techniques for capturing the relational structure and complex interactions in dynamic graph data. Finally, this work identifies the potential challenges and future directions for detecting anomalies in dynamic networks. This DGAD survey approach aims to provide a valuable resource for researchers and practitioners by summarizing the strengths and limitations of each approach, highlighting current research trends, and identifying open challenges. In doing so, it can guide future research efforts and promote advancements in anomaly detection in dynamic graphs.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"6 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Boosting Fair Classifier Generalization through Adaptive Priority Reweighing 通过自适应优先级重权提升公平分类器的通用性

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-05-23 DOI: 10.1145/3665895

Zhihao Hu, Yiran Xu, Mengnan Du, Jindong Gu, Xinmei Tian, Fengxiang He

{"title":"Boosting Fair Classifier Generalization through Adaptive Priority Reweighing","authors":"Zhihao Hu, Yiran Xu, Mengnan Du, Jindong Gu, Xinmei Tian, Fengxiang He","doi":"10.1145/3665895","DOIUrl":"https://doi.org/10.1145/3665895","url":null,"abstract":"With the increasing penetration of machine learning applications in critical decision-making areas, calls for algorithmic fairness are more prominent. Although there have been various modalities to improve algorithmic fairness through learning with fairness constraints, their performance does not generalize well in the test set. A performance-promising fair algorithm with better generalizability is needed. This paper proposes a novel adaptive reweighing method to eliminate the impact of the distribution shifts between training and test data on model generalizability. Most previous reweighing methods propose to assign a unified weight for each (sub)group. Rather, our method granularly models the distance from the sample predictions to the decision boundary. Our adaptive reweighing method prioritizes samples closer to the decision boundary and assigns a higher weight to improve the generalizability of fair classifiers. Extensive experiments are performed to validate the generalizability of our adaptive priority reweighing method for accuracy and fairness measures (i.e., equal opportunity, equalized odds, and demographic parity) in tabular benchmarks. We also highlight the performance of our method in improving the fairness of language and vision models. The code is available at https://github.com/che2198/APW.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"21 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141153091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

On Mean-Optimal Robust Linear Discriminant Analysis 关于均值最优稳健线性判别分析

IF 3.6 3区计算机科学

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-05-21 DOI: 10.1145/3665500

Xiangyu Li, Hua Wang

{"title":"On Mean-Optimal Robust Linear Discriminant Analysis","authors":"Xiangyu Li, Hua Wang","doi":"10.1145/3665500","DOIUrl":"https://doi.org/10.1145/3665500","url":null,"abstract":"Linear discriminant analysis (LDA) is widely used for dimensionality reduction under supervised learning settings. Traditional LDA objective aims to minimize the ratio of the squared Euclidean distances that may not perform optimally on noisy datasets. Multiple robust LDA objectives have been proposed to address this problem, but their implementations have two major limitations. One is that their mean calculations use the squared (ell_{2})-norm distance to center the data, which is not valid when the objective depends on other distance functions. The second problem is that there is no generalized optimization algorithm to solve different robust LDA objectives. In addition, most existing algorithms can only guarantee the solution to be locally optimal, rather than globally optimal. In this paper, we review multiple robust loss functions and propose a new and generalized robust objective for LDA. Besides, to better remove the mean value within data, our objective uses an optimal way to center the data through learning. As one important algorithmic contribution, we derive an efficient iterative algorithm to optimize the resulting non-smooth and non-convex objective function. We theoretically prove that our solution algorithm guarantees that both the objective and the solution sequences converge to globally optimal solutions at a sub-linear convergence rate. The results of comprehensive experimental evaluations demonstrate the effectiveness of our new method, achieving significant improvements compared to the other competing methods.","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"56 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141153219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0