Chuchu Zhang, Can Song, Samarth Agarwal, Huayu Wu, Xuejie Zhang, John Jianan Lu
{"title":"A Semantic Search Framework for Similar Audit Issue Recommendation in Financial Industry","authors":"Chuchu Zhang, Can Song, Samarth Agarwal, Huayu Wu, Xuejie Zhang, John Jianan Lu","doi":"10.1145/3539597.3573040","DOIUrl":"https://doi.org/10.1145/3539597.3573040","url":null,"abstract":"Audit issues summarize the findings during audit reviews and provide valuable insights into risks and control gaps in a financial institution. Despite the wide use of data analytics and NLP in financial services, due to the diverse coverage and lack of annotations, there are very few use cases that analyze audit issue writing and derive insights from it. In this paper, we propose a deep learning based semantic search framework to search, rank and recommend similar past issues based on new findings. We adopt a two-step approach. First, a TF-IDF based search algorithm and a Bi-Encoder are used to shortlist a set of issue candidates based on the input query. Then a Cross-Encoder will re-rank the candidates and provide the final recommendation. We will also demonstrate how the models are deployed and integrated with the existing workbench to benefit auditors in their daily work.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115588496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalizing Graph Neural Network across Graphs and Time","authors":"Zhi Wen","doi":"10.1145/3539597.3572986","DOIUrl":"https://doi.org/10.1145/3539597.3572986","url":null,"abstract":"Graph-structured data widely exist in diverse real-world scenarios, and analysis of these graphs can uncover valuable insights about their respective application domains. However, most previous works focused on learning node representation from a single fixed graph, while many real-world scenarios require representations to be quickly generated for unseen nodes, new edges, or entirely new graphs. This inductive ability is essential for high-throughput machine learning systems. However, this inductive graph representation problem is quite difficult compared to the transductive setting, because generalizing to unseen nodes requires new subgraphs containing the new nodes to be aligned to the already trained neural network. Meanwhile, following a message passing framework, the graph neural network (GNN) is an inductive and powerful graph representation tool. We further explore inductive GNN from more specific perspectives: (1) generalizing GNN across graphs, in which we tackle the problem of semi-supervised node classification across graphs; (2) generalizing GNN across time, in which we mainly solve the problem of temporal link prediction; (3) generalizing GNN across tasks; (4) generalizing GNN across locations.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121094795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Graph Learning for Anomaly Detection Systems","authors":"F. Febrinanto","doi":"10.1145/3539597.3572990","DOIUrl":"https://doi.org/10.1145/3539597.3572990","url":null,"abstract":"Anomaly detection plays a significant role in preventing the detrimental effects of abnormalities. It brings many benefits in real-world sectors ranging from transportation and finance to cybersecurity. In reality, millions of data points do not stand independently; they may be connected to each other and form graph or network data. A more advanced technique, named graph anomaly detection, is required to model that data type. Current work on graph anomaly detection has achieved state-of-the-art performance compared to regular anomaly detection. However, most models ignore the efficiency aspect, leading to several problems like technical bottlenecks. This project mainly focuses on improving the efficiency aspect of graph anomaly detection while maintaining its performance.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126896616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Next-generation Challenges of Responsible Data Integration","authors":"F. Nargesian, Abolfazl Asudeh, H. Jagadish","doi":"10.1145/3539597.3572727","DOIUrl":"https://doi.org/10.1145/3539597.3572727","url":null,"abstract":"Data integration has been extensively studied by the data management community and is a core task in the data pre-processing step of ML pipelines. When the integrated data is used for analysis and model training, responsible data science requires addressing concerns about data quality and bias. We present a tutorial on data integration and responsibility, highlighting the existing efforts in responsible data integration along with research opportunities and challenges. In this tutorial, we encourage the community to audit data integration tasks with responsibility measures and develop integration techniques that optimize the requirements of responsible data science. We focus on three critical aspects: (1) the requirements to be considered for evaluating and auditing data integration tasks for quality and bias; (2) the data integration tasks that elicit attention to data responsibility measures and methods to satisfy these requirements; and, (3) techniques, tasks, and open problems in data integration that help achieve data responsibility.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126326753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Explicit and Implicit Item relationships for Session-based Recommendation","authors":"Zihao Li, Xianzhi Wang, Chao Yang, L. Yao, Julian McAuley, Guandong Xu","doi":"10.1145/3539597.3570432","DOIUrl":"https://doi.org/10.1145/3539597.3570432","url":null,"abstract":"The session-based recommendation aims to predict users' immediate next actions based on their short-term behaviors reflected by past and ongoing sessions. Graph neural networks (GNNs) recently dominated the related studies, yet their performance heavily relies on graph structures, which are often predefined, task-specific, and designed heuristically. Furthermore, existing graph-based methods either neglect implicit correlations among items or consider explicit and implicit relationships altogether in the same graphs. We propose to decouple explicit and implicit relationships among items. As such, we can capture the prior knowledge encapsulated in explicit dependencies and learned implicit correlations among items simultaneously in a flexible and more interpretable manner for effective recommendations. We design a dual graph neural network that leverages the feature representations extracted by two GNNs: a graph neural network with a single gate (SG-GNN) and an adaptive graph neural network (A-GNN). The former models explicit dependencies among items. The latter employs a self-learning strategy to capture implicit correlations among items. Our experiments on four real-world datasets show our model outperforms state-of-the-art methods by a large margin, achieving 18.46% and 70.72% improvement in HR@20, and 49.10% and 115.29% improvement in MRR@20 on Diginetica and LastFM datasets.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127898705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge-Augmented Methods for Natural Language Processing","authors":"Chenguang Zhu, Yichong Xu, Xiang Ren, Bill Yuchen Lin, Meng Jiang, Wenhao Yu","doi":"10.1145/3539597.3572720","DOIUrl":"https://doi.org/10.1145/3539597.3572720","url":null,"abstract":"Knowledge in natural language processing (NLP) has been a rising trend, especially after the advent of large-scale pre-trained models. NLP models with attention to knowledge can i) access an unlimited amount of external information; ii) delegate the task of storing knowledge from their parameter space to knowledge sources; iii) obtain up-to-date information; iv) make prediction results more explainable via selected knowledge. In this tutorial, we will introduce the key steps in integrating knowledge into NLP, including knowledge grounding from text, knowledge representation and fusing. In addition, we will introduce recent state-of-the-art applications in fusing knowledge into language understanding, language generation and commonsense reasoning.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128051332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Georgios Bouloukakis, Chrysostomos Zeginis, N. Papadakis, K. Magoutis, George Christodoulou, Chrysanthi Kosyfaki, Konstantinos Lampropoulos, N. Mamoulis
{"title":"SmartCityBus - A Platform for Smart Transportation Systems","authors":"Georgios Bouloukakis, Chrysostomos Zeginis, N. Papadakis, K. Magoutis, George Christodoulou, Chrysanthi Kosyfaki, Konstantinos Lampropoulos, N. Mamoulis","doi":"10.1145/3539597.3575781","DOIUrl":"https://doi.org/10.1145/3539597.3575781","url":null,"abstract":"With the growth of the Internet of Things (IoT), Smart(er) Cities have been a research goal of researchers, businesses and local authorities willing to adopt IoT technologies to improve their services. Among them, Smart Transportation [7,8], the integrated application of modern technologies and management strategies in transportation systems, refers to the adoption of new IoT solutions to improve urban mobility. These technologies aim to provide innovative solutions related to different modes of transport and traffic management and enable users to be better informed and make safer and 'smarter' use of transport networks. This talk presents SmartCityBus, a data-driven intelligent transportation system (ITS) whose main objective is to use online and offline data in order to provide accurate statistics and predictions and improve public transportation services in the short and medium/long term.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121496337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenyu Yang, Ge Zhang, Jia Wu, Jian Yang, Quan.Z Sheng, Hao Peng, Ang Li, Shan Xue, Jianlin Su
{"title":"Minimum Entropy Principle Guided Graph Neural Networks","authors":"Zhenyu Yang, Ge Zhang, Jia Wu, Jian Yang, Quan.Z Sheng, Hao Peng, Ang Li, Shan Xue, Jianlin Su","doi":"10.1145/3539597.3570467","DOIUrl":"https://doi.org/10.1145/3539597.3570467","url":null,"abstract":"Graph neural networks (GNNs) are now the mainstream method for mining graph-structured data and learning low-dimensional node- and graph-level embeddings to serve downstream tasks. However, limited by the bottleneck of interpretability that deep neural networks present, existing GNNs have ignored the issue of estimating the appropriate number of dimensions for the embeddings. Hence, we propose a novel framework called Minimum Graph Entropy principle-guided Dimension Estimation, i.e. MGEDE, that learns the appropriate embedding dimensions for both node and graph representations. In terms of node-level estimation, a minimum entropy function that counts both structure and attribute entropy appraises the appropriate number of dimensions. In terms of graph-level estimation, each graph is assigned a customized embedding dimension from a candidate set based on the number of dimensions estimated for the node-level embeddings. Comprehensive experiments with node and graph classification tasks and nine benchmark datasets verify the effectiveness and generalizability of MGEDE.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124392590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mario Alfonso Prado-Romero, Bardh Prenkaj, G. Stilo
{"title":"Developing and Evaluating Graph Counterfactual Explanation with GRETEL","authors":"Mario Alfonso Prado-Romero, Bardh Prenkaj, G. Stilo","doi":"10.1145/3539597.3573026","DOIUrl":"https://doi.org/10.1145/3539597.3573026","url":null,"abstract":"The black-box nature and the lack of interpretability detract from constant improvements in Graph Neural Networks (GNNs) performance in social network tasks like friendship prediction and community detection. Graph Counterfactual Explanation (GCE) methods aid in understanding the prediction of GNNs by generating counterfactual examples that promote trustworthiness, debiasing, and privacy in social networks. Alas, the literature on GCE lacks standardised definitions, explainers, datasets, and evaluation metrics. To bridge the gap between the performance and interpretability of GNNs in social networks, we discuss GRETEL, a unified framework for GCE methods development and evaluation. We demonstrate how GRETEL comes with fully extensible built-in components that allow users to define ad-hoc explainer methods, generate synthetic datasets, implement custom evaluation metrics, and integrate state-of-the-art prediction models.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134579193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tutorial on Domain Generalization","authors":"Jindong Wang, Haoliang Li, Sinno Jialin Pan, Xingxu Xie","doi":"10.1145/3539597.3572722","DOIUrl":"https://doi.org/10.1145/3539597.3572722","url":null,"abstract":"With the availability of massive labeled training data, powerful machine learning models can be trained. However, the traditional I.I.D. assumption that the training and testing data should follow the same distribution is often violated in reality. While existing domain adaptation approaches can tackle domain shift, they rely on the target samples for training. Domain generalization is a promising technology that aims to train models with good generalization ability to unseen distributions. In this tutorial, we will present the recent advances in domain generalization. Specifically, we introduce the background, formulation, and theory behind this topic. Our primary focus is on the methodology, evaluation, and applications. We hope this tutorial can draw the interest of the community and provide a thorough review of this area. Eventually, more robust systems can be built for responsible AI. All tutorial materials and updates can be found online at https://dgresearch.github.io/.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"300 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132896769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}