Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du
{"title":"Explainability for Large Language Models: A Survey","authors":"Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du","doi":"10.1145/3639372","DOIUrl":"https://doi.org/10.1145/3639372","url":null,"abstract":"<p>Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations, and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional deep learning models.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139084073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fairness-Driven Private Collaborative Machine Learning","authors":"Dana Pessach, Tamir Tassa, Erez Shmueli","doi":"10.1145/3639368","DOIUrl":"https://doi.org/10.1145/3639368","url":null,"abstract":"<p>The performance of machine learning algorithms can be considerably improved when trained over larger datasets. In many domains, such as medicine and finance, larger datasets can be obtained if several parties, each having access to limited amounts of data, collaborate and share their data. However, such data sharing introduces significant privacy challenges. While multiple recent studies have investigated methods for private collaborative machine learning, the fairness of such collaborative algorithms was overlooked. In this work we suggest a feasible privacy-preserving pre-process mechanism for enhancing fairness of collaborative machine learning algorithms. An extensive evaluation of the proposed method shows that it is able to enhance fairness considerably with only a minor compromise in accuracy.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139084127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Quyang Pan, Junbo Zhang, Zeju Li, Qingxiang Liu
{"title":"Exploring the Distributed Knowledge Congruence in Proxy-data-free Federated Distillation","authors":"Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Quyang Pan, Junbo Zhang, Zeju Li, Qingxiang Liu","doi":"10.1145/3639369","DOIUrl":"https://doi.org/10.1145/3639369","url":null,"abstract":"<p>Federated learning (FL) is a privacy-preserving machine learning paradigm in which the server periodically aggregates local model parameters from clients without assembling their private data. Constrained communication and personalization requirements pose severe challenges to FL. Federated distillation (FD) is proposed to simultaneously address the above two problems, which exchanges knowledge between the server and clients, supporting heterogeneous local models while significantly reducing communication overhead. However, most existing FD methods require a proxy dataset, which is often unavailable in reality. A few recent proxy-data-free FD approaches can eliminate the need for additional public data, but suffer from remarkable discrepancy among local knowledge due to client-side model heterogeneity, leading to ambiguous representation on the server and inevitable accuracy degradation. To tackle this issue, we propose a proxy-data-free FD algorithm based on distributed knowledge congruence (FedDKC). FedDKC leverages well-designed refinement strategies to narrow local knowledge differences into an acceptable upper bound, so as to mitigate the negative effects of knowledge incongruence. Specifically, from perspectives of peak probability and Shannon entropy of local knowledge, we design kernel-based knowledge refinement (KKR) and searching-based knowledge refinement (SKR) respectively, and theoretically guarantee that the refined-local knowledge can satisfy an approximately-similar distribution and be regarded as congruent. Extensive experiments conducted on three common datasets demonstrate that our proposed FedDKC significantly outperforms the state-of-the-art on various heterogeneous settings while evidently improving the convergence speed.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2023-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139070962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meng Xu, Xinhong Chen, Yechao She, Yang Jin, Guanyi Zhao, Jianping Wang
{"title":"Strengthening Cooperative Consensus in Multi-Robot Confrontation","authors":"Meng Xu, Xinhong Chen, Yechao She, Yang Jin, Guanyi Zhao, Jianping Wang","doi":"10.1145/3639371","DOIUrl":"https://doi.org/10.1145/3639371","url":null,"abstract":"<p>Multi-agent reinforcement learning (MARL) has proven effective in training multi-robot confrontation, such as StarCraft and robot soccer games. However, the current joint action policies utilized in MARL have been unsuccessful in recognizing and preventing actions that often lead to failures on our side. This exacerbates the cooperation dilemma, ultimately resulting in our agents acting independently and being defeated individually by their opponents. To tackle this challenge, we propose a novel joint action policy, referred to as the consensus action policy (CAP). Specifically, CAP records the number of times each joint action has caused our side to fail in the past and computes a cooperation tendency, which is integrated with each agent’s Q-value and Nash bargaining solution to determine a joint action. The cooperation tendency promotes team cooperation by selecting joint actions that have a high tendency of cooperation and avoiding actions that may lead to team failure. Moreover, the proposed CAP policy can be extended to partially observable scenarios by combining it with Deep Q network (DQN) or actor-critic-based methods. We conducted extensive experiments to compare the proposed method with seven existing joint action policies, including four commonly used methods and three state-of-the-art (SOTA) methods, in terms of episode rewards, winning rates, and other metrics. Our results demonstrate that this approach holds great promise for multi-robot confrontation scenarios.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2023-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139070955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shengyu Chen, Tianshu Bao, Peyman Givi, Can Zheng, Xiaowei Jia
{"title":"Reconstructing Turbulent Flows Using Spatio-Temporal Physical Dynamics","authors":"Shengyu Chen, Tianshu Bao, Peyman Givi, Can Zheng, Xiaowei Jia","doi":"10.1145/3637491","DOIUrl":"https://doi.org/10.1145/3637491","url":null,"abstract":"<p>Accurate simulation of turbulent flows is of crucial importance in many branches of science and engineering. Direct numerical simulation (DNS) provides the highest fidelity means of capturing all intricate physics of turbulent transport. However, the method is computationally expensive because of the wide range of turbulence scales that must be accounted for in such simulations. Large eddy simulation (LES) provides an alternative. In such simulations, the large scales of the flow are resolved and the effects of small scales are modelled. Reconstruction of the DNS field from the low-resolution LES is needed for a wide variety of applications. Thus the construction of super-resolution (SR) methodologies that can provide this reconstruction has become an area of active research. In this work, a new physics-guided neural network is developed for such a reconstruction. The method leverages the partial differential equation that underlies the flow dynamics in the design of spatio-temporal model architecture. A degradation-based refinement method is also developed to enforce physical constraints and to further reduce the accumulated reconstruction errors over long periods. Detailed DNS data on two turbulent flow configurations are used to assess the performance of the model.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2023-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138688694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuan Yuan, Jingtao Ding, Huandong Wang, Depeng Jin
{"title":"Generating Daily Activities with Need Dynamics","authors":"Yuan Yuan, Jingtao Ding, Huandong Wang, Depeng Jin","doi":"10.1145/3637493","DOIUrl":"https://doi.org/10.1145/3637493","url":null,"abstract":"<p>Daily activity data recording individuals’ various activities in daily life are widely used in many applications such as activity scheduling, activity recommendation, and policymaking. Though with high value, its accessibility is limited due to high collection costs and potential privacy issues. Therefore, simulating human activities to produce massive high-quality data is of great importance. However, existing solutions, including rule-based methods with simplified behavior assumptions and data-driven methods directly fitting real-world data, both cannot fully qualify for matching reality. In this paper, motivated by the classic psychological theory, Maslow’s need theory describing human motivation, we propose a knowledge-driven simulation framework based on generative adversarial imitation learning. Our core idea is to model the evolution of human needs as the underlying mechanism that drives activity generation in the simulation model. Specifically, a hierarchical model structure that disentangles different need levels and the use of neural stochastic differential equations successfully capture the piecewise-continuous characteristics of need dynamics. Extensive experiments demonstrate that our framework outperforms the state-of-the-art baselines regarding data fidelity and utility. We also present the insightful interpretability of the need modeling. Moreover, privacy preservation evaluations validate that the generated data does not leak individual privacy. The code is available at https://github.com/tsinghua-fib-lab/Activity-Simulation-SAND.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138630639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fernando Terroso-Saenz, Juan Morales-García, Andres Muñoz
{"title":"Nationwide Air Pollution Forecasting with Heterogeneous Graph Neural Networks","authors":"Fernando Terroso-Saenz, Juan Morales-García, Andres Muñoz","doi":"10.1145/3637492","DOIUrl":"https://doi.org/10.1145/3637492","url":null,"abstract":"<p>Nowadays, air pollution is one of the most relevant environmental problems in most urban settings. Due to the utility in operational terms of anticipating certain pollution levels, several predictors based on Graph Neural Networks (GNN) have been proposed for the last years. Most of these solutions usually encode the relationships among stations in terms of their spatial distance, but they fail when it comes to capture other spatial and feature-based contextual factors. Besides, they assume a <i>homogeneous</i> setting where all the stations are able to capture the same pollutants. However, large-scale settings frequently comprise different types of stations, each one with different measurement capabilities. For that reason, the present paper introduces a novel GNN framework able to capture the similarities among stations related to the land use of their locations and their primary source of pollution. Furthermore, we define a methodology to deal with heterogeneous settings on the top of the GNN architecture. Finally, the proposal has been tested with a nation-wide Spanish air-pollution dataset with very promising results.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138630644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explicit Knowledge Graph Reasoning for Conversational Recommendation","authors":"Xuhui Ren, Tong Chen, Quoc Viet Hung Nguyen, Lizhen Cui, Zi Huang, Hongzhi Yin","doi":"10.1145/3637216","DOIUrl":"https://doi.org/10.1145/3637216","url":null,"abstract":"<p>Traditional recommender systems estimate user preference on items purely based on historical interaction records, thus failing to capture fine-grained yet dynamic user interests and letting users receive recommendation only passively. Recent conversational recommender systems (CRSs) tackle those limitations by enabling recommender systems to interact with the user to obtain her/his current preference through a sequence of clarifying questions. Recently, there has been a rise of using knowledge graphs (KGs) for CRSs, where the core motivation is to incorporate the abundant side information carried by a KG into both the recommendation and conversation processes. However, existing KG-based CRSs are subject to two defects: (1) there is a semantic gap between the learned representations of utterances and KG entities, hindering the retrieval of relevant KG information; (2) the reasoning over KG is mostly performed with the implicitly learned user interests, overlooking the explicit signals from the entities actually mentioned in the conversation. </p><p>To address these drawbacks, we propose a new CRS framework, namely the <b>K</b>nowledge <b>E</b>nhanced <b>C</b>onversational <b>R</b>easoning (KECR) model. As a user can reflect her/his preferences via both attribute- and item-level expressions, KECR jointly embeds the structured knowledge from two levels in the KG. A mutual information maximization constraint is further proposed for semantic alignment between the embedding spaces of utterances and KG entities. Meanwhile, KECR utilizes the connectivity within the KG to conduct explicit reasoning of the user demand, making the model less dependent on the user’s feedback to clarifying questions. As such, the semantic alignment and explicit KG reasoning can jointly facilitate accurate recommendation and quality dialogue generation. By comparing with strong baselines on two real-world datasets, we demonstrate that KECR obtains state-of-the-art recommendation effectiveness, as well as competitive dialogue generation performance.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138572359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Fashion Recommendations for Diverse Body Shapes and Local Preferences with Contrastive Multimodal Cross-Attention Network","authors":"Jianghong Ma, Huiyue Sun, Dezhao Yang, Haijun Zhang","doi":"10.1145/3637217","DOIUrl":"https://doi.org/10.1145/3637217","url":null,"abstract":"<p>Fashion recommendation has become a prominent focus in the realm of online shopping, with various tasks being explored to enhance the customer experience. Recent research has particularly emphasized fashion recommendation based on body shapes, yet a critical aspect of incorporating multimodal data relevance has been overlooked. In this paper, we present the Contrastive Multimodal Cross-Attention Network, a novel approach specifically designed for fashion recommendation catering to diverse body shapes. By incorporating multimodal representation learning and leveraging contrastive learning techniques, our method effectively captures both inter- and intra-sample relationships, resulting in improved accuracy in fashion recommendations tailored to individual body types. Additionally, we propose a locality-aware cross-attention module to align and understand the local preferences between body shapes and clothing items, thus enhancing the matching process. Experimental results conducted on a diverse dataset demonstrate the state-of-the-art performance achieved by our approach, reinforcing its potential to significantly enhance the personalized online shopping experience for consumers with varying body shapes and preferences.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138572071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ana NEACȘU, Jean-Christophe Pesquet, Corneliu Burileanu
{"title":"EMG-Based Automatic Gesture Recognition Using Lipschitz-Regularized Neural Networks","authors":"Ana NEACȘU, Jean-Christophe Pesquet, Corneliu Burileanu","doi":"10.1145/3635159","DOIUrl":"https://doi.org/10.1145/3635159","url":null,"abstract":"<p>This paper introduces a novel approach for building a robust Automatic Gesture Recognition system based on Surface Electromyographic (sEMG) signals, acquired at the forearm level. Our main contribution is to propose new constrained learning strategies that ensure robustness against adversarial perturbations by controlling the Lipschitz constant of the classifier. We focus on nonnegative neural networks for which accurate Lipschitz bounds can be derived, and we propose different spectral norm constraints offering robustness guarantees from a theoretical viewpoint. Experimental results on four publicly available datasets highlight that a good trade-off in terms of accuracy and performance is achieved. We then demonstrate the robustness of our models, compared to standard trained classifiers in four scenarios, considering both white-box and black-box attacks.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2023-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138553238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}