Guest Editorial: Special issue on trustworthy machine learning for behavioural and social computing

IF 7.3 · Zone 2 (Computer Science) · JCR Q1: Computer Science, Artificial Intelligence

CAAI Transactions on Intelligence Technology Pub Date : 2024-06-08 DOI:10.1049/cit2.12353

Zhi-Hui Zhan, Jianxin Li, Xuyun Zhang, Deepak Puthal

Volume 9, Issue 3, pages 541-543. Publication type: Journal Article. Full text: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.12353

Citations: 0

Abstract

Machine learning has been extensively applied in behavioural and social computing, in applications such as social network analysis, clickstream analysis, point-of-interest recommendation, and sentiment analysis. The datasets behind these applications are inherently linked to human behaviour and societal dynamics, and they risk disclosing personal or sensitive information if mishandled or attacked. To safeguard individuals from potential privacy breaches, numerous governments have enacted legal frameworks and regulatory measures. Examples include the Personal Information Protection Law of the People's Republic of China, the European Union's General Data Protection Regulation (GDPR) for privacy, and Australia's Artificial Intelligence Ethics Framework, which covers ethical aspects such as fairness and reliability. Despite these legislative efforts, implementing these regulations technically to ensure trustworthy machine learning in behavioural and social computing remains a significant challenge. Trustworthy machine learning is a fast-developing field that requires further in-depth exploration across multiple dimensions, including but not limited to fairness, privacy, reliability, explainability, robustness, and security, from a holistic and interdisciplinary viewpoint. This special issue is dedicated to facilitating the exchange and discussion of state-of-the-art research findings from academia and industry alike. The seven high-quality papers collected in this special issue showcase the latest advances in concepts, algorithms, systems, platforms, and applications, and explore future trends in trustworthy machine learning for behavioural and social computing.

In the first paper, ‘Trustworthy semi-supervised anomaly detection for online-to-offline logistics business in merchant identification’, Yong Li et al. develop a semi-supervised framework for detecting anomalous merchants in the logistics sector. The methodology begins with an extensive data-driven examination comparing the behaviours of regular and anomalous customers. Using the insights from this analysis, the authors then implement a contrastive learning approach for data augmentation, which capitalises on the imprecise labelling of customer data. Their model is subsequently used to identify customers exhibiting abnormal package reception and dispatch patterns in logistics operations. The framework's efficacy is substantiated by an empirical study on eight months of authentic order data, sourced from Beijing and provided by one of China's foremost logistics corporations.

The second paper, entitled ‘Towards trustworthy multi-modal motion prediction: Holistic evaluation and interpretability of outputs’ by Sandra Carrasco Limeros et al., works toward dependable motion prediction models, with a focus on the evaluation, robustness, and interpretability of the outputs. The paper first underscores the principal disparities and deficiencies in existing evaluation methods, particularly the absence of diversity assessment and compatibility with traffic scenarios. Then, via a robustness analysis, the authors demonstrate that the inability to perceive road topography has a more pronounced effect on system performance than the failure to perceive other road users. Based on the above, the authors present outputs from the DenseTNT-intent model, which exhibit high-level intentions that are diverse, compliant, and precise, thereby enhancing the overall quality of the predictions. In general, the proposed methodology and findings contribute significantly to the advancement of trustworthy motion prediction systems for autonomous vehicles.

The third paper, entitled ‘Ada-FFL: Adaptive computing fairness federated learning’ by Yue Cong et al., introduces an adaptive fairness aggregation technique for federated learning that accounts for the variances in local model updates during training. This method offers a more flexible aggregation mechanism, enabling it to be adapted to a variety of federated datasets. The authors then conduct a detailed examination of the impact that individual clients have on the fairness coefficient. Building on these insights, they propose an approach that improves both the performance and the fairness of federated learning systems. Experimental evaluations on various federated datasets substantiate the advantages of the proposed method over existing baseline methods in terms of both model performance and fairness.
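To make the idea of fairness-aware aggregation more concrete, the following is a minimal sketch of a generic loss-weighted average of client updates. It is not the Ada-FFL aggregation rule, which the editorial does not specify; the function name `fair_aggregate`, the exponent `q`, and the sample updates and losses are all assumptions introduced for illustration. With `q = 0` the rule reduces to a plain uniform average, and larger `q` gives more weight to clients whose local loss is high.

```python
def fair_aggregate(updates, losses, q=1.0):
    """Average client updates with weights proportional to loss ** q.

    Hypothetical illustration only: q = 0 gives a uniform average,
    larger q emphasises clients that are currently served worst.
    """
    weights = [loss ** q for loss in losses]
    total = sum(weights)
    dim = len(updates[0])
    return [
        sum(w * u[i] for w, u in zip(weights, updates)) / total
        for i in range(dim)
    ]


# Made-up client updates and local losses (not taken from the paper).
client_updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
client_losses = [0.2, 0.2, 1.6]

uniform = fair_aggregate(client_updates, client_losses, q=0.0)
fair = fair_aggregate(client_updates, client_losses, q=1.0)
```

In this toy setting the third client has the highest loss, so the `q = 1` aggregate moves closer to its update than the uniform average does.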

The fourth paper, entitled ‘A topic-controllable keywords-to-text generator with knowledge base network’ by Li He et al., aims to craft informative and controllable text in colloquial social media language by integrating topic-specific knowledge into a keyword-to-text generation framework. The paper introduces a novel generator to address the shortcomings of previous research, which often overlooked unordered keywords and failed to leverage subject-specific information. The proposed generator is built on conditional language encoders. To steer the model towards informative and topic-driven text, the generator takes a set of unordered keywords as input and uses subject matter to replicate prior human understanding. By incorporating an additional probability factor, the model raises the probability of topic-related words appearing in the generated text, thereby influencing the overall distribution. Empirical research, based on both automatic evaluation metrics and human annotations, demonstrates that the proposed generator produces more informative and controllable text. It outperforms current state-of-the-art models, marking a significant advancement in the field of natural language generation.
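One simple way to picture an additional probability factor for topic words is to add a bonus to the scores of topic-related tokens before normalising them into a distribution. The sketch below does exactly that on a toy vocabulary. It is an assumption about the general mechanism, not the generator described in the paper; the function names, the `bonus` value, and the vocabulary are hypothetical.

```python
import math


def softmax(logits):
    """Turn a token -> score mapping into a probability distribution."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def boost_topic_words(logits, topic_words, bonus=1.5):
    """Add a fixed bonus to topic-word scores, then renormalise.

    Hypothetical illustration: the bonus plays the role of an extra
    factor that favours topic-related words.
    """
    adjusted = {
        tok: score + (bonus if tok in topic_words else 0.0)
        for tok, score in logits.items()
    }
    return softmax(adjusted)


# Toy next-token scores and a toy topic vocabulary (made up).
base_logits = {"the": 2.0, "match": 1.0, "goal": 0.5, "recipe": 0.5}
topic = {"match", "goal"}

before = softmax(base_logits)
after = boost_topic_words(base_logits, topic)
```

Because the distribution is renormalised, raising the topic words necessarily lowers the probability of every other token, which is the sense in which the overall distribution shifts.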

The fifth paper, entitled ‘An intelligent prediction model of epidemic characters based on multi-feature’ by Xiaoying Wang et al., examines the epidemiological traits of COVID-19, particularly in the context of the Omicron variant's dominance. The proposed approach uses big data analytics to provide a visual examination and representation of the disease's spread, rendering the characteristics of the COVID-19 pandemic more precise and graphically interpretable. Existing predictive models, although useful, require refinement to address the unique attributes of the Omicron strain more accurately. To this end, the paper formulates two models, the logistic growth model and the β-SEIDR model, both specifically tailored to predict the epidemiological patterns of the Omicron variant. The logistic growth model, in particular, is grounded in a substantial corpus of empirical data. Its predictions show a high degree of congruence with observed data, suggesting its effectiveness in forecasting the trajectory of the Omicron-driven COVID-19 wave. This targeted modelling approach represents a step forward in the specificity and accuracy of COVID-19 predictions during the Omicron surge.
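For readers unfamiliar with logistic growth, the textbook form of the curve is N(t) = K / (1 + ((K - N0) / N0) * exp(-r * t)), where K is the final size, N0 the initial count, and r the growth rate. The sketch below evaluates that standard curve. The parameter values are hypothetical and are not fitted values from the paper, whose exact parameterisation is not given in this editorial.

```python
import math


def logistic_growth(t, capacity, initial, rate):
    """Cumulative count at time t under the standard logistic curve."""
    ratio = (capacity - initial) / initial
    return capacity / (1.0 + ratio * math.exp(-rate * t))


# Hypothetical parameters, chosen only for illustration.
K, N0, R = 10000.0, 10.0, 0.3

# Cumulative counts sampled every 10 days over a 60-day window.
curve = [logistic_growth(day, K, N0, R) for day in range(0, 61, 10)]

# Growth is fastest at the inflection point, where the count equals K / 2.
t_mid = math.log((K - N0) / N0) / R
```

The curve starts at `N0`, rises fastest around `t_mid`, and levels off towards `K`, which is the behaviour a fitted model would compare against observed case counts.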

The sixth paper, entitled ‘Heterogeneous decentralised machine unlearning with seed model distillation’ by Guanhua Ye et al., introduces a novel decentralised unlearning framework termed Heterogeneous Decentralised Unlearning with Seed (HDUS). The framework uses distilled seed models to establish erasable ensembles for all participating clients. HDUS is also compatible with diverse on-device models, which enhances its scalability and applicability in real-world scenarios. Through testing on three real-world datasets, the authors show that HDUS delivers better performance than state-of-the-art baselines. This underscores the efficacy and potential of the HDUS framework in advancing decentralised unlearning technologies.

The seventh paper, entitled ‘Rethinking multi-spatial information for transferable adversarial attacks on speaker recognition systems’ by Junjian Zhang et al., is motivated by optimisation strategies that use spatial information on the perturbed paths and samples. The authors propose a Dual Spatial Momentum Iterative Fast Gradient Sign Method (DS-MI-FGSM), which is designed to enhance the transferability of black-box attacks against speaker recognition systems (SRSs). The DS-MI-FGSM method is notable for its minimal input requirements: it operates effectively with just a single data point and one model. By extending its operations to the neighbouring spaces of data and models, it can generate adversarial examples that deceive integrated models. To mitigate the risk of overfitting, DS-MI-FGSM also incorporates a gradient masking technique, which further enhances its transferability. The authors have conducted extensive experiments on the task of speaker recognition. The method achieves an attack success rate of up to 92% on the target model in a black-box setting, using only a single known model. These results underscore the potential of DS-MI-FGSM as a powerful tool for enhancing the efficacy of black-box attacks against SRSs.
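As background, the sketch below shows the plain momentum iterative fast gradient sign method (MI-FGSM) that the name DS-MI-FGSM builds on, applied to a toy linear classifier. It does not include the dual spatial extension or the gradient masking described above, and the model weights, input, and hyperparameters are all hypothetical. Each step normalises the loss gradient by its L1 norm, accumulates it into a momentum term, and moves the input by a small step in the sign direction while staying within an `eps` ball around the original input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input of a linear model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def mi_fgsm(x, y, w, b, eps=0.5, steps=10, mu=1.0):
    """Plain MI-FGSM: momentum-accumulated sign steps within an eps ball."""
    alpha = eps / steps
    momentum = [0.0] * len(x)
    adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(adv, y, w, b)
        l1 = sum(abs(g) for g in grad) or 1.0  # avoid division by zero
        momentum = [mu * m + g / l1 for m, g in zip(momentum, grad)]
        adv = [
            min(max(a + alpha * sign(m), xi - eps), xi + eps)
            for a, m, xi in zip(adv, momentum, x)
        ]
    return adv


def score(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy linear model and input (made up for illustration).
W, B = [1.0, -2.0, 0.5], 0.1
x_clean, label = [0.2, -0.1, 0.4], 1
x_adv = mi_fgsm(x_clean, label, W, B)
```

In this toy case the clean input has a positive score (class 1), while the perturbed input, which differs from it by at most `eps` per coordinate, has a negative score.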

Based on the above, the collection of high-quality papers in this special issue presents engaging themes and underscores pivotal trends in trustworthy machine learning for behavioural and social computing. It is our hope that these selected works deepen the community's understanding of prevailing currents and offer pathways for future exploration. We express our heartfelt thanks to all contributors for selecting this special issue as a venue to share their scholarly insights. Our appreciation goes to the readers whose insightful and constructive critiques have been immensely beneficial to the authors. Furthermore, we are grateful to the IET team for their unwavering support and guidance throughout the development of this special issue.




Source journal

CAAI Transactions on Intelligence Technology (Computer Science, Artificial Intelligence)

CiteScore: 11.00

Self-citation rate: 3.90%

Articles published: 134

Review time: 35 weeks

Journal description: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI) providing research which is openly accessible to read and share worldwide.