Guest Editorial: Special issue on trustworthy machine learning for behavioural and social computing
Authors: Zhi-Hui Zhan, Jianxin Li, Xuyun Zhang, Deepak Puthal
Journal: CAAI Transactions on Intelligence Technology, 9 (3), pp. 541-543 (published 2024-06-08)
DOI: 10.1049/cit2.12353
URL: https://onlinelibrary.wiley.com/doi/10.1049/cit2.12353
Impact factor: 8.4; JCR: Q1 (Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Machine learning has been extensively applied in behavioural and social computing, encompassing a spectrum of applications such as social network analysis, click stream analysis, recommendation of points of interest, and sentiment analysis. The datasets pertinent to these applications are inherently linked to human behaviour and societal dynamics, posing a risk of disclosing personal or sensitive information if mishandled or subjected to attacks. To safeguard individuals from potential privacy breaches, numerous governments have enacted a range of legal frameworks and regulatory measures. Examples include the Personal Information Protection Law of the People's Republic of China, the European Union's GDPR for privacy, and Australia's Artificial Intelligence Ethics Framework for many ethical aspects like fairness and reliability. Despite these legislative efforts, the technical implementation of these regulations to ensure trustworthy machine learning in behavioural and social computing remains a significant challenge. Trustworthy machine learning, being a fast-developing field, necessitates further in-depth exploration across multiple dimensions, including but not limited to fairness, privacy, reliability, explainability, robustness, and security, from a holistic and interdisciplinary viewpoint. This special issue is dedicated to facilitating the exchange and discussion of state-of-the-art research findings from academia and industry alike. The seven high-quality papers collected in this special issue place a particular emphasis on showcasing the latest advancements in concepts, algorithms, systems, platforms, and applications, as well as exploring future trends pertinent to the field of trustworthy machine learning for behavioural and social computing.
In the first paper, ‘Trustworthy semi-supervised anomaly detection for online-to-offline logistics business in merchant identification’, Yong Li et al. develop a semi-supervised framework for detecting anomalous merchants in the logistics sector. The methodology begins with an extensive data-driven comparison of the behaviours of regular and anomalous customers. Using the insights from this analysis, the authors then apply contrastive learning for data augmentation, which capitalises on the imprecise labelling of customer data. Their model is subsequently used to identify customers exhibiting abnormal package reception and dispatch patterns in logistics operations. The framework's efficacy is substantiated by an empirical study that uses 8 months of real order data, sourced from Beijing and provided by one of China's foremost logistics corporations.
The second paper, entitled ‘Towards trustworthy multi-modal motion prediction: Holistic evaluation and interpretability of outputs’ by Sandra Carrasco Limeros et al., works toward dependable motion prediction models, with a focus on the evaluation, robustness, and interpretability of their outputs. The paper first underscores the principal disparities and deficiencies in existing evaluation methods, particularly the absence of diversity assessment and of compatibility with traffic scenarios. Then, via a robustness analysis, the authors demonstrate that the inability to perceive road topography has a more pronounced effect on system performance than the failure to perceive other road users. Building on these results, the authors present outputs from the DenseTNT-intent model, which exhibit high-level intentions that are diverse, compliant, and precise, thereby enhancing the overall quality of the predictions. Overall, the proposed methodology and its findings contribute to the advancement of trustworthy motion prediction systems for autonomous vehicles.
The third paper, entitled ‘Ada-FFL: Adaptive computing fairness federated learning’ by Yue Cong et al., introduces an adaptive fairness federated learning approach whose aggregation technique accounts for the variances in local model updates during the federated learning process. This offers a more flexible aggregation mechanism that can be adapted to a variety of federated datasets. The authors also examine in detail the impact that individual clients have on the fairness coefficient, and build on these insights to improve both the performance and the fairness of federated learning systems. Experimental evaluations on various federated datasets substantiate the approach, showing advantages in both model performance and fairness over existing baseline methods.
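To make the idea of fairness-aware aggregation more concrete, the following is a minimal sketch of loss-reweighted averaging, a common fairness heuristic in which clients with higher local loss receive more weight. It is not the Ada-FFL rule, which this editorial does not specify; the function name `fair_aggregate`, the exponent `q`, and the toy numbers are all assumptions made for illustration.

```python
def fair_aggregate(updates, losses, q=1.0):
    """Average client updates, weighting each client by (local loss) ** q.

    Illustrative heuristic only, not Ada-FFL.
    updates : list of equal-length lists of floats (one update per client)
    losses  : list of positive floats (one local loss per client)
    q       : hypothetical fairness exponent; q = 0 gives the plain mean
    """
    weights = [loss ** q for loss in losses]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so weights sum to 1
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]


# Toy example: the second client has the higher loss, so it dominates when q > 0.
client_updates = [[1.0, 0.0], [0.0, 1.0]]
client_losses = [1.0, 3.0]
uniform = fair_aggregate(client_updates, client_losses, q=0.0)
reweighted = fair_aggregate(client_updates, client_losses, q=1.0)
```

Raising `q` shifts the aggregate toward poorly served clients; an adaptive scheme would presumably choose such a coefficient from the data rather than fix it by hand.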
The fourth paper, entitled ‘A topic-controllable keywords-to-text generator with knowledge base network’ by Li He et al., aims to produce informative and controllable text in colloquial social media language by integrating topic-specific knowledge into a keyword-to-text generation framework. The paper introduces a novel generator to address the shortcomings of previous research, which often overlooked unordered keywords and failed to leverage subject-specific information. The proposed generator is built on conditional language encoders. To steer the model towards informative and topic-driven text, the generator takes a set of unordered keywords as input and uses the subject matter to replicate prior human understanding. By incorporating an additional probability factor, the model raises the probability of topic-related words appearing in the generated text, thereby shifting the overall distribution. Empirical results, based on both automatic evaluation metrics and human annotations, demonstrate that the proposed generator produces more informative and controllable text and outperforms current state-of-the-art models.
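The effect of an additional probability factor on topic-related words can be sketched as a bonus added to the scores of topic words before normalisation. This is a generic illustration under stated assumptions, not the paper's formulation: the function `boost_topic_words`, the `bonus` value, and the tiny vocabulary are hypothetical.

```python
import math


def boost_topic_words(logits, topic_words, bonus=2.0):
    """Return next-word probabilities after adding a bonus to topic-word scores.

    logits      : dict mapping word -> unnormalised score
    topic_words : set of words considered topic-related (hypothetical input)
    bonus       : hypothetical additive factor applied to topic-word scores
    """
    adjusted = {w: s + (bonus if w in topic_words else 0.0) for w, s in logits.items()}
    peak = max(adjusted.values())  # subtract the maximum for numerical stability
    exp_scores = {w: math.exp(s - peak) for w, s in adjusted.items()}
    total = sum(exp_scores.values())
    return {w: e / total for w, e in exp_scores.items()}


# Toy vocabulary with equal scores, so any difference comes from the bonus alone.
scores = {"match": 1.0, "goal": 1.0, "weather": 1.0}
plain = boost_topic_words(scores, topic_words=set())
boosted = boost_topic_words(scores, topic_words={"match", "goal"})
```

Because the probabilities are renormalised, raising the topic words necessarily lowers the share of the remaining vocabulary, which is the sense in which the overall distribution is shifted.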
The fifth paper, entitled ‘An intelligent prediction model of epidemic characters based on multi-feature’ by Xiaoying Wang et al., examines the epidemiological traits of COVID-19, particularly in the context of the Omicron variant's dominance. The proposed approach uses big data analytics to visualise the disease's spread, making the characteristics of the COVID-19 pandemic more precise and graphically interpretable. Despite their utility, existing predictive models require refinement to address the unique attributes of the Omicron strain more accurately. To this end, the paper formulates two models, the logistic growth model and the β-SEIDR model, both tailored to predict the epidemiological patterns of the Omicron variant. The logistic growth model, in particular, is grounded in a substantial corpus of empirical data. Its predictions show a high degree of agreement with observed data, suggesting that it is effective in forecasting the trajectory of the Omicron-driven COVID-19 wave. This targeted modelling approach is a step towards more specific and accurate COVID-19 predictions during the Omicron surge.
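For readers unfamiliar with logistic growth, the classic curve is N(t) = K / (1 + ((K - N0) / N0) * exp(-r * t)), where K is the final size, N0 the initial count, and r the growth rate. The sketch below evaluates that textbook formula; the paper's exact parameterisation is not given in this editorial, and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def logistic_growth(t, carrying_capacity, initial_cases, growth_rate):
    """Cumulative count at time t under the classic logistic growth curve.

    N(t) = K / (1 + ((K - N0) / N0) * exp(-r * t))
    """
    k, n0, r = carrying_capacity, initial_cases, growth_rate
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))


# Hypothetical parameters (not taken from the paper): 10 initial cases,
# a final size of 10,000, and a daily growth rate of 0.3.
curve = [
    logistic_growth(day, carrying_capacity=10_000, initial_cases=10, growth_rate=0.3)
    for day in range(0, 61, 10)
]
```

The curve starts at the initial count, grows almost exponentially at first, and then flattens as it approaches the final size, which is why it is often fitted to the cumulative cases of a single epidemic wave.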
The sixth paper, entitled ‘Heterogeneous decentralised machine unlearning with seed model distillation’ by Guanhua Ye et al., introduces a novel decentralised unlearning framework termed Heterogeneous Decentralised Unlearning with Seed (HDUS). The framework is distinguished by its use of distilled seed models to establish erasable ensembles for all participating clients. Furthermore, HDUS is compatible with diverse on-device models, which improves its scalability and applicability in real-world scenarios. Through testing on three real-world datasets, the authors show that HDUS outperforms state-of-the-art baselines, underscoring the potential of the framework for decentralised unlearning.
The seventh paper, entitled ‘Rethinking multi-spatial information for transferable adversarial attacks on speaker recognition systems’ by Junjian Zhang et al., is motivated by optimisation strategies that exploit spatial information on the perturbed paths and samples. The authors propose a Dual Spatial Momentum Iterative Fast Gradient Sign Method (DS-MI-FGSM), which is designed to enhance the transferability of black-box attacks against speaker recognition systems (SRSs). DS-MI-FGSM is notable for its minimal input requirements: it operates with just a single data point and one model. By extending its operations to the neighbouring spaces of data and models, it can generate adversarial examples that deceive integrated models. To mitigate the risk of overfitting, DS-MI-FGSM also incorporates a gradient masking technique, which further enhances its transferability. In extensive experiments on the speaker recognition task, the method achieves an attack success rate of up to 92% on the target model in a black-box setting, using only a single known model. These results underscore the potential of DS-MI-FGSM as a tool for strengthening black-box attacks against SRSs.
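As background, the baseline momentum iterative FGSM on which the name DS-MI-FGSM builds can be sketched as follows: each step accumulates the L1-normalised loss gradient into a momentum term, moves the input along the sign of that momentum, and clips the result to an L-infinity budget. This sketch covers only the baseline update; the dual-spatial sampling and gradient masking described above are not included, and the `grad_fn` hook and the quadratic toy loss are assumptions made for illustration.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def mi_fgsm(x, grad_fn, epsilon, steps=10, decay=1.0):
    """Baseline momentum iterative FGSM (not the dual-spatial variant).

    x       : list of floats, the clean input
    grad_fn : callable returning dLoss/dx for a candidate input (hypothetical hook)
    epsilon : L-infinity budget for the perturbation
    """
    alpha = epsilon / steps  # per-step size
    momentum = [0.0] * len(x)
    x_adv = list(x)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        l1_norm = sum(abs(g) for g in grad) or 1.0  # avoid division by zero
        momentum = [decay * m + g / l1_norm for m, g in zip(momentum, grad)]
        x_adv = [xa + alpha * sign(m) for xa, m in zip(x_adv, momentum)]
        # Project back into the epsilon-ball around the clean input.
        x_adv = [min(max(xa, xi - epsilon), xi + epsilon) for xa, xi in zip(x_adv, x)]
    return x_adv


# Toy loss 0.5 * sum(x_i ** 2), whose gradient is x itself; the attack should increase it.
clean = [0.2, -0.1, 0.5]
toy_grad = lambda point: list(point)
adversarial = mi_fgsm(clean, toy_grad, epsilon=0.1)
```

In a real attack, `grad_fn` would return the gradient of the surrogate model's loss with respect to the audio input, and the resulting example would then be tested against a separate target model.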
Taken together, the high-quality papers in this special issue present engaging themes and underscore pivotal trends in trustworthy machine learning for behavioural and social computing. It is our hope that these selected works deepen the community's understanding of prevailing currents and pathways for future exploration. We express our heartfelt thanks to all contributors for selecting this special issue as a venue to share their scholarly insights. Our appreciation goes to the readers whose insightful and constructive critiques have been immensely beneficial to the authors. Furthermore, we are grateful to the IET team for their unwavering support and guidance throughout the development of this special issue.
About the journal:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI) providing research which is openly accessible to read and share worldwide.