{"title":"Knowledge-Aware Intent-Guided Contrastive Learning for Next-basket Recommendation","authors":"Chuyuan Wei;Baojie Yuan;Chuanhao Hu;Jinzhe Li;Chang-Dong Wang;Mohsen Guizani","doi":"10.1109/TETCI.2024.3485731","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3485731","url":null,"abstract":"Next-Basket Recommendation (NBR) aims to predict a series of items in the next basket based on users' current basket sequence. However, the existing works merely consider the explicit auxiliary signals, and intent may contribute to the refinement of basket representations but bring some uncertain bias. To deal with the problems mentioned above, this paper proposes a knowledge-aware intent-guided contrastive learning method called KICL for NBR. Specifically, we construct a collaborative bipartite graph to learn basket representations and item representations, while at the same time, a knowledge graph is constructed based on items and their attributes to capture implicit auxiliary signals. Furthermore, the item attributes within a basket are weighted and summed up to extract the corresponding intent. To reduce the uncertainty bias brought from item diversity, a contrastive regularizer is designed for better basket representation refinement. 
Extensive experiments on two real-world datasets demonstrate the effectiveness of KICL, where the maximum improvement can reach 15.91% in terms of F1@10 on Dunnhumby.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1990-2000"},"PeriodicalIF":5.3,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leaders and Collaborators: Addressing Sparse Reward Challenges in Multi-Agent Reinforcement Learning","authors":"Shaoqi Sun;Hui Liu;Kele Xu;Bo Ding","doi":"10.1109/TETCI.2024.3488772","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3488772","url":null,"abstract":"Cooperative multi-agent reinforcement learning (MARL) has emerged as an effective tool for addressing complex control tasks. However, sparse team rewards present significant challenges for MARL, leading to low exploration efficiency, slow learning speed, and homogenized behaviors among agents. To address these issues, we propose a novel Leader-Collaborator (LC) MARL framework inspired by human social collaboration. The LC framework introduces parallel online knowledge distillation for policy networks (KDPN). KDPN extracts knowledge from two policy networks with different training objectives: one aims to maximize individual rewards, while the other aims to maximize team rewards. The extracted knowledge is utilized to construct team leaders and collaborators. By effectively balancing individual and team rewards, our approach enhances exploration efficiency and promotes behavioral diversity among agents. This addresses the issue of low learning efficiency caused by the lack of objectives early in the agent's learning process and facilitates the development of more effective and differentiated team interaction policies. Additionally, we present the Self-Repairing Strategy (SRS) and Self-Augmenting Strategy (SAS) to facilitate team policies learning while preserving the initial team goal. We evaluate the effectiveness of the LC framework by conducting extensive experiments on the Multi-Agent Particle Environment (MPE), the Google Research Football (GRF), and StarCraft Multi-Agent Challenge (SMAC) with varying levels of difficulty. 
Our experimental results demonstrate that LC significantly improves the efficiency of the agent's exploration, achieves state-of-the-art performance, and accelerates the learning of the optimal policy. Specifically, in the SMAC scenarios, our method increases the winning rate by 21.9%, increases the average cumulative reward by 12%, and reduces the training time by 57% to achieve optimal performance.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1976-1989"},"PeriodicalIF":5.3,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeuronsGym: A Hybrid Framework and Benchmark for Robot Navigation With Sim2Real Policy Learning","authors":"Haoran Li;Guangzheng Hu;Shasha Liu;Mingjun Ma;Yaran Chen;Dongbin Zhao","doi":"10.1109/TETCI.2024.3488732","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3488732","url":null,"abstract":"The rise of embodied AI has greatly improved the possibility of general mobile agent systems. At present, many evaluation platforms with rich scenes, high visual fidelity, and various application scenarios have been developed. In this paper, we present a hybrid framework named NeuronsGym that can be used for policy learning of robot tasks, covering a simulation platform for training policy, and a physical system for studying sim2real problems. Unlike most current single-task, slow-moving robotic platforms, our framework provides agile physical robots with a wider range of speeds and can be employed to train robotic navigation policies. At the same time, in order to evaluate the safety of robot navigation, we propose a safety-weighted path length (SFPL) to improve the safety evaluation in the current mobile robot navigation. Based on this platform, we build a new benchmark for navigation tasks under this platform by comparing the current mainstream sim2real methods, and hold the 2022 IEEE Conference on Games (CoG) RoboMaster sim2real challenge. 
We release the codes of this framework and hope that this platform can promote the development of more flexible and agile general mobile agent algorithms.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 3","pages":"2491-2505"},"PeriodicalIF":5.3,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Segmentation and Classification of Esophageal Lesions Using Attention Gating Pyramid Vision Transformer","authors":"Peixuan Ge;Tao Yan;Pak Kin Wong;Zheng Li;In Neng Chan;Hon Ho Yu;Chon In Chan;Liang Yao;Ying Hu;Shan Gao","doi":"10.1109/TETCI.2024.3485704","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3485704","url":null,"abstract":"Automatic and accurate segmentation and classification of esophageal lesions are two essential tasks to assist endoscopists in Upper Gastrointestinal Endoscopy. However, there is no intelligent system that can diagnose more lesion types, handle multiple tasks simultaneously, and be more accurate in clinical work. Therefore, we present an innovative Multi-Task deep learning architecture named Attention Gating Pyramid Vision Transformer (AGPVT), which provides a solution for the accurate classification and precise segmentation of lesion types and regions simultaneously. The proposed AGPVT combines the benefits of cutting-edge deep learning model designs with Multi-Task Learning (MTL) in order to advance the field. Furthermore, a patch-wise multi-head attention gating method alongside a hybrid design MTL decoder, is employed as the core driving architecture of the AGPVT. Comprehensive experiments are conducted on a multicenter dataset which contains esophageal cancer, Barrett's esophagus, esophageal protruded lesions, esophagitis, and normal esophagus. 
Experimental results show that the proposed AGPVT achieves a classification accuracy of 96.84%, an IoU score of 85.61%, and a Dice score of 90.75%, outperforming existing methods and demonstrating its effectiveness in this domain.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1961-1975"},"PeriodicalIF":5.3,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative Probabilistic Meta-Learning for Few-Shot Image Classification","authors":"Meijun Fu;Xiaomin Wang;Jun Wang;Zhang Yi","doi":"10.1109/TETCI.2024.3483255","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3483255","url":null,"abstract":"Meta-learning, a rapidly advancing area in computational intelligence, leverages prior knowledge from related tasks to facilitate the swift adaptation to new tasks with limited data. A critical challenge in meta-learning is the quantification of model uncertainty. In this paper, we propose a novel meta-learning method, Generative Probabilistic Meta-Learning (GPML), designed for few-shot image classification. GPML extends the Probably Approximately Correct-Bayes (PAC-Bayes) framework, initially formulated for single-task scenarios, to meta-learning across multiple tasks. This extension not only provides theoretical generalization guarantees for meta-learning but also effectively captures model uncertainty through variational parameters. To enhance the expressiveness of approximated posteriors in Bayesian inference, GPML incorporates implicit modeling, which defines probability distributions over task-specific parameters in a data-driven manner. This is achieved by designing a generative model structure that integrates task-dependent prior knowledge into the model inference process. We conduct extensive multidimensional performance evaluations on few-shot image classification tasks across various benchmarks, demonstrating that GPML outperforms existing state-of-the-art meta-learning methods. 
Additionally, ablation studies focusing on model components, the PAC-Bayes framework, and implicit modeling validate the performance improvements attributed to the proposed generative model structure, learning framework, and modeling approach.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1947-1960"},"PeriodicalIF":5.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Accuracy-Privacy Trade-Off in Differentially Private Split Learning","authors":"Ngoc Duy Pham;Khoa T. Phan;Naveen Chilamkurti","doi":"10.1109/TETCI.2024.3485723","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3485723","url":null,"abstract":"Split learning (SL) aims to protect user data privacy by distributing deep models between the client-server and keeping private data locally. Only processed or ‘smashed’ data can be transmitted from the clients to the server during the SL process. However, recently proposed model inversion attacks can recover original data from smashed data. To enhance privacy protection against such attacks, one strategy is to adopt differential privacy (DP), which involves safeguarding the smashed data at the expense of some accuracy loss. This paper presents the first investigation into the impact on accuracy when training multiple clients in SL with various privacy requirements. Subsequently, we propose an approach that reviews the DP noise distributions of other clients during client training to address the identified accuracy degradation. We also examine the application of DP to the local model of SL to gain insights into the trade-off between accuracy and privacy. Specifically, the findings reveal that introducing noise in the later local layers offers the most favorable balance between accuracy and privacy. Drawing from our insights in the shallower layers, we propose an approach to reduce the size of smashed data to minimize data leakage while maintaining higher accuracy, optimizing the accuracy-privacy trade-off. Additionally, smashed data of a smaller size reduces communication overhead on the client side, mitigating one of the notable drawbacks of SL. 
Intensive experiments on various datasets demonstrate that our proposed approaches provide an optimal trade-off for incorporating DP into SL, ultimately enhancing the training accuracy for multi-client SL with varying privacy requirements.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 1","pages":"988-1000"},"PeriodicalIF":5.3,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning","authors":"Amit Kumar Bhuyan;Hrishikesh Dutta;Subir Biswas","doi":"10.1109/TETCI.2024.3482855","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3482855","url":null,"abstract":"This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model which can identify the participants in a conversation without the requirement of a large audio database for training. An unsupervised online update mechanism is proposed for the Federated Learning model which depends on cosine similarity of speaker embeddings. Moreover, the proposed diarization system solves the problem of speaker change detection via unsupervised segmentation techniques using Hotelling's t-squared Statistic and Bayesian Information Criterion. In this new approach, speaker change detection is biased around detected quasi-silences, which reduces the severity of the trade-off between the missed detection and false detection rates. Additionally, the computational overhead due to frame-by-frame identification of speakers is reduced via unsupervised clustering of speech segments. The results demonstrate the effectiveness of the proposed training method in the presence of non-IID speech data. It also shows a considerable improvement in the reduction of false and missed detection at the segmentation stage, while reducing the computational overhead. 
Improved accuracy and reduced computational cost make the mechanism suitable for real-time speaker diarization across a distributed IoT audio network.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1934-1946"},"PeriodicalIF":5.3,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Segmentation Method of Road Surface Covering Objects Based on CBAM UNET++","authors":"Yang Sen;Wang Zhenmin;Song Wenlong;Yang Changqun","doi":"10.1109/TETCI.2024.3462854","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3462854","url":null,"abstract":"Dangerous road surface covering objects such as wet slippery, ice and snow will directly affect the safety performance. Therefore, the detection and visualization of road surface covering objects' status under complex weather and road conditions are of great significance to the safety of human driving and unmanned driving. However, the complex road conditions (vehicles and pedestrians blocking the road surface, the area of the measured coverage is small, and the ambient light changes drastically) limit the accuracy of road surface coverage objects' state detection in the natural environment. Given the above problems, this paper reconstructs the image preprocessing process in road ice and snow cover segmentation by introducing background extraction before image segmentation, and then proposes a road surface coverage objects segmentation method based on Convolutional Block Attention Module UNet++ (CBAM UNet++). First, through the performance comparison of different background extraction algorithms, the Content-adaptive Resizing Framework (CARF) background extraction algorithm is used to eliminate the interference of vehicles, pedestrians and other objects in complex road conditions. Then, the CBAM UNet++ model is established to segment the four types of road surface coverings objects in the outfield to improve detection accuracy under conditions of small area coverage objects and severe illumination changes. 
Experimental results indicate that, after introducing background extraction, the segmentation accuracy under different lighting conditions can be improved by 5.6%–17.7%; compared with traditional methods for segmenting objects on road surfaces, the CBAM UNet++ method demonstrates an average segmentation accuracy improvement of at least 6.5% under six different lighting conditions.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1924-1933"},"PeriodicalIF":5.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Attention-Based Dynamic Multi-Graph Spatial-Temporal Graph Neural Network Model for Traffic Prediction","authors":"Chunyan Diao;Dafang Zhang;Wei Liang;Man Jiang;Kuanching Li","doi":"10.1109/TETCI.2024.3462513","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3462513","url":null,"abstract":"Traffic flow prediction is a non-negligible part of intelligent transportation and mobility. Unfortunately, the unique non-linearity and complex spatial-ST-correlation of transport flow data suggest considerable challenges in prediction. The dynamic interaction of multiple spatial relations greatly influences traffic flow prediction. However, the existing spatial-temporal prediction algorithms are based on graph convolution to capture global or heterogeneous relationships, and simpler graph convolution models cannot accurately capture complex dynamic spatial relationships. To address the issues as mentioned above, this study proposes an attention-based multi-graph dynamic spatial-temporal prediction model ADMSTGCN to capture a variety of dynamic interaction relationships in traffic flow. First, we use a distance graph to explore the relationships between adjacent distances and use a semantic graph to mine spatial relationships between nodes that are far apart but have similar relationships, then fuse these two graphs to obtain a fusion graph with multiple spatial interaction relationships. The correlations between different neighbors are then further learned through a dynamic multi-graph spatial-temporal learning module that aggregates the features of different neighbors through gated graph convolution and attention mechanisms to capture various dynamic and complex spatial-temporal interactions. 
Experimental evaluations show that the framework proposed outperforms existing methods with better results in the analysis performed with publicly available datasets and also demonstrates the importance of capturing multiple interactions of spatial-temporal relationships.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1910-1923"},"PeriodicalIF":5.3,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepPCC: Learned Lossy Point Cloud Compression","authors":"Junzhe Zhang;Gexin Liu;Junteng Zhang;Dandan Ding;Zhan Ma","doi":"10.1109/TETCI.2024.3467192","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3467192","url":null,"abstract":"We propose DeepPCC, an end-to-end learning-based approach for the lossy compression of large-scale object point clouds. For both geometry and attribute components, we introduce the Multiscale Neighborhood Information Aggregation (NIA) mechanism, which applies resolution downscaling progressively (i.e., dyadic downsampling of geometry and average pooling of attribute) and combines sparse convolution and local self-attention at each resolution scale for effective feature representation. Under a simple autoencoder structure, scale-wise NIA blocks are stacked as the analysis and synthesis transform in the encoder-decoder pair to best characterize spatial neighbors for accurate approximation of geometry occupancy probability and attribute intensity. Experiments demonstrate that DeepPCC remarkably outperforms state-of-the-art rules-based MPEG G-PCC and learning-based solutions both quantitatively and qualitatively, providing strong evidence that DeepPCC is a promising solution for emerging AI-based PCC.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1897-1909"},"PeriodicalIF":5.3,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}