{"title":"Hierarchical Spatial-Temporal Masked Contrast for Skeleton Action Recognition","authors":"Wenming Cao;Aoyu Zhang;Zhihai He;Yicha Zhang;Xinpeng Yin","doi":"10.1109/TAI.2024.3430260","DOIUrl":"https://doi.org/10.1109/TAI.2024.3430260","url":null,"abstract":"In the field of 3-D action recognition, self-supervised learning has shown promising results but remains a challenging task. Previous approaches to motion modeling often relied on selecting features solely from the temporal or spatial domain, which limited the extraction of higher-level semantic information. Additionally, traditional one-to-one approaches in multilevel comparative learning overlooked the relationships between different levels, hindering the learning representation of the model. To address these issues, we propose the hierarchical spatial-temporal masked network (HSTM) for learning 3-D action representations. HSTM introduces a novel masking method that operates simultaneously in both the temporal and spatial dimensions. This approach leverages semantic relevance to identify meaningful regions in time and space, guiding the masking process based on semantic richness. This guidance is crucial for learning useful feature representations effectively. Furthermore, to enhance the learning of potential features, we introduce cross-level distillation (CLD) to extend the comparative learning approach. By training the model with two types of losses simultaneously, each level of the multilevel comparative learning process can be guided by levels rich in semantic information. This allows for more effective supervision of comparative learning, leading to improved performance. Extensive experiments conducted on the NTU-60, NTU-120, and PKU-MMD datasets demonstrate the effectiveness of our proposed framework. 
The learned action representations exhibit strong transferability and achieve state-of-the-art results.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5801-5814"},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study of Enhancing Federated Learning on Non-IID Data With Server Learning","authors":"Van Sy Mai;Richard J. La;Tao Zhang","doi":"10.1109/TAI.2024.3430250","DOIUrl":"10.1109/TAI.2024.3430250","url":null,"abstract":"Federated learning (FL) has emerged as a means of distributed learning using local data stored at clients with a coordinating server. Recent studies showed that FL can suffer from poor performance and slower convergence when training data at the clients are not independent and identically distributed (IID). Here, we consider auxiliary server learning (SL) as a complementary approach to improving the performance of FL on non-IID data. Our analysis and experiments show that this approach can achieve significant improvements in both model accuracy and convergence time even when the dataset utilized by the server is small and its distribution differs from that of the clients’ aggregate data. Moreover, experimental results suggest that auxiliary SL delivers benefits when employed together with other techniques proposed to mitigate the performance degradation of FL on non-IID data.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5589-5604"},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Observer-Based Adaptive Fuzzy Control for Singular Systems with Nonlinear Perturbation and Actuator Saturation","authors":"Qingtan Meng;Qian Ma","doi":"10.1109/TAI.2024.3429052","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429052","url":null,"abstract":"This article investigates the adaptive fuzzy control problem for singular systems with actuator saturation and nonlinear perturbation, where the system consists of two coupled differential and algebraic subsystems. To cope with the actuator saturation, a new auxiliary system whose order is the same as the differential subsystem is introduced. With the help of the backstepping method and adaptive fuzzy control method, an observer-based adaptive output feedback tracking control approach is utilized. Under the designed controller, it is proved that the closed-loop system is impulse-free and regular, and all the involved signals are bounded. Furthermore, it is ensured that the tracking error can be adjusted by the errors between the control inputs and the corresponding saturated inputs, as well as the design parameters. Finally, simulation studies demonstrate the validity of the control approach.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 10","pages":"5090-5099"},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142443115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial Intelligence-Driven Framework for Augmented Reality Markerless Navigation in Knee Surgery","authors":"Xue Hu;Fabrizio Cutolo;Hisham Iqbal;Johann Henckel;Ferdinando Rodriguez y Baena","doi":"10.1109/TAI.2024.3429048","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429048","url":null,"abstract":"Conventional orthopedic navigation systems depend on marker-based tracking, which may introduce additional skin incisions, increase the risk and discomfort for the patient, and entail increased workflow complexity. The guidance is conveyed via 2-D monitors, which may distract the surgeon and increase the cognitive burden. This study presents an artificial intelligence (AI)-driven surgical navigation framework for knee replacement surgery. The system comprises an augmented reality (AR) interface that combines an occlusion-robust deep learning-based markerless bone tracking and registration algorithm with a commercial HoloLens 2 headset calibrated for the user's perspective on both eyes. The feasibility of such a system in navigating a bone drilling task is investigated with an experienced orthopedic surgeon on three cadaveric knees under realistic operating room (OR) conditions. After registering an implant model to computed tomography (CT) scans, the preoperative plans are determined based on the location of the fixation pins. Navigation accuracy is quantified using a highly accurate optical tracking system. The achieved drilling error is 7.88 ± 2.41 mm in translation and 7.36 ± 1.77° in orientation. 
The results demonstrate the viability of integrating AI and AR technology to navigate knee surgery.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 10","pages":"5205-5215"},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10599938","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142442962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation","authors":"Zihao Li;Pan Gao;Kang You;Chuan Yan;Manoranjan Paul","doi":"10.1109/TAI.2024.3429050","DOIUrl":"10.1109/TAI.2024.3429050","url":null,"abstract":"Previous studies have demonstrated the effectiveness of point-based neural models on the point cloud analysis task. However, there remains a crucial issue in producing an efficient input embedding for raw point coordinates. Moreover, another issue lies in the limited efficiency of neighboring aggregations, which is a critical component in the network stem. In this paper, we propose a global attention-guided dual-domain feature learning network (GAD) to address the above-mentioned issues. We first devise the contextual position-enhanced transformer (CPT) module, which is armed with an improved global attention mechanism, to produce a global-aware input embedding that serves as the guidance to subsequent aggregations. Then, the dual-domain K-nearest neighbor feature fusion (DKFF) is cascaded to conduct effective feature aggregation through novel dual-domain feature learning which appreciates both local geometric relations and long-distance semantic connections. 
Extensive experiments on multiple point cloud analysis tasks (e.g., classification, part segmentation, and scene semantic segmentation) demonstrate the superior performance of the proposed method and the efficacy of the devised modules.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 10","pages":"5167-5178"},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141655081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Scheduling Optimization for Microgrids Based on Partially Observable Markov Game","authors":"Jiakai Gong;Nuo Yu;Fen Han;Bin Tang;Haolong Wu;Yuan Ge","doi":"10.1109/TAI.2024.3428510","DOIUrl":"https://doi.org/10.1109/TAI.2024.3428510","url":null,"abstract":"Microgrids (MGs) are essential for enhancing energy efficiency and minimizing power usage through the regulation of energy storage systems. Nevertheless, privacy-related concerns obstruct the real-time precise regulation of these systems due to unavailable state-of-charge (SOC) data. This article introduces a self-adaptive energy scheduling optimization framework for MGs that operates without SOC information, utilizing a partially observable Markov game (POMG) to decrease energy usage. Furthermore, to develop an optimal energy scheduling strategy, a MG system optimization approach using recurrent multiagent deep deterministic policy gradient (RMADDPG) is presented. This method is evaluated against other existing techniques such as MADDPG, deterministic recurrent policy gradient (DRPG), and independent Q-learning (IQL), demonstrating reductions in electrical energy consumption by 4.29%, 5.56%, and 12.95%, respectively, according to simulation outcomes.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5371-5380"},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"U-Park: A User-Centric Smart Parking Recommendation System for Electric Shared Micromobility Services","authors":"Sen Yan;Noel E. O’Connor;Mingming Liu","doi":"10.1109/TAI.2024.3428513","DOIUrl":"https://doi.org/10.1109/TAI.2024.3428513","url":null,"abstract":"Electric shared micromobility services (ESMSs) have become a vital element within the mobility as a service framework, contributing to sustainable transportation systems. However, existing ESMSs face notable design challenges such as shortcomings in integration, transparency, and user-centered approaches, resulting in increased operational costs and decreased service quality. A key operational issue for ESMSs revolves around parking, particularly ensuring the availability of parking spaces as users approach their destinations. For instance, a recent study illustrated that nearly 13% of shared e-bike users in Dublin, Ireland, encounter difficulties parking their e-bikes due to inadequate planning and guidance. In response, we introduce U-Park, a user-centric smart parking recommendation system designed for ESMSs, providing tailored recommendations to users by analyzing their historical mobility data, trip trajectory, and parking space availability. We present the system architecture, implement it, and evaluate its performance using real-world data from an Irish-based shared e-bike provider, MOBY Bikes. Our results illustrate U-Park's ability to predict a user's destination within a shared e-bike system, achieving an approximate accuracy rate of over 97.60%, all without requiring direct user input. 
Experiments have proven that this predictive capability empowers U-Park to suggest the optimal parking station to users based on the availability of predicted parking spaces, improving the probability of obtaining a parking spot by 24.91% on average and 29.66% on maximum when parking availability is limited.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 10","pages":"5179-5193"},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10599560","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142442964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Human-in-the-Middle Attack Against Object Detection Systems","authors":"Han Wu;Sareh Rowlands;Johan Wahlström","doi":"10.1109/TAI.2024.3428520","DOIUrl":"https://doi.org/10.1109/TAI.2024.3428520","url":null,"abstract":"Object detection systems using deep learning models have become increasingly popular in robotics thanks to the rising power of central processing units (CPUs) and graphics processing units (GPUs) in embedded systems. However, these models are susceptible to adversarial attacks. While some attacks are limited by strict assumptions on access to the detection system, we propose a novel hardware attack inspired by Man-in-the-Middle attacks in cryptography. This attack generates a universal adversarial perturbation (UAP) and injects the perturbation between the universal serial bus (USB) camera and the detection system via a hardware attack. Besides, prior research is misled by an evaluation metric that measures the model accuracy rather than the attack performance. In combination with our proposed evaluation metrics, we significantly increased the strength of adversarial perturbations. These findings raise serious concerns for applications of deep learning models in safety-critical systems, such as autonomous driving.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 10","pages":"4884-4892"},"PeriodicalIF":0.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142443113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"360° High-Resolution Depth Estimation via Uncertainty-Aware Structural Knowledge Transfer","authors":"Zidong Cao;Hao Ai;Athanasios V. Vasilakos;Lin Wang","doi":"10.1109/TAI.2024.3427068","DOIUrl":"https://doi.org/10.1109/TAI.2024.3427068","url":null,"abstract":"To predict high-resolution (HR) omnidirectional depth maps, existing methods typically leverage HR omnidirectional image (ODI) as the input via fully supervised learning. However, in practice, taking HR ODI as input is undesired due to resource-constrained devices. In addition, depth maps often have lower resolution than color images. Therefore, in this article, we explore, for the first time, estimating the HR omnidirectional depth directly from a low-resolution (LR) ODI when no HR depth ground truth (GT) map is available. Our key idea is to transfer the scene structural knowledge from the HR image modality and the corresponding LR depth maps to achieve the goal of HR depth estimation without any extra inference cost. Specifically, we introduce ODI super-resolution (SR) as an auxiliary task and train both tasks collaboratively in a weakly supervised manner to boost the performance of HR depth estimation. The ODI SR task extracts the scene structural knowledge via uncertainty estimation. Buttressed by this, a scene structural knowledge transfer (SSKT) module is proposed with two key components. First, we employ a cylindrical implicit interpolation function (CIIF) to learn cylindrical neural interpolation weights for feature up-sampling and share the parameters of CIIFs between the two tasks. Then, we propose a feature distillation (FD) loss that provides extra structural regularization to help the HR depth estimation task learn more scene structural knowledge. 
Extensive experiments demonstrate that our weakly supervised method outperforms baseline methods, and even achieves comparable performance with the fully supervised methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5392-5402"},"PeriodicalIF":0.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}