{"title":"Direct Adversarial Latent Estimation to Evaluate Decision Boundary Complexity in Black Box Models","authors":"Ashley S. Dale;Lauren Christopher","doi":"10.1109/TAI.2024.3455308","DOIUrl":"https://doi.org/10.1109/TAI.2024.3455308","url":null,"abstract":"A trustworthy artificial intelligence (AI) model should be robust to perturbed data, where robustness correlates with the dimensionality and linearity of feature representations in the model latent space. Existing methods for evaluating feature representations in the latent space are restricted to white-box models. In this work, we introduce direct adversarial latent estimation (DALE) for evaluating the robustness of feature representations and decision boundaries for target black-box models. A surrogate latent space is created using a variational autoencoder (VAE) trained on a disjoint dataset from an object classification backbone, then the VAE latent space is traversed to create sets of adversarial images. An object classification model is trained using transfer learning on the VAE image reconstructions, then classifies instances in the adversarial image set. We propose that the number of times the classification changes in an image set indicates the complexity of the decision boundaries in the classifier latent space; more complex decision boundaries are found to be more robust. This is confirmed by comparing the DALE distributions to the degradation of the classifier F1 scores in the presence of adversarial attacks. 
This work enables the first comparisons of latent-space complexity between black box models by relating model robustness to complex decision boundaries.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6043-6053"},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AugDiff: Diffusion-Based Feature Augmentation for Multiple Instance Learning in Whole Slide Image","authors":"Zhuchen Shao;Liuxi Dai;Yifeng Wang;Haoqian Wang;Yongbing Zhang","doi":"10.1109/TAI.2024.3454591","DOIUrl":"https://doi.org/10.1109/TAI.2024.3454591","url":null,"abstract":"Multiple instance learning (MIL), a powerful strategy for weakly supervised learning, is able to perform various prediction tasks on gigapixel whole slide images (WSIs). However, the tens of thousands of patches in WSIs usually incur a vast computational burden for image augmentation, limiting the performance improvement in MIL. Currently, the feature augmentation-based MIL framework is a promising solution, while existing methods such as mixup often produce unrealistic features. To explore a more efficient and practical augmentation method, we introduce the diffusion model (DM) into MIL for the first time and propose a feature augmentation framework called AugDiff. The diverse generation capabilities of DM guarantee a diverse range of feature augmentations, while its iterative generation approach effectively preserves semantic integrity during these augmentations. We conduct extensive experiments over four distinct cancer datasets, two different feature extractors, and three prevalent MIL algorithms to evaluate the performance of AugDiff. An ablation study and visualizations further verify its effectiveness. Moreover, we highlight AugDiff's higher-quality augmented features over image augmentation and its superiority over self-supervised learning. The generalization over external datasets indicates its broader applications. 
The code is open-sourced at https://github.com/szc19990412/AugDiff.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6617-6628"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring","authors":"Kristina Telegraph;Christos Kyrkou","doi":"10.1109/TAI.2024.3454566","DOIUrl":"https://doi.org/10.1109/TAI.2024.3454566","url":null,"abstract":"This work presents advancements in multiclass vehicle detection using unmanned aerial vehicle (UAV) cameras through the development of spatiotemporal object detection models. The study introduces a spatiotemporal vehicle detection dataset (STVD) containing 6600 annotated sequential frame images captured by UAVs, enabling comprehensive training and evaluation of algorithms for holistic spatiotemporal perception. A YOLO-based object detection algorithm is enhanced to incorporate temporal dynamics, resulting in improved performance over single frame models. The integration of attention mechanisms into spatiotemporal models is shown to further enhance performance. Experimental validation demonstrates significant progress, with the best spatiotemporal model exhibiting a 16.22% improvement over single frame models, while it is demonstrated that attention mechanisms hold the potential for additional performance gains.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6159-6171"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spiking Diffusion Models","authors":"Jiahang Cao;Hanzhong Guo;Ziqing Wang;Deming Zhou;Hao Cheng;Qiang Zhang;Renjing Xu","doi":"10.1109/TAI.2024.3453229","DOIUrl":"https://doi.org/10.1109/TAI.2024.3453229","url":null,"abstract":"Recent years have witnessed spiking neural networks (SNNs) gaining attention for their ultra-low energy consumption and high biological plausibility compared with traditional artificial neural networks (ANNs). Despite their distinguished properties, the application of SNNs in the computationally intensive field of image generation is still under exploration. In this article, we propose the spiking diffusion models (SDMs), an innovative family of SNN-based generative models that excel in producing high-quality samples with significantly reduced energy consumption. In particular, we propose a temporal-wise spiking mechanism (TSM) that allows SNNs to capture more temporal features from a bio-plasticity perspective. In addition, we propose a threshold-guided strategy that can further improve the performances by up to 16.7% without any additional training. We also make the first attempt to use the ANN-SNN approach for SNN-based generation tasks. Extensive experimental results reveal that our approach not only exhibits comparable performance to its ANN counterpart with few spiking time steps, but also outperforms previous SNN-based generative models by a large margin. Moreover, we also demonstrate the high-quality generation ability of SDM on large-scale datasets, e.g., LSUN bedroom. 
This development marks a pivotal advancement in the capabilities of SNN-based generation, paving the way for future research avenues to realize low-energy and low-latency generative applications.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 1","pages":"132-143"},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142976036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constrained Multiobjective Optimization via Relaxations on Both Constraints and Objectives","authors":"Fei Ming;Bing Xue;Mengjie Zhang;Wenyin Gong;Huixiang Zhen","doi":"10.1109/TAI.2024.3454025","DOIUrl":"https://doi.org/10.1109/TAI.2024.3454025","url":null,"abstract":"Since most multiobjective optimization problems in real-world applications contain constraints, constraint-handling techniques (CHTs) are necessary for a multiobjective optimizer. However, existing CHTs give no relaxation to objectives, resulting in the elimination of infeasible dominated solutions that are promising (potentially useful but inferior) for detecting feasible regions and the constrained Pareto front (CPF). To overcome this drawback, in this work, we propose an objective relaxation technique that can preserve promising solutions by relaxing objective function values, i.e., convergence, through an adaptively adjusted relaxation factor. Further, we develop a new constrained multiobjective optimization evolutionary algorithm (CMOEA) based on relaxations on both constraints and objectives. The proposed algorithm evolves one population by the constraint relaxation technique to preserve promising infeasible solutions and the other population by both objective and constraint relaxation techniques to preserve promising infeasible dominated solutions. In this way, our method can overcome the drawback of existing CHTs. Besides, an archive update strategy is designed to maintain encountered feasible solutions by the two populations to approximate the CPF. Experiments on challenging benchmark problems and real-world problems have demonstrated the superiority or at least competitiveness of our proposed CMOEA. 
Moreover, to verify the generality of the objective relaxation technique, we embed it into two existing CMOEA frameworks and the results show that it can significantly improve the performance in handling challenging problems.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6709-6722"},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Q-Cogni: An Integrated Causal Reinforcement Learning Framework","authors":"Cristiano da Costa Cunha;Wei Liu;Tim French;Ajmal Mian","doi":"10.1109/TAI.2024.3453230","DOIUrl":"https://doi.org/10.1109/TAI.2024.3453230","url":null,"abstract":"We present Q-Cogni, an algorithmically integrated causal reinforcement learning framework that redesigns Q-Learning to improve the learning process with causal inference. Q-Cogni achieves improved policy quality and learning efficiency with a prelearned structural causal model of the environment, queried to guide the policy learning process with an understanding of cause-and-effect relationships in a state-action space. By doing so, we not only leverage the sample efficient techniques of reinforcement learning but also enable reasoning about a broader set of policies and bring higher degrees of interpretability to decisions made by the reinforcement learning agent. We apply Q-Cogni on vehicle routing problem (VRP) environments including a real-world dataset of taxis in New York City using the Taxi and Limousine Commission trip record data. We show Q-Cogni's capability to achieve an optimally guaranteed policy (total trip distance) in 76% of the cases when comparing to shortest-path-search methods and outperforming (shorter distances) state-of-the-art reinforcement learning algorithms in 66% of cases. 
Additionally, since Q-Cogni does not require a complete global map, we show that it can start efficiently routing with partial information and improve as more data is collected, such as traffic disruptions and changes in destination, making it ideal for deployment in real-world dynamic settings.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6186-6195"},"PeriodicalIF":0.0,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Policy Consensus-Based Distributed Deterministic Multi-Agent Reinforcement Learning Over Directed Graphs","authors":"Yifan Hu;Junjie Fu;Guanghui Wen;Changyin Sun","doi":"10.1109/TAI.2024.3452678","DOIUrl":"https://doi.org/10.1109/TAI.2024.3452678","url":null,"abstract":"Learning efficient coordination policies over continuous state and action spaces remains a huge challenge for existing distributed multi-agent reinforcement learning (MARL) algorithms. In this article, the classic deterministic policy gradient (DPG) method is extended to the distributed MARL domain to handle the continuous control policy learning issue for a team of homogeneous agents connected through a directed graph. A theoretical on-policy distributed actor–critic algorithm is first proposed based on a local DPG theorem, which considers observation-based policies, and incorporates consensus updates for the critic and actor parameters. Stochastic approximation theory is then used to obtain asymptotic convergence results of the algorithm under standard assumptions. Thereafter, a practical distributed deterministic actor–critic algorithm is proposed by integrating the theoretical algorithm with the deep reinforcement learning training architecture, which achieves better scalability, exploration ability, and data efficiency. 
Simulations are carried out in standard MARL environments with continuous action spaces, where the results demonstrate that the proposed distributed algorithm achieves comparable learning performance to solid centralized trained baselines while demanding much less communication resources.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 1","pages":"118-131"},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142975992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FIMKD: Feature-Implicit Mapping Knowledge Distillation for RGB-D Indoor Scene Semantic Segmentation","authors":"Wujie Zhou;Yuxiang Xiao;Yuanyuan Liu;Qiuping Jiang","doi":"10.1109/TAI.2024.3452052","DOIUrl":"https://doi.org/10.1109/TAI.2024.3452052","url":null,"abstract":"Depth images are often used to improve the geometric understanding of scenes owing to their intuitive distance properties. Although there have been significant advancements in semantic segmentation tasks using red–green–blue-depth (RGB-D) images, the complexity of existing methods remains high. Furthermore, the requirement for high-quality depth images increases the model inference time, which limits the practicality of these methods. To address this issue, we propose a feature-implicit mapping knowledge distillation (FIMKD) method and a cross-modal knowledge distillation (KD) architecture to leverage deep modal information for training and reduce the model dependence on this information during inference. The approach comprises two networks: FIMKD-T, a teacher network that uses RGB-D data, and FIMKD-S, a student network that uses only RGB data. FIMKD-T extracts high-frequency information using the depth modality and compensates for the loss of RGB details due to a reduction in resolution during feature extraction by the high-frequency feature enhancement module, thereby enhancing the geometric perception of semantic features. In contrast, the FIMKD-S network does not employ deep learning techniques; instead, it uses a nonlearning approach to extract high-frequency information. To enable the FIMKD-S network to learn deep features, we propose a feature-implicit mapping KD for feature distillation. This mapping technique maps the features in channel and space to a low-dimensional hidden layer, which helps to avoid inefficient single-pattern student learning. We evaluated the proposed FIMKD-S* (FIMKD-S with KD) on the NYUv2 and SUN-RGBD datasets. 
The results demonstrate that both FIMKD-T and FIMKD-S* achieve state-of-the-art performance. Furthermore, FIMKD-S* provides the best performance balance.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6488-6499"},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous Hypergraph Embedding for Node Classification in Dynamic Networks","authors":"Malik Khizar Hayat;Shan Xue;Jia Wu;Jian Yang","doi":"10.1109/TAI.2024.3450658","DOIUrl":"https://doi.org/10.1109/TAI.2024.3450658","url":null,"abstract":"Graphs are a foundational way to represent scenarios where objects interact in pairs. Recently, graph neural networks (GNNs) have become widely used for modeling simple graph structures, either in homogeneous or heterogeneous graphs, where edges represent pairwise relationships between nodes. However, many real-world situations involve more complex interactions where multiple nodes interact simultaneously, as observed in contexts such as social groups and gene-gene interactions. Traditional graph embeddings often fail to capture these multifaceted nonpairwise dynamics. A hypergraph, which generalizes a simple graph by connecting two or more nodes via a single hyperedge, offers a more efficient way to represent these interactions. While most existing research focuses on homogeneous and static hypergraph embeddings, many real-world networks are inherently heterogeneous and dynamic. To address this gap, we propose a GNN-based embedding for dynamic heterogeneous hypergraphs, specifically designed to capture nonpairwise interactions and their evolution over time. Unlike traditional embedding methods that rely on distance or meta-path-based strategies for node neighborhood aggregation, a $k$-hop neighborhood strategy is introduced to effectively encapsulate higher-order interactions in dynamic networks. Furthermore, the information aggregation process is enhanced by incorporating semantic hyperedges, further enriching hypergraph embeddings. Finally, embeddings learned from each timestamp are aggregated using a mean operation to derive the final node embeddings. 
Extensive experiments on five real-world datasets, along with comparisons against homogeneous, heterogeneous, and hypergraph-based baselines (both static and dynamic), demonstrate the robustness and superiority of our model.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5465-5477"},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentially Private and Heterogeneity-Robust Federated Learning With Theoretical Guarantee","authors":"Xiuhua Wang;Shuai Wang;Yiwei Li;Fengrui Fan;Shikang Li;Xiaodong Lin","doi":"10.1109/TAI.2024.3446759","DOIUrl":"https://doi.org/10.1109/TAI.2024.3446759","url":null,"abstract":"Federated learning (FL) is a popular distributed paradigm where a large number of clients collaboratively train a machine learning (ML) model under the orchestration of a central server without knowing the clients’ private raw data. The development of effective FL algorithms faces multiple practical challenges including data heterogeneity and clients’ privacy protection. Although numerous attempts have been made to deal with data heterogeneity or rigorous privacy protection, none have effectively tackled both issues simultaneously. In this article, we propose a differentially private and heterogeneity-robust FL algorithm, named DP-FedCVR, to mitigate the data heterogeneity by following the client-variance-reduction strategy. Besides, it adopts a sophisticated differential privacy (DP) mechanism where the privacy-amplified strategy is applied, to achieve a rigorous privacy protection guarantee. We show that the proposed DP-FedCVR algorithm maintains its heterogeneity-robustness though DP noises are incorporated, while achieving a sublinear convergence rate for a nonconvex FL problem. 
Numerical experiments based on image classification tasks are presented to demonstrate that DP-FedCVR provides superior performance over the benchmark algorithms in the presence of data heterogeneity and various DP privacy budgets.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6369-6384"},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}