Xuemeng Hui;Zhunga Liu;Jiaxiang Liu;Zuowei Zhang;Longfei Wang
{"title":"Visual–Semantic Fuzzy Interaction Network for Zero-Shot Learning","authors":"Xuemeng Hui;Zhunga Liu;Jiaxiang Liu;Zuowei Zhang;Longfei Wang","doi":"10.1109/TAI.2024.3524955","DOIUrl":"https://doi.org/10.1109/TAI.2024.3524955","url":null,"abstract":"Zero-shot learning (ZSL) aims to recognize image objects of unseen classes using manually defined semantic knowledge corresponding to both seen and unseen images. The key to ZSL lies in building the interaction between precise image data and fuzzy semantic knowledge. The fuzziness is attributed to the difficulty in quantifying human knowledge. However, existing ZSL methods ignore the inherent fuzziness of semantic knowledge and treat it as precise data when building the visual–semantic interaction. This hinders the transfer of semantic knowledge from seen classes to unseen classes. To solve this problem, we propose a visual–semantic fuzzy interaction network (VSFIN) for ZSL. VSFIN utilizes an effective encoder–decoder structure, including a semantic prototype encoder (SPE) and a visual feature decoder (VFD). The SPE and VFD enable the visual features to interact with semantic knowledge via cross-attention. To achieve visual–semantic fuzzy interaction in SPE and VFD, we introduce the concept of a membership function from fuzzy set theory and design a membership loss function. This loss function allows for a certain degree of imprecision in visual–semantic interaction, thereby enabling VSFIN to appropriately utilize the given semantic knowledge. Moreover, we introduce the concept of the rank sum test and propose a distribution alignment loss to alleviate the bias towards seen classes. 
Extensive experiments on three widely used benchmarks have demonstrated that VSFIN outperforms current state-of-the-art methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1345-1359"},"PeriodicalIF":0.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yong Dai;Xiaopeng Hong;Yabin Wang;Zhiheng Ma;Dongmei Jiang;Yaowei Wang
{"title":"Prompt Customization for Continual Learning","authors":"Yong Dai;Xiaopeng Hong;Yabin Wang;Zhiheng Ma;Dongmei Jiang;Yaowei Wang","doi":"10.1109/TAI.2024.3524977","DOIUrl":"https://doi.org/10.1109/TAI.2024.3524977","url":null,"abstract":"Contemporary continual learning approaches typically select prompts from a pool, which function as supplementary inputs to a pretrained model. However, this strategy is hindered by the inherent noise of its selection approach when handling an increasing number of tasks. In response to this challenge, we reformulate the prompting approach for continual learning and propose the prompt customization (PC) method. PC mainly comprises a prompt generation module (PGM) and a prompt modulation module (PMM). In contrast to conventional methods that employ hard prompt selection, PGM assigns different coefficients to prompts from a fixed-size pool of prompts and generates tailored prompts. Moreover, PMM further modulates the prompts by adaptively assigning weights according to the correlations between input data and corresponding prompts. We evaluate our method on four benchmark datasets for three diverse settings, including the class, domain, and task-agnostic incremental learning tasks. Experimental results demonstrate that the proposed method yields consistent improvements (by up to 16.2%) over state-of-the-art (SOTA) techniques. 
The code has been released online.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1373-1385"},"PeriodicalIF":0.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tiehua Zhang;Yuze Liu;Zhishu Shen;Xingjun Ma;Peng Qi;Zhijun Ding;Jiong Jin
{"title":"Learning from Heterogeneity: A Dynamic Learning Framework for Hypergraphs","authors":"Tiehua Zhang;Yuze Liu;Zhishu Shen;Xingjun Ma;Peng Qi;Zhijun Ding;Jiong Jin","doi":"10.1109/TAI.2024.3524984","DOIUrl":"https://doi.org/10.1109/TAI.2024.3524984","url":null,"abstract":"Graph neural networks (GNNs) have gained increasing popularity in recent years owing to their capability and flexibility in modeling complex graph-structured data. Among all graph learning methods, hypergraph learning is a technique for exploring the implicit higher-order correlations when training the embedding space of the graph. In this article, we propose a hypergraph learning framework named <italic>learning from heterogeneity (LFH)</italic> that is capable of dynamic hyperedge construction and attentive embedding update utilizing the heterogeneity attributes of the graph. Specifically, in our framework, high-quality features are first generated by a pairwise fusion strategy that utilizes explicit graph structure information when generating the initial node embedding. Afterward, a hypergraph is constructed through the dynamic grouping of implicit hyperedges, followed by a type-specific hypergraph learning process. To evaluate the effectiveness of our proposed framework, we conduct comprehensive experiments on several popular datasets with twelve state-of-the-art models on both node classification and link prediction tasks, which fall into the categories of homogeneous pairwise graph learning, heterogeneous pairwise graph learning, and hypergraph learning. 
The experimental results demonstrate a significant performance gain (an average of 12.9% in node classification and 12.8% in link prediction) compared with recent state-of-the-art methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 6","pages":"1513-1528"},"PeriodicalIF":0.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144196578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamically Scaled Temperature in Self-Supervised Contrastive Learning","authors":"Siladittya Manna;Soumitri Chattopadhyay;Rakesh Dey;Umapada Pal;Saumik Bhattacharya","doi":"10.1109/TAI.2024.3524979","DOIUrl":"https://doi.org/10.1109/TAI.2024.3524979","url":null,"abstract":"In contemporary self-supervised contrastive algorithms such as SimCLR and MoCo, the task of balancing attraction between two semantically similar samples and repulsion between two samples of different classes is primarily affected by the presence of hard negative samples. While the InfoNCE loss has been shown to impose penalties based on hardness, the temperature hyperparameter is key to regulating the penalties and the tradeoff between uniformity and tolerance. In this work, we focus on improving the performance of the InfoNCE loss in self-supervised learning (SSL) by proposing a novel cosine-similarity-dependent temperature scaling function to effectively optimize the distribution of the samples in the feature space. We also provide mathematical analyses to support the construction of such a dynamically scaled temperature function. Experimental evidence shows that the proposed framework outperforms the contrastive loss-based SSL algorithms.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 6","pages":"1502-1512"},"PeriodicalIF":0.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144196881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Correlated Multimodal Domain Adaptation for Pattern Classification","authors":"Shuyue Wang;Zhunga Liu;Zuowei Zhang;Mohammed Bennamoun","doi":"10.1109/TAI.2024.3524976","DOIUrl":"https://doi.org/10.1109/TAI.2024.3524976","url":null,"abstract":"Multimodal domain adaptation (MMDA) aims to transfer knowledge across different domains that contain multimodal data. Current methods typically assume that both the source and target domains have paired multimodal data with the same modalities, allowing for direct knowledge transfer between corresponding types of data. However, in certain applications, the source domain benefits from advanced sensors and equipment, capturing more modalities than those available in the target domain. As a result, the information from the source modalities may not strongly align with that of the target modalities. This weak correlation hinders the effective utilization of all source data for the target domain. To address this challenge, we propose a weakly correlated multimodal domain adaptation (WCMMDA) method for pattern classification. WCMMDA is designed to acquire modality-independent and category-related knowledge from the source domain, enabling the full utilization of available source modalities for effective knowledge transfer. Specifically, modality-invariant features are first extracted from the multimodal data to bridge the heterogeneity gap within each domain. Subsequently, domain-invariant features are further learned from these modality-invariant features to align the feature distributions across the source and target domains. A source-specific classifier is employed here, which predicts pseudolabels for the target data and enables the feature extractor to explore category-related information in source features. Finally, a target-specific classifier is trained using the pseudolabeled target data, where highly reliable pseudolabels are selected based on confidence to improve classification performance. 
Extensive experiments are performed on the real-world multimodal datasets to demonstrate the superiority of WCMMDA.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1360-1372"},"PeriodicalIF":0.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artun Saday;İlker Demirel;Yiğit Yıldırım;Cem Tekin
{"title":"Federated Multiarmed Bandits Under Byzantine Attacks","authors":"Artun Saday;İlker Demirel;Yiğit Yıldırım;Cem Tekin","doi":"10.1109/TAI.2024.3524954","DOIUrl":"https://doi.org/10.1109/TAI.2024.3524954","url":null,"abstract":"Multiarmed bandits (MAB) is a sequential decision-making model in which the learner controls the trade-off between exploration and exploitation to maximize its cumulative reward. Federated multiarmed bandits (FMAB) is an emerging framework where a cohort of learners with heterogeneous local models play an MAB game and communicate their aggregated feedback to a server to learn a globally optimal arm. Two key hurdles in FMAB are communication-efficient learning and resilience to adversarial attacks. To address these issues, we study the FMAB problem in the presence of Byzantine clients who can send false model updates threatening the learning process. We analyze the sample complexity and the regret of <inline-formula><tex-math>$\\beta$</tex-math></inline-formula>-optimal arm identification. We borrow tools from robust statistics and propose a median-of-means (MoM)-based online algorithm, Fed-MoM-UCB, to cope with Byzantine clients. In particular, we show that if the Byzantine clients constitute less than half of the cohort, the cumulative regret with respect to <inline-formula><tex-math>$\\beta$</tex-math></inline-formula>-optimal arms is bounded over time with high probability, showcasing both communication efficiency and Byzantine resilience. We analyze the interplay between the algorithm parameters, a discernibility margin, regret, communication cost, and the arms’ suboptimality gaps. 
We demonstrate Fed-MoM-UCB's effectiveness against the baselines in the presence of Byzantine attacks via experiments.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 6","pages":"1488-1501"},"PeriodicalIF":0.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144196879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hierarchical Cross-Modal Spatial Fusion Network for Multimodal Emotion Recognition","authors":"Ming Xu;Tuo Shi;Hao Zhang;Zeyi Liu;Xiao He","doi":"10.1109/TAI.2024.3523250","DOIUrl":"https://doi.org/10.1109/TAI.2024.3523250","url":null,"abstract":"Recent advancements in emotion recognition research based on physiological data have been notable. However, existing multimodal methods often overlook the interrelations between various modalities, such as video and electroencephalography (EEG) data, in emotion recognition. In this article, a feature fusion-based hierarchical cross-modal spatial fusion network (HCSFNet) is proposed that effectively integrates EEG and video features. By designing an EEG feature extraction network based on 1-D convolution and a video feature extraction network based on 3-D convolution, corresponding modality features are thoroughly extracted. To promote sufficient interaction between the two modalities, a hierarchical cross-modal coordinated attention module is proposed in this article. Additionally, to enhance the network's perceptual ability for emotion-related features, a multiscale spatial pyramid pooling module is also designed. Meanwhile, a self-distillation method is introduced, which enhances the performance while reducing the number of parameters in the network. 
The HCSFNet achieved an accuracy of 97.78% on the valence–arousal dimension of the Database for Emotion Analysis using Physiological Signals (DEAP) dataset, and it also obtained an accuracy of 60.59% on the MAHNOB-human-computer interaction (HCI) dataset, reaching the state-of-the-art level.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1429-1438"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Stratified Seed Selection Algorithm for $K$-Means Clustering on Big Data","authors":"Namita Bajpai;Jiaul H. Paik;Sudeshna Sarkar","doi":"10.1109/TAI.2024.3524370","DOIUrl":"https://doi.org/10.1109/TAI.2024.3524370","url":null,"abstract":"In <inline-formula><tex-math>$k$</tex-math></inline-formula>-means clustering, the selection of initial seeds significantly influences the quality of the resulting clusters. Moreover, clustering large-sized data introduces an additional challenge for seed selection. We propose a novel and scalable seed selection approach by jointly modeling the quality and diversity of the potential seeds through a principled probabilistic stochastic point process. To this end, we also propose a novel seed quality estimation approach on large data. Our approach quantifies the quality of a seed by measuring the divergence between the distribution of similarity between the closest neighbors and that of the randomly chosen neighbors from exhaustive stratified batches of samples. Unlike many existing scalable approaches, we do not rely on a small sample of the original data; instead, we use the entire data, thereby minimizing the chance of leaving out information about a potentially high-quality seed. 
The extensive evaluation on a set of benchmark data shows that it outperforms a number of strong, well-known, and recent algorithms measured by three standard metrics.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1334-1344"},"PeriodicalIF":0.0,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143892441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Periodic Hamiltonian Neural Networks","authors":"Zi-Yu Khoo;Dawen Wu;Jonathan Sze Choong Low;Stéphane Bressan","doi":"10.1109/TAI.2024.3515934","DOIUrl":"https://doi.org/10.1109/TAI.2024.3515934","url":null,"abstract":"Modeling dynamical systems is a core challenge for science and engineering. Hamiltonian neural networks (HNNs) are state-of-the-art models that regress the vector field of a dynamical system under the learning bias of Hamilton's equations. A recent observation is that embedding biases regarding invariances of the Hamiltonian improve regression performance. One such invariance is the periodicity of the Hamiltonian, which improves extrapolation performance. We propose <italic>periodic HNNs</italic> that embed periodicity within HNNs using observational, learning, and inductive biases. An observational bias is embedded by training the HNN on data that reflects the periodicity of the Hamiltonian. A learning bias is embedded through the loss function of the HNN. An inductive bias is embedded by a periodic activation function in the HNN. We evaluate the performance of the proposed models on interpolation and extrapolation problems that either assume knowledge of the periods a priori or learn the periods as parameters. 
We show that the proposed models can interpolate well but are far more effective than the HNN at extrapolating the Hamiltonian and the vector field for both problems and can even extrapolate the vector field of the chaotic double pendulum Hamiltonian system.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1194-1202"},"PeriodicalIF":0.0,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143892456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khoi Do;Minh-Duong Nguyen;Nguyen Tien Hoa;Long Tran-Thanh;Nguyen H. Tran;Quoc-Viet Pham
{"title":"Revisiting LARS for Large Batch Training Generalization of Neural Networks","authors":"Khoi Do;Minh-Duong Nguyen;Nguyen Tien Hoa;Long Tran-Thanh;Nguyen H. Tran;Quoc-Viet Pham","doi":"10.1109/TAI.2024.3523252","DOIUrl":"https://doi.org/10.1109/TAI.2024.3523252","url":null,"abstract":"This article investigates large batch training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings. In particular, we first show that a state-of-the-art technique, LARS with warm-up, tends to be trapped in sharp minimizers early on due to redundant ratio scaling. Additionally, a fixed steep decline in the latter phase restricts deep neural networks from effectively navigating early-phase sharp minimizers. To address these issues, we propose time-varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, escaping sharp minimizers and gradually transitioning to LARS for robustness in later stages. Extensive experiments demonstrate that TVLARS outperforms LARS and LAMB in most cases, with up to 2% improvement in classification scenarios. In all self-supervised learning cases, TVLARS achieves up to 10% performance improvement. 
Our implementation is available at <uri>https://github.com/KhoiDOO/tvlars</uri>.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1321-1333"},"PeriodicalIF":0.0,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10817779","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143892501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}