Briti Gangopadhyay;Zhao Wang;Jia-Fong Yeh;Shingo Takamatsu
{"title":"Enhancing Generalization of Offline RL in Data-Limited Settings With Heuristic Rules","authors":"Briti Gangopadhyay;Zhao Wang;Jia-Fong Yeh;Shingo Takamatsu","doi":"10.1109/TAI.2025.3544971","DOIUrl":"https://doi.org/10.1109/TAI.2025.3544971","url":null,"abstract":"With the ability to learn from static datasets, offline reinforcement learning (RL) emerges as a compelling avenue for real-world applications. However, state-of-the-art offline RL algorithms perform suboptimally when confronted with limited data confined to specific regions within the state space. Performance degradation is attributed to the inability of offline RL algorithms to learn appropriate actions for rare or unseen observations. This article proposes a heuristic rule-based regularization technique and adaptively refines the initial knowledge from heuristics to considerably boost performance in limited data with partially omitted states. The key insight is that the regularization term mitigates erroneous actions for sparse samples and unobserved states covered by domain knowledge. Empirical evaluations on standard offline RL datasets demonstrate a substantial average performance increase compared to an ensemble of domain knowledge and existing offline RL algorithms operating on limited data.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2291-2301"},"PeriodicalIF":0.0,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144751073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensuring Fairness in Spectral Clustering via Disparate Impact-Based Graph Construction","authors":"Adithya K. Moorthy;V. Vijaya Saradhi;Bhanu Prasad","doi":"10.1109/TAI.2025.3545800","DOIUrl":"https://doi.org/10.1109/TAI.2025.3545800","url":null,"abstract":"Spectral clustering algorithms rely on graphs where edges are defined based on the similarity between the vertices (data points). The effectiveness and fairness of spectral clustering depend significantly on how the graph is constructed. While automated graph construction methods, which learn graphs from real-valued vector datasets, have demonstrated strong performance in the quality of clustering, fairness concerns still remain. In this work, we introduce a graph construction method that incorporates a new fairness definition—Edge Disparate Impact—into the edge relationships, aiming to produce a fair graph. This approach modifies the optimization process of automated graph construction to account for fairness, resulting in a more equitable graph. Extensive experiments were conducted to compare our method with the latest graph construction techniques and fair spectral clustering algorithms. The results prove that, by using a fair graph for spectral clustering, fairness is improved in the resulting clusters. 
We also demonstrate that our method outperforms baseline approaches in both fairness and the quality of clustering.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2342-2352"},"PeriodicalIF":0.0,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144751121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pruning Networks Only Using Few-Shot Pretraining Based on Gradient Similarity Frequency","authors":"Haigen Hu;Huihuang Zhang;Qianwei Zhou;Tieming Chen","doi":"10.1109/TAI.2025.3544582","DOIUrl":"https://doi.org/10.1109/TAI.2025.3544582","url":null,"abstract":"Neural network pruning is a popular and promising approach aiming at reducing heavy networks to lightweight ones by removing redundancies. Most existing methods adopt a three-stage pipeline, including pretraining, pruning, and fine-tuning. However, it is time-consuming to train a large and redundant network in the pretraining process. In this work, we propose a new minimal pretraining pruning method, gradient similarity frequency-based pruning (GSFP), which prunes a given network only using few-shot pretraining before training. Instead of pretraining a fully trained over-parameterized model, our method only uses one epoch to obtain the ranked list of convolution filters to be pruned according to their gradient similarity frequency and determines the redundant convolution filters that should be removed. Then, the obtained sparse network is trained in the standard way without the need to fine-tune the inherited weights from the full model. Finally, a series of experiments are conducted to verify the effectiveness of the method on CIFAR10/100 and ImageNet. The results show that our method can achieve remarkable results on some popular networks, such as VGG, ResNet, and DenseNet. 
Importantly, the proposed pruning approach never requires pretraining the over-parameterized model, thus offering a promising prospect of application and spreading for limited computational resources.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2253-2265"},"PeriodicalIF":0.0,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144750867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LaBINet—An Approach for Seamlessly Integrating New Advertisement Into an Existing Scene","authors":"Sukriti Dhang;Mimi Zhang;Soumyabrata Dev","doi":"10.1109/TAI.2025.3544595","DOIUrl":"https://doi.org/10.1109/TAI.2025.3544595","url":null,"abstract":"Billboards in multimedia images are critical for capturing wide audiences through advertising. Currently, no open-source platform exists for automated billboard integration, which impacts industries such as filmmaking, advertising, and sports broadcasting. Effective detection and seamless integration of new advertisements into existing frames are essential for this process. This article introduces LaBINet, a technique that leverages advanced deep learning methodologies to localize existing advertisements and utilizes image registration techniques for seamless integration of new ads. The process begins with generating a probabilistic map using AdSegNet to obtain transformed coordinates. Next, seamless integration is performed using the Poisson equation combined with Laplace matrices. To address the challenge of evaluating image quality in the absence of a reference image, we propose an evaluation method that correlates and statistically verifies subjective and objective scores. 
Experimental results demonstrate that our method outperforms existing techniques in integrating billboards under various lighting conditions, achieving strong subjective preference scores (76–95%) and low distortion scores (median values ranging from 21.817 to 22.529), indicating superior image quality.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2281-2290"},"PeriodicalIF":0.0,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144751076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SecureLLAMA: Secure FPGAs Using LLAMA Large Language Models","authors":"Mansour Alqarni;Akramul Azim","doi":"10.1109/TAI.2025.3544590","DOIUrl":"https://doi.org/10.1109/TAI.2025.3544590","url":null,"abstract":"Field-programmable gate arrays (FPGAs) are increasingly utilized in critical applications across sectors such as infrastructure, defense, and autonomous systems. However, the inherent flexibility of FPGAs introduces significant security vulnerabilities, particularly in the hardware description languages (HDLs) used to program them. This article introduces SecureLLAMA, an enhanced version of the LLAMA2 model, specifically designed to detect and mitigate FPGA vulnerabilities. The model leverages a novel dataset, “FPGAvul”, which includes both real-world examples and synthetically generated vulnerabilities. Our dataset FPGAvul addresses vulnerabilities such as initialization errors, clock domain crossing issues, insecure state machines, resource sharing conflicts, and buffer overflows. SecureLLAMA demonstrates superior accuracy in identifying and addressing security flaws in FPGA configurations. Comprehensive evaluation shows that SecureLLAMA significantly improves the detection of vulnerabilities, providing a robust solution for securing FPGAs in embedded systems. 
The findings of this research have the potential to advance FPGA security practices, ensuring their safe integration in critical environments where reliability is essential.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2266-2280"},"PeriodicalIF":0.0,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144751128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Wali Ur Rahman;Ric Nevarez;Lamia Tasnim Mim;Salim Hariri
{"title":"Multiagent Actor-Critic Generative AI for Query Resolution and Analysis","authors":"Mohammad Wali Ur Rahman;Ric Nevarez;Lamia Tasnim Mim;Salim Hariri","doi":"10.1109/TAI.2025.3544173","DOIUrl":"https://doi.org/10.1109/TAI.2025.3544173","url":null,"abstract":"In this article, we introduce multiagent strategic query resolution and diagnostic tool (MASQRAD), a transformative framework for query resolution based on the actor-critic model, which utilizes multiple generative AI agents. MASQRAD is excellent at translating imprecise or ambiguous user inquiries into precise and actionable requests. This framework generates pertinent visualizations and responses to these focused queries, as well as thorough analyses and insightful interpretations for users. MASQRAD addresses the common shortcomings of existing solutions in domains that demand fast and precise data interpretation, such as their incapacity to successfully apply AI for generating actionable insights and their challenges with the inherent ambiguity of user queries. MASQRAD functions as a sophisticated multiagent system but “masquerades” to users as a single AI entity, which lowers errors and enhances data interaction. This approach makes use of three primary AI agents: Actor Generative AI, Critic Generative AI, and Expert Analysis Generative AI. Each is crucial for creating, enhancing, and evaluating data interactions. The Actor AI generates Python scripts to generate data visualizations from large datasets within operational constraints, and the Critic AI rigorously refines these scripts through multiagent debate. Finally, the Expert Analysis AI contextualizes the outcomes to aid in decision-making. 
With an accuracy rate of 87% when handling tasks related to natural language visualization, MASQRAD establishes new benchmarks for automated data interpretation and showcases a noteworthy advancement that has the potential to revolutionize AI-driven applications.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2226-2240"},"PeriodicalIF":0.0,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144751091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Multiparticle Swarm Neural Architecture Search for High-Incidence Cancer Prediction","authors":"Liming Xu;Jie Zheng;Chunlin He;Jing Wang;Bochuan Zheng;Jiancheng Lv","doi":"10.1109/TAI.2025.3543822","DOIUrl":"https://doi.org/10.1109/TAI.2025.3543822","url":null,"abstract":"Cancer is a disease caused by uncontrolled growth and spread of cells, and early diagnosis is essential to improve the cure rate and reduce mortality. Although machine learning and deep learning have shown great potential in early cancer prediction, the accuracy of detection and prediction still needs to be improved due to the different scales of lesion areas. Therefore, we propose an adaptive multiparticle swarm neural architecture search method to automatically explore an efficient deep neural network architecture for high-incidence cancer prediction. First, the multiparticle swarm strategy is used to initialize the high-quality architecture in the scale adaptive search space to enhance multiscale perception. Then, the improved weighted average method is combined with classification accuracy, parameters, and floating-point operations to adaptively update the particle swarm architecture to avoid falling into local optimum. In addition, a method based on weight sharing is used to improve the efficiency of architecture search. 
The experimental results show that, compared with the manually designed network and the existing neural architecture search method, the proposed algorithm achieves average increments of 26.33%, 33.99%, 8.98%, 37.41%, 35.1%, and 51.76% in classification accuracy, F1-Score, Cohen's kappa, AUC, exponential balance accuracy and search efficiency, respectively.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2203-2214"},"PeriodicalIF":0.0,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144750902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ao Li;Minchao Wu;Rui Ouyang;Yongming Wang;Fan Li;Zhao Lv
{"title":"A Multimodal-Driven Fusion Data Augmentation Framework for Emotion Recognition","authors":"Ao Li;Minchao Wu;Rui Ouyang;Yongming Wang;Fan Li;Zhao Lv","doi":"10.1109/TAI.2025.3537965","DOIUrl":"https://doi.org/10.1109/TAI.2025.3537965","url":null,"abstract":"The pursuit of imbuing computers with emotional intelligence has driven extensive research into physiological signal analysis for emotion recognition. Deep learning techniques offer promising avenues for analyzing physiological signals in this domain. Despite numerous studies on emotion recognition using various physiological signals, challenges persist in classifying multimodal physiological signals due to data scarcity. Current research lacks focus on addressing data insufficiency for multimodal physiological signals. This article proposes an innovative method to address this issue and improve the effect of emotion recognition using multimodal physiological signal data. Our model comprises a physiological signal encoder, a multimodal data generator, and a multimodal emotion recognizer. Specifically, we introduce a customized ConvNeXt-attention fusion model (CNXAF) to fuse diverse physiological signals, generating fused multimodal data. The multimodal data generator employs a conditional self-attention generative adversarial network (c-SAGAN) to synthesize additional data across different categories, augmenting original datasets. Finally, the multimodal emotion recognizer utilizes the ConvNeXt-t classifier for emotion recognition on the extended dataset. Through extensive experimentation, our model achieves accuracies of 96.06% on the DEAP dataset and 95.70% on the WESAD dataset, demonstrating the effectiveness of our approach in accurately recognizing emotions. 
Experimental results underscore the superior performance of our method compared to existing approaches in multimodal emotion recognition research.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2083-2097"},"PeriodicalIF":0.0,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144751077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Honglin Guo;Weizhi Nie;Ruidong Chen;Lanjun Wang;Guoqing Jin;Anan Liu
{"title":"ContentDM: A Layout Diffusion Model for Content-Aware Layout Generation","authors":"Honglin Guo;Weizhi Nie;Ruidong Chen;Lanjun Wang;Guoqing Jin;Anan Liu","doi":"10.1109/TAI.2025.3544172","DOIUrl":"https://doi.org/10.1109/TAI.2025.3544172","url":null,"abstract":"Content-aware layout generation aims to produce fitting element arrangements based on the background contents, which is used for graphic design applications such as automatic poster layout design. In this article, we propose ContentDM, a layout diffusion model specifically designed for the content-aware layout generation task, overcoming the limitations of existing methods: irrational arrangement among layout elements and lack of refining ability for coarse generated results. ContentDM defines the layout diffusion process through random perturbations applied to both the position and type of layout elements. During the denoising training phase, the content-aware layout generator is trained to reconstruct samples from these perturbed layouts. This process enables the model to learn the correct arrangement patterns within the layout elements, thereby enhancing the rationality of generated layouts. Moreover, we develop an iterative layout inference strategy to enable the layout generator to refine the generated layouts progressively, thereby enhancing the overall quality of the generation results. 
Extensive experiments demonstrate that ContentDM significantly outperforms existing methods, achieving state-of-the-art performance in content-aware layout generation, both in terms of visual quality and quantitative metrics.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2215-2225"},"PeriodicalIF":0.0,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144751123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facilitating Continuous Facial Aging Through Latent Age Attribute Modulation","authors":"Xiyuan Hu;Jinglei Qu;Chen Chen","doi":"10.1109/TAI.2025.3543811","DOIUrl":"https://doi.org/10.1109/TAI.2025.3543811","url":null,"abstract":"In recent years, facial aging has attracted significant research interest due to its broad applications and potential benefits. While generative adversarial networks (GANs) have achieved notable progress in synthesizing realistic facial images, many GAN-based facial aging methods struggle to accurately capture the continuous progression of age-related changes over time. In this article, we propose an innovative framework featuring the latent age attribute module (LAAM), which maps age attributes to a structured latent space that facilitates efficient sampling for precise age attribute modeling. We further introduce the age-AdaIN fusion module (AFM), which seamlessly integrates age features from LAAM with facial content features, enabling the generation of images that exhibit smooth, continuous age transitions. This framework excels in capturing fine-grained aging details, particularly for elderly individuals. Quantitative and qualitative evaluations on benchmark datasets demonstrate the effectiveness of our approach in generating realistic age-progressed facial images, with a notable improvement in elderly aging accuracy and detail.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2163-2177"},"PeriodicalIF":0.0,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144750861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}