{"title":"Broad Siamese Network for Facial Beauty Prediction","authors":"Yikai Li;Tong Zhang;C. L. Philip Chen","doi":"10.1109/TAI.2024.3429293","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429293","url":null,"abstract":"Facial beauty prediction (FBP) aims to automatically predict beauty scores of facial images according to human perception. Usually, facial images contain lots of information irrelevant to facial beauty, such as information about pose, emotion, and illumination, which interferes with the prediction of facial beauty. To overcome interferences, we develop a broad Siamese network (BSN) to focus more on the task of beauty prediction. Specifically, BSN consists mainly of three components: a multitask Siamese network (MTSN), a multilayer attention (MLA) module, and a broad representation learning (BRL) module. First, MTSN is proposed with different tasks about facial beauty to fully mine knowledge about attractiveness and guide the network to neglect interference information. In the subnetwork of MTSN, the MLA module is proposed to focus more on salient features about facial beauty and reduce the impact of interference information. Then, the BRL module based on broad learning system (BLS) is developed to learn discriminative features with the guidance of beauty scores. It further releases facial features from the impact of interference information. 
Comparisons with state-of-the-art methods demonstrate the effectiveness of BSN.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5786-5800"},"PeriodicalIF":0.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CycleGAN*: Collaborative AI Learning With Improved Adversarial Neural Networks for Multimodalities Data","authors":"Yibo He;Kah Phooi Seng;Li Minn Ang","doi":"10.1109/TAI.2024.3432856","DOIUrl":"https://doi.org/10.1109/TAI.2024.3432856","url":null,"abstract":"With the widespread adoption of generative adversarial networks (GANs) for sample generation, this article aims to enhance adversarial neural networks to facilitate collaborative artificial intelligence (AI) learning which has been specifically tailored to handle datasets containing multimodalities. Currently, a significant portion of the literature is dedicated to sample generation using GANs, with the objective of enhancing the detection performance of machine learning (ML) classifiers through the incorporation of these generated data into the original training set via adversarial training. The quality of the generated adversarial samples is contingent upon the sufficiency of training data samples. However, in the multimodal domain, the scarcity of multimodal data poses a challenge due to resource constraints. In this article, we address this challenge by proposing a new multimodal dataset generation approach based on the classical audio–visual speech recognition (AVSR) task, utilizing CycleGAN, DiscoGAN, and StyleGAN2 for exploration and performance comparison. AVSR experiments are conducted using the LRS2 and LRS3 corpora. Our experiments reveal that CycleGAN, DiscoGAN, and StyleGAN2 do not effectively address the low-data state problem in AVSR classification. Consequently, we introduce an enhanced model, CycleGAN*, based on the original CycleGAN, which efficiently learns the original dataset features and generates high-quality multimodal data. Experimental results demonstrate that the multimodal datasets generated by our proposed CycleGAN* exhibit significant improvement in word error rate (WER), indicating reduced errors. 
Notably, the images produced by CycleGAN* exhibit a marked enhancement in overall visual clarity, indicative of its superior generative capabilities. Furthermore, in contrast to traditional approaches, we underscore the significance of collaborative learning. We implement co-training with diverse multimodal data to facilitate information sharing and complementary learning across modalities. This collaborative approach enhances the model’s capability to integrate heterogeneous information, thereby boosting its performance in multimodal environments.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5616-5629"},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cooperative Advantage Actor–Critic Reinforcement Learning for Multiagent Pursuit-Evasion Games on Communication Graphs","authors":"Yizhen Meng;Chun Liu;Qiang Wang;Longyu Tan","doi":"10.1109/TAI.2024.3432511","DOIUrl":"https://doi.org/10.1109/TAI.2024.3432511","url":null,"abstract":"This article investigates the distributed optimal strategy problem in multiagent pursuit-evasion (MPE) games, striving for Nash equilibrium through the optimization of individual benefit matrices based on observations. To this end, a novel collaborative control scheme for MPE games using communication graphs is proposed. This scheme employs cooperative advantage actor–critic (A2C) reinforcement learning to facilitate collaborative capture by pursuers in a distributed manner while maintaining bounded system signals. The strategy orchestrates the actions of pursuers through adaptive neural network learning, ensuring proximity-based collaboration for effective captures. Meanwhile, evaders aim to evade collectively by converging toward each other. Through extensive simulations involving five pursuers and two evaders, the efficacy of the proposed approach is demonstrated, and pursuers seamlessly organize into pursuit units and capture evaders, validating the collaborative capture objective. 
This article represents a promising step toward effective and cooperative control strategies in MPE game scenarios.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6509-6523"},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Correlated Sequential Rules","authors":"Lili Chen;Wensheng Gan;Chien-Ming Chen","doi":"10.1109/TAI.2024.3429306","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429306","url":null,"abstract":"The goal of high-utility sequential pattern mining (HUSPM) is to efficiently discover profitable or useful sequential patterns in a large number of sequences. However, simply being aware of utility-eligible patterns is insufficient for making predictions. To compensate for this deficiency, high-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequence sequential patterns based on the appearance of premise sequential patterns. It has numerous applications, such as product recommendation and weather prediction. However, the existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules. To address this issue, we propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM. The proposed algorithm requires not only that each rule be correlated but also that the patterns in the antecedent and consequent of the high-utility sequential rule be correlated. The algorithm adopts a utility-list structure to avoid multiple database scans. Additionally, several pruning strategies are used to improve the algorithm's efficiency and performance. Based on several real-world datasets, subsequent experiments demonstrated that CoUSR is effective and efficient in terms of operation time and memory consumption. 
All codes are accessible on GitHub: https://github.com/DSI-Lab1/CoUSR.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 10","pages":"5340-5351"},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142442960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Stage Representation Refinement Based on Convex Combination for 3-D Human Poses Estimation","authors":"Luefeng Chen;Wei Cao;Biao Zheng;Min Wu;Witold Pedrycz;Kaoru Hirota","doi":"10.1109/TAI.2024.3432028","DOIUrl":"https://doi.org/10.1109/TAI.2024.3432028","url":null,"abstract":"In the human pose estimation task, on the one hand, 3-D pose always has difficulty in dividing different 2-D poses if the view is limited; on the other hand, it is hard to reduce the lifting ambiguity because of the lack of depth information. This makes it an important and challenging problem. Therefore, two-stage representation refinement based on the convex combination for 3-D human pose estimation is proposed, in which the two-stage method includes a dense-spatial-temporal convolutional network and a local-to-refine network. The former is applied to determine the features between each video frame; the latter is used to get the different scales of pose details. It aims to address the difficulty of estimating 3-D human pose from 2-D image sequences. In this way, it can better use the relations between the frames in the pose video sequence to produce more accurate results. Finally, we combine the above network with a block called convex combination to help refine the 3-D pose location. We test the proposed approach on both the Human3.6m and MPII datasets. The result confirms that our method can achieve better performance than improved CNN supervision, a simple yet effective baseline, and coarse-to-fine volumetric prediction. In addition, a robustness test experiment is carried out for the proposed method while the input is interrupted. 
The result verifies that our method shows better robustness.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6500-6508"},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Security Breach by Evolutionary Universal Perturbation Attack (EUPA)","authors":"Neeraj Gupta;Mahdi Khosravy;Antoine Pasquali;Olaf Witkowski","doi":"10.1109/TAI.2024.3429473","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429473","url":null,"abstract":"The potential for sabotaging deep convolutional neural network classifiers by universal perturbation attack (UPA) has proved itself as an effective threat to fool deep learning models in sensitive applications such as autonomous vehicles, clinical diagnosis, face recognition, and so on. The prospective application of UPA is for adversarial training of deep convolutional networks against the attacks. Although evolutionary algorithms have already shown their tremendous ability in solving nonconvex complex problems, the literature has limited exploration of evolutionary techniques and strategies for UPA. Evolutionary algorithms therefore need to be explored to minimize the magnitude and number of perturbation pixels while maximizing the number of misclassified data samples. This work focuses on utilizing an integer-coded genetic algorithm within an evolutionary framework to evolve the UPA. The evolutionary UPA has been structured, analyzed, and compared for two evolutionary optimization structures: 1) constrained single-objective evolutionary UPA; and 2) Pareto double-objective evolutionary UPA. The effectiveness of the methodology is analyzed on the GoogleNet convolutional neural network using the ImageNet dataset. The results show that under the same experimental conditions, the constrained single-objective technique outperforms the Pareto double-objective one, and manages a successful breach on a deep network wherein the average detection score falls to $0.446429$. 
It is observed that, besides the minimization of the detection rate score, the constraint of invisibility of noise is much more effective than having a conflicting objective of noise power minimization.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5655-5665"},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Spatial-Temporal Masked Contrast for Skeleton Action Recognition","authors":"Wenming Cao;Aoyu Zhang;Zhihai He;Yicha Zhang;Xinpeng Yin","doi":"10.1109/TAI.2024.3430260","DOIUrl":"https://doi.org/10.1109/TAI.2024.3430260","url":null,"abstract":"In the field of 3-D action recognition, self-supervised learning has shown promising results but remains a challenging task. Previous approaches to motion modeling often relied on selecting features solely from the temporal or spatial domain, which limited the extraction of higher-level semantic information. Additionally, traditional one-to-one approaches in multilevel comparative learning overlooked the relationships between different levels, hindering the learning representation of the model. To address these issues, we propose the hierarchical spatial-temporal masked network (HSTM) for learning 3-D action representations. HSTM introduces a novel masking method that operates simultaneously in both the temporal and spatial dimensions. This approach leverages semantic relevance to identify meaningful regions in time and space, guiding the masking process based on semantic richness. This guidance is crucial for learning useful feature representations effectively. Furthermore, to enhance the learning of potential features, we introduce cross-level distillation (CLD) to extend the comparative learning approach. By training the model with two types of losses simultaneously, each level of the multilevel comparative learning process can be guided by levels rich in semantic information. This allows for more effective supervision of comparative learning, leading to improved performance. Extensive experiments conducted on the NTU-60, NTU-120, and PKU-MMD datasets demonstrate the effectiveness of our proposed framework. 
The learned action representations exhibit strong transferability and achieve state-of-the-art results.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5801-5814"},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Label-Efficient Time Series Representation Learning: A Review","authors":"Emadeldeen Eldele;Mohamed Ragab;Zhenghua Chen;Min Wu;Chee-Keong Kwoh;Xiaoli Li","doi":"10.1109/TAI.2024.3430236","DOIUrl":"https://doi.org/10.1109/TAI.2024.3430236","url":null,"abstract":"Label-efficient time series representation learning, which aims to learn effective representations with limited labeled data, is crucial for deploying deep learning models in real-world applications. To address the scarcity of labeled time series data, various strategies, e.g., transfer learning, self-supervised learning, and semisupervised learning, have been developed. In this survey, we introduce a novel taxonomy, categorizing existing approaches as in-domain or cross-domain based on whether they rely on external data sources. Furthermore, we present a review of the recent advances in each strategy, summarize the limitations of current methodologies, and suggest future research directions that promise further improvements in the field.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6027-6042"},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study of Enhancing Federated Learning on Non-IID Data With Server Learning","authors":"Van Sy Mai;Richard J. La;Tao Zhang","doi":"10.1109/TAI.2024.3430250","DOIUrl":"10.1109/TAI.2024.3430250","url":null,"abstract":"Federated learning (FL) has emerged as a means of distributed learning using local data stored at clients with a coordinating server. Recent studies showed that FL can suffer from poor performance and slower convergence when training data at the clients are not independent and identically distributed (IID). Here, we consider auxiliary server learning (SL) as a complementary approach to improving the performance of FL on non-IID data. Our analysis and experiments show that this approach can achieve significant improvements in both model accuracy and convergence time even when the dataset utilized by the server is small and its distribution differs from that of the clients’ aggregate data. Moreover, experimental results suggest that auxiliary SL delivers benefits when employed together with other techniques proposed to mitigate the performance degradation of FL on non-IID data.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5589-5604"},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Machine Learning for Semiconductor Process Optimization: A Systematic Review","authors":"Ying-Lin Chen;Sara Sacchi;Bappaditya Dey;Victor Blanco;Sandip Halder;Philippe Leray;Stefan De Gendt","doi":"10.1109/TAI.2024.3429479","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429479","url":null,"abstract":"As machine learning (ML) continues to find applications, extensive research is currently underway across various domains. This study examines the current methodologies of ML being investigated to optimize semiconductor manufacturing processes. Our research involved searching the SPIE Digital Library, IEEE Xplore, and arXiv databases, identifying 58 publications in the field of ML-based semiconductor process optimization. These investigations employ ML techniques such as feature extraction, feature selection, and neural network architectures, which are analyzed using different algorithms. These models find applications in advanced process control, virtual metrology, and quality control, critical aspects in semiconductor manufacturing for enhancing throughput and reducing production costs. We categorize the articles based on the methods and applications employed, summarizing the primary findings. Furthermore, we discuss the general conclusion of several studies. 
Overall, the reviewed literature suggests that ML-based semiconductor manufacturing is rapidly gaining popularity and advancing at a swift pace.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"5969-5989"},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}