{"title":"Artificial Intelligence Driven Predictive Analysis of Acoustic and Linguistic Behaviors for ASD Identification","authors":"Ashwini B.;Deeptanshu;Sheffali Gulati;Jainendra Shukla","doi":"10.1109/TAI.2024.3439288","DOIUrl":"https://doi.org/10.1109/TAI.2024.3439288","url":null,"abstract":"The identification of autism spectrum disorder (ASD) faces challenges due to the lack of reliable biomarkers and the subjectivity in diagnostic procedures, necessitating improved tools for objectivity and efficiency. Being a key characteristic of autism, language impairments are regarded as potential markers for identifying ASD. However, current research predominantly focuses on analyzing language characteristics in English, overlooking linguistic and contextual specificities in other resource-constrained languages. Motivated by these, we developed an artificial intelligence (AI)-based system to detect ASD, utilizing a range of acoustic and linguistic features extracted from dyadic conversations between a child and their communication partner. Validating our model on 76 English-speaking children [35 ASD and 41 typically developing (TD)] and 33 Hindi-speaking children (15 ASD and 18 TD), our extensive analysis of a diverse and comprehensive set of acoustic and linguistic speech attributes, including lexical, syntactic, semantic, and pragmatic elements revealed reliable speech attributes as predictors of ASD. This comprehensive analysis achieved a remarkable macro F1-score of approximately 91.30%. We further addressed the influence of linguistic diversity on speech-based ASD assessment by examining speech behaviors in both English and the low-resource language, Hindi. Specific features such as adverbs and distinct roots contributed significantly to ASD classification in English, while the proportion of unintelligible utterances and adposition use held greater importance in Hindi. 
This study underscores the reliability of speech-based biomarkers in ASD assessment, emphasizing their effectiveness across diverse linguistic backgrounds and highlighting the need for language-specific research in this domain.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5709-5719"},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tailor-Made Reinforcement Learning Approach With Advanced Noise Optimization for Soft Continuum Robots","authors":"Jino Jayan;Lal Priya P.S.;Hari Kumar R.","doi":"10.1109/TAI.2024.3440225","DOIUrl":"https://doi.org/10.1109/TAI.2024.3440225","url":null,"abstract":"Advancements in the fusion of reinforcement learning (RL) and soft robotics are presented in this study, with a focus on refining training methodologies for soft planar continuum robots (SPCRs). The proposed modifications to the twin-delayed deep deterministic (TD3) policy gradient algorithm introduce the innovative dynamic harmonic noise (DHN) to enhance exploration adaptability. Additionally, a tailored adaptive task achievement reward (ATAR) is introduced to balance goal achievement, time efficiency, and trajectory smoothness, thereby improving precision in SPCR navigation. Evaluation metrics, including mean squared distance (MSD), mean error (ME), and mean episodic reward (MER), demonstrate robust generalization capabilities. Significant improvements in average reward, success rate, and convergence speed for the proposed modified TD3 algorithm over traditional TD3 are highlighted in the comparative analysis. Specifically, a 45.17% increase in success rate and a 4.92% increase in convergence speed over TD3 are demonstrated by the proposed TD3. 
Beyond insights into RL and soft robotics, potential applicability of RL in diverse scenarios is underscored, laying the foundation for future breakthroughs in real-world applications.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5509-5518"},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CrackLens: Automated Sidewalk Crack Detection and Segmentation","authors":"Chan Young Koh;Mohamed Ali;Abdeltawab Hendawi","doi":"10.1109/TAI.2024.3435608","DOIUrl":"https://doi.org/10.1109/TAI.2024.3435608","url":null,"abstract":"Automatic sidewalk crack detection is necessary for urban infrastructure maintenance to ensure pedestrian safety. Such a task becomes complex on overgrown sidewalks, where crack detection usually misjudges vegetation as cracks. Few automated crack detection methods target overgrown sidewalks; most focus on vehicular roadway cracks that are recognizable even at the aerial photography level. Hence, this article introduces CrackLens, an automated sidewalk crack detection framework capable of detecting cracks even on overgrown sidewalks. We include several contributions as follows. First, we designed an automatic data parser using a red, green, and blue (RGB)-depth fusion sidewalk dataset we collected. The RGB and depth information are combined to create depth-embedded matrices, which are used to prelabel and separate the collected dataset into two categories (with and without crack). Second, we created an automatic annotation process using image processing methods and tailored the tool only to annotate cracks on overgrown sidewalks. This process is followed by a binary classification for verification, allowing the tool to target overgrown problems on sidewalks. Lastly, we explored the robustness of our framework by experimenting with it using 8,000 real sidewalk images with some overgrown problems. The evaluation leveraged several transformer-based neural network models. 
Our framework achieves substantial crack detection and segmentation in overgrown sidewalks by addressing the challenges of limited data and subjective manual annotations.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5418-5430"},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Negative Sampling Approaches for Neural Topic Models","authors":"Suman Adhya;Avishek Lahiri;Debarshi Kumar Sanyal;Partha Pratim Das","doi":"10.1109/TAI.2024.3432857","DOIUrl":"https://doi.org/10.1109/TAI.2024.3432857","url":null,"abstract":"Negative sampling has emerged as an effective technique that enables deep learning models to learn better representations by introducing the paradigm of “learn-to-compare.” The goal of this approach is to add robustness to deep learning models to learn better representation by comparing the positive samples against the negative ones. Despite its numerous demonstrations in various areas of computer vision and natural language processing, a comprehensive study of the effect of negative sampling in an unsupervised domain such as topic modeling has not been well explored. In this article, we present a comprehensive analysis of the impact of different negative sampling strategies on neural topic models. We compare the performance of several popular neural topic models by incorporating a negative sampling technique in the decoder of variational autoencoder-based neural topic models. Experiments on four publicly available datasets demonstrate that integrating negative sampling into topic models results in significant enhancements across multiple aspects, including improved topic coherence, richer topic diversity, and more accurate document classification. Manual evaluations also indicate that the inclusion of negative sampling into neural topic models enhances the quality of the generated topics. 
These findings highlight the potential of negative sampling as a valuable tool for advancing the effectiveness of neural topic models.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5630-5642"},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Communication-Efficient Federated Learning for Decision Trees","authors":"Shuo Zhao;Zikun Zhu;Xin Li;Ying-Chi Chen","doi":"10.1109/TAI.2024.3433419","DOIUrl":"https://doi.org/10.1109/TAI.2024.3433419","url":null,"abstract":"The increasing concerns about data privacy and security have driven the emergence of federated learning, which preserves privacy by collaborative learning across multiple clients without sharing their raw data. In this article, we propose a communication-efficient federated learning algorithm for decision trees (DTs), referred to as FL-DT. The key idea is to exchange the statistics of a small number of features among the server and all clients, enabling identification of the optimal feature to split each DT node without compromising privacy. To efficiently find the splitting feature based on the partially available information at each DT node, a novel formulation is derived to estimate the lower and upper bounds of Gini indexes of all features by solving a sequence of mixed-integer convex programming problems. Our experimental results based on various public datasets demonstrate that FL-DT can reduce the communication overhead substantially without surrendering any classification accuracy, compared to other conventional methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5478-5492"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted Concept Factorization Based Incomplete Multi-view Clustering","authors":"Ghufran Ahmad Khan;Jalaluddin Khan;Taushif Anwar;Zubair Ashraf;Mohammad Hafeez Javed;Bassoma Diallo","doi":"10.1109/TAI.2024.3433379","DOIUrl":"https://doi.org/10.1109/TAI.2024.3433379","url":null,"abstract":"The primary objective of classical multiview clustering (MVC) is to categorize data into separate clusters under the assumption that all perspectives are completely available. However, in practical situations, it is common to encounter cases where not all viewpoints of the data are accessible. This limitation can impede the effectiveness of traditional MVC methods. The incompleteness of the clustering of multiview data has witnessed substantial progress in recent years due to its promising applications. In response to the aforementioned issue, we have tackled it by introducing an inventive MVC algorithm that is tailored to handle incomplete data from various views. Additionally, we have proposed a distinct objective function that leverages a weighted concept factorization technique to address the absence of data instances within each incomplete perspective. To address inconsistencies between different views, we introduced a coregularization factor, which operates in conjunction with a shared consensus matrix. It is important to highlight that the proposed objective function is intrinsically nonconvex, presenting challenges in terms of optimization. To secure the optimal solution for this objective function, we have implemented an iterative optimization approach to reach the local minima for our method. 
To underscore the efficacy and validation of our approach, we experimented with real-world datasets and used state-of-the-art methods to perform comparative assessments.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5699-5708"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Broad Siamese Network for Facial Beauty Prediction","authors":"Yikai Li;Tong Zhang;C. L. Philip Chen","doi":"10.1109/TAI.2024.3429293","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429293","url":null,"abstract":"Facial beauty prediction (FBP) aims to automatically predict beauty scores of facial images according to human perception. Usually, facial images contain lots of information irrelevant to facial beauty, such as information about pose, emotion, and illumination, which interferes with the prediction of facial beauty. To overcome interferences, we develop a broad Siamese network (BSN) to focus more on the task of beauty prediction. Specifically, BSN consists mainly of three components: a multitask Siamese network (MTSN), a multilayer attention (MLA) module, and a broad representation learning (BRL) module. First, MTSN is proposed with different tasks about facial beauty to fully mine knowledge about attractiveness and guide the network to neglect interference information. In the subnetwork of MTSN, the MLA module is proposed to focus more on salient features about facial beauty and reduce the impact of interference information. Then, the BRL module based on broad learning system (BLS) is developed to learn discriminative features with the guidance of beauty scores. It further releases facial features from the impact of interference information. 
Comparisons with state-of-the-art methods demonstrate the effectiveness of BSN.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5786-5800"},"PeriodicalIF":0.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CycleGAN*: Collaborative AI Learning With Improved Adversarial Neural Networks for Multimodalities Data","authors":"Yibo He;Kah Phooi Seng;Li Minn Ang","doi":"10.1109/TAI.2024.3432856","DOIUrl":"https://doi.org/10.1109/TAI.2024.3432856","url":null,"abstract":"With the widespread adoption of generative adversarial networks (GANs) for sample generation, this article aims to enhance adversarial neural networks to facilitate collaborative artificial intelligence (AI) learning which has been specifically tailored to handle datasets containing multimodalities. Currently, a significant portion of the literature is dedicated to sample generation using GANs, with the objective of enhancing the detection performance of machine learning (ML) classifiers through the incorporation of these generated data into the original training set via adversarial training. The quality of the generated adversarial samples is contingent upon the sufficiency of training data samples. However, in the multimodal domain, the scarcity of multimodal data poses a challenge due to resource constraints. In this article, we address this challenge by proposing a new multimodal dataset generation approach based on the classical audio–visual speech recognition (AVSR) task, utilizing CycleGAN, DiscoGAN, and StyleGAN2 for exploration and performance comparison. AVSR experiments are conducted using the LRS2 and LRS3 corpora. Our experiments reveal that CycleGAN, DiscoGAN, and StyleGAN2 do not effectively address the low-data state problem in AVSR classification. Consequently, we introduce an enhanced model, CycleGAN*, based on the original CycleGAN, which efficiently learns the original dataset features and generates high-quality multimodal data. Experimental results demonstrate that the multimodal datasets generated by our proposed CycleGAN* exhibit significant improvement in word error rate (WER), indicating reduced errors. 
Notably, the images produced by CycleGAN* exhibit a marked enhancement in overall visual clarity, indicative of its superior generative capabilities. Furthermore, in contrast to traditional approaches, we underscore the significance of collaborative learning. We implement co-training with diverse multimodal data to facilitate information sharing and complementary learning across modalities. This collaborative approach enhances the model’s capability to integrate heterogeneous information, thereby boosting its performance in multimodal environments.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5616-5629"},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Correlated Sequential Rules","authors":"Lili Chen;Wensheng Gan;Chien-Ming Chen","doi":"10.1109/TAI.2024.3429306","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429306","url":null,"abstract":"The goal of high-utility sequential pattern mining (HUSPM) is to efficiently discover profitable or useful sequential patterns in a large number of sequences. However, simply being aware of utility-eligible patterns is insufficient for making predictions. To compensate for this deficiency, high-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequence sequential patterns based on the appearance of premise sequential patterns. It has numerous applications, such as product recommendation and weather prediction. However, the existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules. To address this issue, we propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM. The proposed algorithm requires not only that each rule be correlated but also that the patterns in the antecedent and consequent of the high-utility sequential rule be correlated. The algorithm adopts a utility-list structure to avoid multiple database scans. Additionally, several pruning strategies are used to improve the algorithm's efficiency and performance. Based on several real-world datasets, subsequent experiments demonstrated that CoUSR is effective and efficient in terms of operation time and memory consumption. 
All code is accessible on GitHub: https://github.com/DSI-Lab1/CoUSR.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 10","pages":"5340-5351"},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142442960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Security Breach by Evolutionary Universal Perturbation Attack (EUPA)","authors":"Neeraj Gupta;Mahdi Khosravy;Antoine Pasquali;Olaf Witkowski","doi":"10.1109/TAI.2024.3429473","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429473","url":null,"abstract":"The potential for sabotaging deep convolutional neural network classifiers by universal perturbation attack (UPA) has proved to be an effective threat for fooling deep learning models in sensitive applications such as autonomous vehicles, clinical diagnosis, face recognition, and so on. The prospective application of UPA is for adversarial training of deep convolutional networks against the attacks. Although evolutionary algorithms have already shown their tremendous ability in solving nonconvex complex problems, the literature has limited exploration of evolutionary techniques and strategies for UPA; thus, evolutionary algorithms need to be explored to minimize the magnitude and number of perturbation pixels while maximizing the number of misclassified data samples. This work focuses on utilizing an integer coded genetic algorithm within an evolutionary framework to evolve the UPA. The evolutionary UPA has been structured, analyzed, and compared for two evolutionary optimization structures: 1) constrained single-objective evolutionary UPA; and 2) Pareto double-objective evolutionary UPA. The efficiency of the methodology is analyzed on the GoogLeNet convolutional neural network for its effectiveness on the ImageNet dataset. The results show that under the same experimental conditions, the constrained single-objective technique outperforms the Pareto double-objective one, and manages a successful breach on a deep network wherein the average detection score falls to 0.446429. 
It is observed that, besides minimizing the detection rate score, imposing invisibility of the noise as a constraint is much more effective than treating noise power minimization as a conflicting objective.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5655-5665"},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}