{"title":"A robust accent classification system based on variational mode decomposition","authors":"Darshana Subhash , Jyothish Lal G. , Premjith B. , Vinayakumar Ravi","doi":"10.1016/j.engappai.2024.109512","DOIUrl":"10.1016/j.engappai.2024.109512","url":null,"abstract":"<div><div>State-of-the-art automatic speech recognition models often struggle to capture nuanced features inherent in accented speech, leading to sub-optimal performance in speaker recognition based on regional accents. Despite substantial progress in the field of automatic speech recognition, ensuring robustness to accents and generalization across dialects remains a persistent challenge, particularly in real-time settings. In response, this study introduces a novel approach leveraging Variational Mode Decomposition (VMD) to enhance accented speech signals, aiming to mitigate noise interference and improve generalization on unseen accented speech datasets. Our method employs decomposed modes of the VMD algorithm for signal reconstruction, followed by feature extraction using Mel-Frequency Cepstral Coefficients (MFCC). These features are subsequently classified using machine learning models such as 1D Convolutional Neural Network (1D-CNN), Support Vector Machine (SVM), Random Forest, and Decision Trees, as well as a deep learning model based on a 2D Convolutional Neural Network (2D-CNN). Experimental results demonstrate superior performance, with the SVM classifier achieving an accuracy of approximately 87.5% on a standard dataset and 99.3% on the AccentBase dataset. The 2D-CNN model further improves the results in multi-class accent classification tasks. 
This research contributes to advancing automatic speech recognition robustness and accent-inclusive speaker recognition, addressing critical challenges in real-world applications.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109512"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The docking control system of an autonomous underwater vehicle combining intelligent object recognition and deep reinforcement learning","authors":"Chao-Ming Yu, Yu-Hsien Lin","doi":"10.1016/j.engappai.2024.109565","DOIUrl":"10.1016/j.engappai.2024.109565","url":null,"abstract":"<div><div>This study develops a visual-based docking system (VDS) for an autonomous underwater vehicle (AUV), significantly enhancing docking performance by integrating intelligent object recognition and deep reinforcement learning (DRL). The system overcomes traditional navigation limitations in complex and unpredictable environments by using a variable information dock (VID) for precise multi-sensor docking recognition in the AUV. Employing image-based visual servoing (IBVS) technology, the VDS efficiently converts 2D visual data into accurate 3D motion control commands. It integrates the YOLO (short for You Only Look Once) algorithm for object recognition and the deep deterministic policy gradient (DDPG) algorithm, improving continuous motion control, docking accuracy, and adaptability. Experimental validation at the National Cheng Kung University towing tank demonstrates that the VDS enhances control stability and operational reliability, reducing the mean absolute error (MAE) in depth control by 42.03% and pitch control by 98.02% compared to the previous method. 
These results confirm the VDS's reliability and its potential for transforming AUV docking.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109565"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent temporal smoothness-induced Schatten-p norm factorization for sequential subspace clustering","authors":"Yuan Xu , Zhen-Zhen Zhao , Tong-Wei Lu , Wei Ke , Yi Luo , Yan-Lin He , Qun-Xiong Zhu , Yang Zhang , Ming-Qing Zhang","doi":"10.1016/j.engappai.2024.109476","DOIUrl":"10.1016/j.engappai.2024.109476","url":null,"abstract":"<div><div>This paper presents an innovative latent temporal smoothness-induced Schatten-p norm factorization (SpFLTS) method aimed at addressing challenges in sequential subspace clustering tasks. Globally, SpFLTS employs a low-rank subspace clustering framework based on Schatten-2/3 norm factorization to capture the features of the original data more comprehensively. Locally, a total variation smoothing term is imposed on the temporal gradients of latent subspace matrices obtained from sub-orthogonal projections, thereby preserving smoothness in the sequential latent space. To efficiently solve the closed-form optimization problem, a fast Fourier transform is combined with the non-convex alternating direction method of multipliers to optimize the latent subspace matrix, which greatly speeds up computation. 
Experimental results demonstrate that the proposed SpFLTS method surpasses existing techniques on multiple benchmark databases, highlighting its superior clustering performance and extensive application potential.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109476"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive graph learning algorithm for incomplete multi-view clustered image segmentation","authors":"Junhui Cao, Jing Hu, Rongguo Zhang","doi":"10.1016/j.engappai.2024.109264","DOIUrl":"10.1016/j.engappai.2024.109264","url":null,"abstract":"<div><div>Existing incomplete multi-view clustering algorithms rely on data initialization and ignore the data structure. To address these problems, an adaptive graph learning algorithm for incomplete multi-view clustering image segmentation is proposed. Firstly, the similarity matrix of each non-missing view is learned adaptively, and the index matrix of the missing view is used to complete the similarity matrix and unify the dimensions, which ensures the authenticity of the data and reveals the data structure. Secondly, the low-dimensional representation of the complete similarity matrix under spectral constraints is calculated, and a discrete clustering index matrix is obtained directly through adaptive weighted spectral rotation, avoiding post-processing. The clustering index matrix is used to cluster the multi-view features, thereby yielding the image segmentation results. Finally, an iterative algorithm is presented to optimize the model, and the method is compared with six existing algorithms using seven evaluation metrics on six datasets. 
The results show significant improvements in clustering performance and segmentation performance.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109264"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142561003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-driven nonmodel seismic assessment of eccentrically braced frames with soil-structure interaction","authors":"Mahshad Jamdar , Kiarash M. Dolatshahi , Omid Yazdanpanah","doi":"10.1016/j.engappai.2024.109549","DOIUrl":"10.1016/j.engappai.2024.109549","url":null,"abstract":"<div><div>This study presents a nonmodel-based machine learning framework for estimating engineering demand parameters (EDPs) of eccentrically braced frames with soil-structure interaction effects. The objectives are to estimate residual and peak story drift ratios and peak floor acceleration, and to develop fragility curves, using traditional regression equations and advanced machine-learning techniques. Correction coefficients are developed to improve prediction accuracy by accounting for soil-structure interaction. A comprehensive database of 109,841 data points, including incremental dynamic analysis results of 4- and 8-story frames, is developed. The database includes fixed-base models and models with various soil-structure interaction values, subjected to 44 far-field ground motions. Four scenarios with different input variables are introduced to compare the impact of soil-structure interaction. Findings show that soil-structure interaction features affect the performance of the machine learning algorithms, increasing the coefficient of determination by up to 17.61%. Utilizing the predicted story drift ratio, two types of fragility curves indicate more precise predictions, emphasizing the impact of soil-structure interaction effects at lower damage levels. A graphical user interface has been developed to predict fragility curves based on various inputs to promote the practical use of machine learning in engineering. Two new 4-story frames, subjected to unseen ground motions, are used as case studies to assess the application of the trained machine learning algorithms. 
Prediction errors in input-output scenarios considering soil-structure interaction range from 3% to 18% for the new frames. The proposed approach for predicting EDPs is further validated on a real instrumented five-story steel-frame office building.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109549"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142561001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
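The seismic-assessment abstract above derives fragility curves from predicted story drift ratios. Fragility functions are commonly written as a lognormal CDF of the intensity measure (IM), P = Φ(ln(IM/θ)/β). The sketch below evaluates that standard form and fits θ and β by the method of moments from incremental dynamic analysis results. Several things here are assumptions and do not come from the paper. The abstract does not say the authors use exactly this form or fitting method. The input list `im_at_damage` and the numbers in the usage lines are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def lognormal_fragility(im: float, theta: float, beta: float) -> float:
    """Probability that a damage state is reached or exceeded at intensity `im`.

    Standard lognormal form P = Phi(ln(im / theta) / beta), where theta is the
    median capacity and beta the logarithmic standard deviation.
    """
    z = math.log(im / theta) / beta
    # Standard normal CDF expressed with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_fragility_from_ida(im_at_damage: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimate of (theta, beta).

    `im_at_damage` (hypothetical input): for each ground motion, the intensity
    at which the damage state, e.g. a drift-ratio limit, is first reached.
    """
    logs = [math.log(x) for x in im_at_damage]
    return math.exp(mean(logs)), stdev(logs)


# Illustrative numbers only, not results from the study.
theta, beta = fit_fragility_from_ida([0.35, 0.50, 0.62, 0.80, 1.10])
for im in (0.25, 0.5, 1.0):
    print(f"IM = {im:.2f} -> P(exceed) = {lognormal_fragility(im, theta, beta):.3f}")
```

By construction, the curve passes through 0.5 at IM = θ. A larger β flattens the curve.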
{"title":"Kinematic matrix: One-shot human action recognition using kinematic data structure","authors":"Mohammad Hassan Ranjbar , Ali Abdi , Ju Hong Park","doi":"10.1016/j.engappai.2024.109569","DOIUrl":"10.1016/j.engappai.2024.109569","url":null,"abstract":"<div><div>One-shot action recognition, which refers to recognizing human-performed actions using only a single training example, holds significant promise in advancing video analysis, particularly in domains requiring rapid adaptation to new actions. However, existing algorithms for one-shot action recognition face multiple challenges, including high computational complexity, limited accuracy, and difficulties in generalization to unseen actions. To address these issues, we propose a novel kinematic-based skeleton representation that effectively reduces computational demands while enhancing recognition performance. This representation leverages skeleton locations, velocities, and accelerations to formulate the one-shot action recognition task as a metric learning problem, where a model projects kinematic data into an embedding space. In this space, actions are distinguished based on Euclidean distances, facilitating efficient nearest-neighbour searches among activity reference samples. Our approach not only reduces computational complexity but also achieves higher accuracy and better generalization compared to existing methods. Specifically, our model achieved a validation accuracy of 78.5%, outperforming state-of-the-art methods by 8.66% under comparable training conditions. 
These findings underscore the potential of our method for practical applications in real-time action recognition systems.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109569"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142561002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
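The one-shot action recognition abstract above treats recognition as metric learning. Kinematic data are embedded, and a query is labelled by its nearest reference sample in Euclidean distance. The sketch below illustrates only that nearest-neighbour step, under several assumptions of our own. Velocities and accelerations are taken as first and second finite differences of joint positions. The kinematic descriptor is a hypothetical time-average, not the paper's kinematic matrix. The learned embedding network is replaced by a hypothetical identity function, `embed`.

```python
import math
from typing import Dict, List, Sequence

Frame = List[float]  # flattened joint coordinates for one time step


def finite_difference(frames: Sequence[Frame]) -> List[Frame]:
    """Difference between consecutive frames (unit time step assumed)."""
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(frames, frames[1:])]


def mean_frame(frames: Sequence[Frame]) -> Frame:
    """Per-coordinate average over time."""
    return [sum(column) / len(frames) for column in zip(*frames)]


def kinematic_features(positions: Sequence[Frame]) -> Frame:
    """Hypothetical descriptor: time-averaged positions, velocities and
    accelerations, concatenated (needs at least three frames)."""
    velocities = finite_difference(positions)
    accelerations = finite_difference(velocities)
    return mean_frame(positions) + mean_frame(velocities) + mean_frame(accelerations)


def embed(features: Frame) -> Frame:
    """Placeholder for the learned embedding network; identity in this sketch."""
    return features


def classify_one_shot(query: Sequence[Frame], references: Dict[str, Sequence[Frame]]) -> str:
    """Return the label whose single reference sample is nearest to the query
    in Euclidean distance within the embedding space."""
    q = embed(kinematic_features(query))
    return min(references,
               key=lambda label: math.dist(q, embed(kinematic_features(references[label]))))


references = {
    "wave": [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],   # joint moving steadily along x
    "stand": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # joint at rest
}
query = [[0.0, 0.0], [0.9, 0.1], [1.9, 0.1]]
print(classify_one_shot(query, references))
```

Each new action needs only one entry in `references`, so a new class can be added without retraining. In the actual method, the trained embedding model would take the place of `embed`.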
{"title":"A lightweight and explainable model for driver abnormal behavior recognition","authors":"Jingbin Hao , Xiaokai Sun , Xinhua Liu , Dezheng Hua , Jianhua Hu","doi":"10.1016/j.engappai.2024.109559","DOIUrl":"10.1016/j.engappai.2024.109559","url":null,"abstract":"<div><div>With the advancement of intelligent transportation systems, accurate identification of driver abnormal behavior is crucial for enhancing road safety. However, the limited computing power of vehicular systems poses a challenge for running efficient and explainable behavior recognition models. This paper proposes a lightweight and explainable driver abnormal behavior recognition model based on an improved You Only Look Once version 8 (YOLOv8). Firstly, a Spatial and Channel Reconstruction Convolution (SCConv) module is introduced to optimize the Convolution to Feature (C2f) structure, enhancing the model's feature extraction capabilities while reducing parameter redundancy. Secondly, a Spatial Pyramid Pooling with Fast Large Separable Kernel Attention (SPPF-LSKA) module is designed to better capture image context and integrate global information. Additionally, a Dynamic upsample (Dysample) module is introduced to improve the model's ability to capture subtle driver movements. Lastly, a Lightweight Shared Group Normalization Convolution Detection Head (LSGCDH) is designed to enhance the model's generalization ability, significantly reducing the model's computational load, parameter count, and size. Experimental results demonstrate that our approach has significant advantages for edge device deployment compared to mainstream algorithms. 
The visualization results effectively corroborate the role of each improved structure, enhancing the explainability of the abnormal behavior recognition model, which is beneficial for deployment in vehicular systems and contributes to improving road traffic safety.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109559"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142561004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-path aggregation transformer network for super-resolution with images occlusions and variability","authors":"Qinghui Chen , Lunqian Wang , Zekai Zhang , Xinghua Wang , Weilin Liu , Bo Xia , Hao Ding , Jinglin Zhang , Sen Xu , Xin Wang","doi":"10.1016/j.engappai.2024.109535","DOIUrl":"10.1016/j.engappai.2024.109535","url":null,"abstract":"<div><div>While Transformer-based approaches have recently achieved notable success in super-resolution, their extensive computational requirements impede widespread practical adoption. High-resolution meteorological satellite cloud imagery is essential for weather analysis and forecasting. Enhancing image resolution through super-resolution techniques facilitates the accurate identification and localization of geographic features by meteorological systems. However, current super-resolution methods fail to fully restore the intricate details of cloud formations and complex regions. This research introduces a novel dual-path aggregation Transformer network (DPAT) tailored to enhance the super-resolution of meteorological satellite cloud images. The DPAT network captures the subtle details and textures of cloud imagery, effectively addressing occlusions and the variability inherent in satellite imagery. It strengthens the model's ability to handle the complex attributes of cloud images through the Dual-path Aggregation Self-Attention (DASA) mechanism and the Multi-scale Feature Aggregation Block (MFAB), thereby improving performance on intricate cloud features. The DASA mechanism synthesizes features across spatial, depth, and channel dimensions via a dual-path approach, thoroughly exploiting feature correlations. The MFAB, designed to replace the multilayer perceptron, incorporates shift convolution and a multi-scale interaction block to augment feature information, compensating for the limited absorption of local information caused by fixed receptive fields. 
Experimental results indicate that DPAT delivers superior super-resolution performance. With only 32% of the parameters of the Enhanced Deep Residual Network (EDSR), or 77% of those of the Image Restoration using Shift Window Transformer (SwinIR), DPAT matches SwinIR's performance on the satellite cloud dataset. Moreover, DPAT balances accuracy and parameter economy across various datasets. This technology is expected to improve image super-resolution capabilities in multiple fields, such as human action recognition and industrial recognition, and thereby indirectly improve the accuracy of image perception tasks.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109535"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A pre-trained multi-step prediction informer for ship motion prediction with a mechanism-data dual-driven framework","authors":"Wenhe Shen , Xinjue Hu , Jialun Liu , Shijie Li , Hongdong Wang","doi":"10.1016/j.engappai.2024.109523","DOIUrl":"10.1016/j.engappai.2024.109523","url":null,"abstract":"<div><div>The advancement of autonomous maritime surface ships has increased the need for accurate and rapid multi-step prediction of ship motion for decision-making, motion planning, and real-time control tasks. This paper proposes a multi-step prediction method based on Informer with a pre-training strategy to achieve accurate and fast ship motion prediction; it substitutes generative inference for rolling prediction to avoid the cumulative error that grows with the prediction horizon. Because long-term control actions and short-term state sequences differ in their temporal features, heterogeneous encoder and decoder inputs are designed to capture each of them separately without information redundancy. To reconcile the high cost of acquiring real data with the large data requirements of deep learning methods, we propose a mechanism-data dual-driven framework. This framework uses a prior mechanism model to generate virtual data incorporating a range of excitation signals designed in accordance with the results of free-running model tests. To reduce the need for real data and increase interpretability, the improved Informer is pre-trained on virtual data from the mechanism model before being trained on real data. 
Our multi-step ship motion prediction experiments demonstrate that, compared with state-of-the-art and classical methods, the proposed method reduces the error and time to 41.36% and 13.20% on average, respectively.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109523"},"PeriodicalIF":7.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel ensemble method based on residual convolutional neural network with attention module for transient stability assessment considering operational variability","authors":"Wensheng Liu, Song Han, Na Rong","doi":"10.1016/j.engappai.2024.109519","DOIUrl":"10.1016/j.engappai.2024.109519","url":null,"abstract":"<div><div>Data-driven methods have been extensively applied in the field of power system transient stability assessment (TSA) owing to their robust ability to extract valuable features. However, TSA methods still face significant challenges in predictive accuracy and generalization ability under variable operating conditions with fluctuating loads or power generation. To address this, a data-driven ensemble TSA method that integrates the convolutional block attention module (CBAM) with a residual network (ResNet) is proposed to enhance prediction accuracy. Meanwhile, the traditional cross entropy loss function is replaced by the focal loss function, aiming to reduce the misclassification of unstable samples. Moreover, a rapid updating strategy integrating active learning and fine-tuning techniques is proposed. When the network topology changes substantially and renders the pre-trained TSA model unusable, this strategy can update the classifier quickly with limited labeled samples and less time, thus ensuring optimal performance on the new topology. Finally, case studies conducted on the New England 10-machine 39-bus system and the Western Electricity Coordinating Council (WECC) 29-machine 179-bus system validate the effectiveness and robustness of the proposed TSA method. 
The proposed TSA method achieves accuracies of 99.56% on the 10-machine system and 99.47% on the 29-machine system, demonstrating its superiority.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109519"},"PeriodicalIF":7.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
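The transient stability abstract above replaces cross entropy with the focal loss so that misclassified unstable samples weigh more. The standard binary focal loss is FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The modulating factor (1 - p_t)^γ shrinks the loss of easy, confidently correct samples, so training concentrates on hard ones. The sketch below is a minimal illustration of that standard loss, not the authors' implementation. Treating unstable as the positive class is our assumption. The defaults α = 0.25 and γ = 2 are the values reported in the original focal loss (RetinaNet) paper; the abstract gives no values.

```python
import math
from typing import Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      alpha: float = 0.25, gamma: float = 2.0,
                      eps: float = 1e-12) -> float:
    """Mean binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted probability of the positive class (assumed: unstable)
    labels -- 1 for unstable, 0 for stable (labelling convention assumed)
    With gamma = 0 the modulating factor disappears and the loss reduces to
    alpha-weighted binary cross entropy.
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# An easy, confidently correct sample vs. a hard, poorly predicted one.
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.30], [1])
print(f"easy: {easy:.6f}, hard: {hard:.6f}")
```

With γ = 2, the easy sample's loss is scaled by (1 - 0.95)² = 0.0025. The hard sample's loss is scaled by (1 - 0.3)² = 0.49. Hard samples therefore dominate the gradient.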