{"title":"A single-demonstration guided manipulation learning with dexterous hand","authors":"Jianwen Li, Yinglan Lv, Xiangbo Lin, Jinglue Hang, Xuanheng Li, Yi Sun","doi":"10.1016/j.engappai.2025.111606","DOIUrl":"10.1016/j.engappai.2025.111606","url":null,"abstract":"<div><div>Demonstration-assisted reinforcement learning has proven extremely effective for solving complex multi-fingered dexterous hand manipulation tasks. However, it usually requires costly and time-consuming expert demonstrations for each task, which reduces the learning efficiency of the dexterous manipulation policy. To overcome this drawback, this paper aims to use only one human demonstration per task to obtain a generalizable dexterous manipulation policy. A novel ‘Basics-before-Extension’ (BBE) policy learning strategy is proposed for this purpose. It consists of two learning stages. In the ‘basics learning stage’, the dexterous hand extracts hand-object contact points as key clues from the demonstration, facilitating quick learning of the expert’s basic skill. In the ‘extension learning stage’, the designed joint policy training scheme enables the expert knowledge to be transferred and adapted to new environments, yielding a generalizable policy. We present the overall framework, from low-cost demonstration data collection to the policy learning process. The BBE strategy has been experimentally validated on typical grasping and manipulation tasks, including relocating objects, opening a door, hammering nails, and functional grasping. 
The results indicate that the proposed BBE strategy enables a multi-fingered dexterous hand to learn typical grasping and manipulation tasks efficiently and accurately from a single demonstration.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111606"},"PeriodicalIF":7.5,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing attention-based methods with long short-term memory for state encoding in reinforcement learning-based separation management","authors":"D.J. Groot, J. Ellerbroek, J.M. Hoekstra","doi":"10.1016/j.engappai.2025.111592","DOIUrl":"10.1016/j.engappai.2025.111592","url":null,"abstract":"<div><div>Reinforcement learning (RL) has been studied extensively for conflict resolution and separation management within air traffic control, offering advantages over analytical methods. One key challenge associated with RL for this task is the construction of the input vector. Because the number of agents in the airspace varies, methods that can handle a dynamic number of agents are required. Various approaches exist, for example selecting a fixed number of aircraft, or using recurrent neural networks or attention to encode the information. Multiple studies have shown promising results using these encoder methods; however, studies comparing them are limited, and the results remain inconclusive as to which method works better. To address this issue, this paper compares different input encoding methods: three different attention methods – scaled dot-product, additive and context-aware attention – and long short-term memory (LSTM) with three different sorting strategies. These methods are used as input encoders for different models trained with the Soft Actor–Critic algorithm for separation management in high traffic density scenarios. It is found that additive attention is the most effective at increasing the total safety and maximizing path efficiency, outperforming the commonly used scaled dot-product attention and LSTM. Additionally, it is shown that the order of the input sequence significantly impacts the performance of the LSTM-based input encoder. 
This is in contrast with the attention methods, which are sequence-independent and therefore do not suffer from biases introduced by the order of the input sequence.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111592"},"PeriodicalIF":7.5,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-dimensional deep shape optimization with a limited dataset","authors":"Yongmin Kwon, Namwoo Kang","doi":"10.1016/j.engappai.2025.111504","DOIUrl":"10.1016/j.engappai.2025.111504","url":null,"abstract":"<div><div>Generative models have attracted considerable attention for their ability to produce novel shapes. However, their application in mechanical design remains constrained due to the limited size and variability of available datasets. This study proposes a deep learning-based optimization framework specifically tailored for shape optimization with limited datasets, leveraging positional encoding and a Lipschitz regularization term to robustly learn geometric characteristics and maintain a meaningful latent space. Through extensive experiments, the proposed approach demonstrates robustness, generalizability and effectiveness in addressing typical limitations of conventional optimization frameworks. The validity of the methodology is confirmed through multi-objective shape optimization experiments conducted on diverse three-dimensional datasets, including wheels and cars, highlighting the model’s versatility in producing practical and high-quality design outcomes even under data-constrained conditions.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111504"},"PeriodicalIF":7.5,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint depth-segmentation learning with segment priors for non-contact seedling height and stem thickness estimation","authors":"Lei Song , Bo Jiang , Huaibo Song","doi":"10.1016/j.engappai.2025.111572","DOIUrl":"10.1016/j.engappai.2025.111572","url":null,"abstract":"<div><div>To achieve precise and rapid computation of seedling height and stem diameter — key phenotypic traits for monitoring seedling growth and selecting superior varieties — this study proposes a SAM-Integrated Adaptive Fusion Depth Network (SAFD-Net). SAFD-Net integrates segmentation masks generated by Segment Anything Model (SAM) with an Adaptive Prior Extraction (APE) module to produce priors focused on individual seedling characteristics, and it fuses these priors with deep features through an Adaptive Attention Fusion (AAF) module. A Local Depth Generation (LDG) module refines depth details to improve estimation accuracy, and an Adaptive Multi-scale Fusion (AMF) module merges LDG outputs at different scales to produce high-precision depth maps. From these maps, seedling region depth, pixel height, and pixel stem diameter are extracted to compute actual seedling height and stem diameter. Comparisons with various depth estimation networks demonstrate that SAFD-Net outperforms existing models in both depth estimation and seedling measurement. Experimental evaluations on seedlings from three crops with distinct phenotypic characteristics further show that the method maintains high accuracy under varying shooting distances, lighting conditions, multiple targets, and tilt angles, offering a novel approach for phenotypic monitoring during seedling cultivation. 
<em><strong>Code is released at</strong></em> https://github.com/Songlei7664/SAFD-Net.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111572"},"PeriodicalIF":7.5,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aspect-based sentiment analysis with semantic and syntactic enhanced multi-layer fusion model","authors":"Song Jin , Qing He , Yuji Wang , Nisuo Du , Wenjing Lei","doi":"10.1016/j.engappai.2025.111654","DOIUrl":"10.1016/j.engappai.2025.111654","url":null,"abstract":"<div><div>Aspect-based sentiment analysis (ABSA) aims to identify the sentiment polarity of specific aspect words or phrases in a sentence. Although recent studies have used attention mechanisms or syntactic relations of dependency trees to establish links between aspect terms and sentences, these approaches are imperfect in effectively fusing syntactic and semantic contextual information. Therefore, in this paper, we propose a novel multi-layer fusion model (MLFM) based on artificial intelligence (AI) techniques to efficiently fuse semantic and syntactic information for sentiment analysis. In the model, we first propose a new bi-graph convolutional network module for aspect term-centered aspect nodal attention (Aspect-NA) to enhance Semantic and Syntactic learning. Within Aspect-NA, we introduce dependency embedding and propose a dual embedding update mechanism that pays more attention to the influence of dependency types and semantics. In addition, we propose an adaptive hierarchical cross-attention (AHCA) for fusing the semantic information of aspect term with their associated syntactic features. AHCA not only effectively fuses features between syntax and semantics of the context, but also carries out the key features. We conducted experiments on six benchmark datasets, and the results show that our proposed model outperforms most baseline methods. 
The code and datasets used in this paper are available at https://github.com/jims-bug/MLFM.git.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111654"},"PeriodicalIF":7.5,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A deep learning based iterative denoising algorithm for multiple frequency lines recovery","authors":"Qifan Shen, Xinwei Luo, Long Chen","doi":"10.1016/j.engappai.2025.111601","DOIUrl":"10.1016/j.engappai.2025.111601","url":null,"abstract":"<div><div>Passive detection technology constitutes a crucial research direction in underwater acoustic target detection. It has been the subject of ongoing investigations to address the pressing need for stealth capabilities. The most formidable hurdle that all types of detectors must overcome is the extraction of line spectral components relevant to the target, given the convoluted underwater environment teeming with significant noise pollution. In this paper, a pioneering deep learning-based algorithm, known as the Additive Diffusion Probabilistic Denoising Model (ADPDM), is proposed to rectify the performance inadequacies of neural network-based approaches when operating under low signal-to-noise ratios (SNRs). To begin with, the ADPDM was ingeniously crafted. It was designed to astutely modify the representation of underwater signals by transforming the generative inference process of the diffusion model into a deterministic recovery strategy. Subsequently, the ADPDM was expanded into the complex-valued time–frequency joint domain, in order to take full advantage of the multi-dimensional information representation brought about by the lofargram. Moreover, an accelerating inference algorithm was adopted and calibrated to be fully compatible with the ADPDM framework. In contrast to the prevailing frequency line trackers that predominantly concentrate on discerning the frequency positions of the line spectrum, the ADPDM is dedicated to unearthing and reconstructing the latent line spectrum components concealed within the observed signal. This, in turn, paves the way for more effective subsequent detection or estimation operations. 
Empirical results demonstrated that the frequency lines within the signal enhanced by the ADPDM can be detected with remarkable efficacy, even when a relatively less sophisticated tracker is employed. On the basis of these findings, the detection performance metrics of the ADPDM have been shown to outstrip those of the current state-of-the-art (SOTA) methods, both those founded on deep learning and the hidden Markov model (HMM), across the entire spectrum of experimental SNRs.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111601"},"PeriodicalIF":7.5,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dual spatial temporal neural network for bottleneck prediction in manufacturing systems","authors":"Weihong Cen , Chupeng Su , Kainuo Cen , Lie Yang , Gang Chen , Longhan Xie","doi":"10.1016/j.engappai.2025.111586","DOIUrl":"10.1016/j.engappai.2025.111586","url":null,"abstract":"<div><div>In manufacturing systems, bottlenecks act as constraints that limit system throughput. Extensive efforts have been made to detect and predict bottlenecks. Traditional bottleneck prediction methods predominantly utilize time-series feature analysis, which is limited in capturing the dynamic spatial dependencies introduced by production material flow. To address these limitations, we proposed a dual spatial temporal neural network for dynamic bottlenecks (Dual-BDSTN) to learn the dependencies of temporal and spatial features dynamically. In the temporal module, a gated recurrent unit combined with a self-attention mechanism is employed to capture the time-evolving dynamics of temporal features related to machine status. In the spatial module, a dynamic graph neural network is employed to learn spatial information affected by dynamic production material flow and a cross-attention mechanism captures the effect of temporal features on spatial features. Finally, gated recurrent neural networks are applied to capture the temporal trends of the temporal and spatial features to predict future starvation and blockage for identifying bottleneck locations. 
Experimental results demonstrate that the proposed model outperforms the best benchmark, reducing the root mean square error by 5.95 and 2.95 (over 10%) when predicting the starvation and blockage times, respectively, of all machines in the production system, and improving bottleneck prediction accuracy by 2.85%.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111586"},"PeriodicalIF":7.5,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid approach for static hand gesture recognition: Integrating Directional Adaptive Patterns with Multi-Scale Feature Extraction and Aggregation","authors":"Arti Bahuguna , Gopa Bhaumik , Bam Bahadur Sinha , Mahesh Chandra Govil","doi":"10.1016/j.engappai.2025.111566","DOIUrl":"10.1016/j.engappai.2025.111566","url":null,"abstract":"<div><div>This research introduces a hybrid model that combines the strengths of the Directional Adaptive Pattern (DAP) descriptor and the Multi-Scale Feature Extraction and Aggregation Network (MaXNet) to achieve robust and efficient gesture recognition. The primary objective of this study is to enhance accuracy and computational efficiency while ensuring robustness across diverse datasets. The directional adaptive pattern descriptor effectively captures intricate texture details and directional variations by leveraging directional feature analysis, adaptive neighborhood encoding, and multilevel pattern representation. To address the variable-size feature outputs of the proposed descriptor, agglomerative clustering is utilized to generate compact, fixed-size representations, reducing noise while preserving essential texture information. Multi-scale feature extraction and aggregation network further enhances multiscale feature extraction by integrating multi-kernel convolutional layers, depthwise and pointwise convolutions, and hierarchical feature aggregation. Its lightweight and modular design allows for efficient extraction of fine-grained and large-scale patterns while maintaining computational efficiency. The effectiveness of the proposed model is evaluated based on accuracy, precision, recall, and F1-score across ten benchmark datasets. 
Experimental results show that the proposed model achieves superior accuracy compared to the current state-of-the-art methods.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111566"},"PeriodicalIF":7.5,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prognosticate pulmonary pathosis for COVID negative and post-acute COVID patients using chest computed tomography images","authors":"D. Suganya , R. Kalpana","doi":"10.1016/j.engappai.2025.111639","DOIUrl":"10.1016/j.engappai.2025.111639","url":null,"abstract":"<div><div>A significant number of studies have omitted data on fatalities occurring 6 months to 2 years after recovery from coronavirus disease (COVID). Post-COVID, or long-term COVID, refers to the enduring consequences that individuals who have recovered from COVID-19 commonly suffer. People with chronic lung disorders are more likely to die as the infection progresses rapidly. Chest Computed Tomography (CT) images were used to identify the lung abnormalities and determine the patient's exact condition. The proposed method groups COVID-19-negative patients by chronic lung diseases, post-COVID lung disorders, and severity, using an improved mask regional-convolutional neural network (R-CNN) to analyze chest CT scan images. Synthetic images are generated with a cycle-consistency generative adversarial network (Cycle-GAN) to avoid overfitting. The enhanced Mask R-CNN, which uses ResNet-101 and incorporates an FPN model, classifies COVID-negative patients' abnormal lung conditions as chronic or post-COVID disorder. This model achieves an accuracy of 95.71%, an F1 score of 94.21%, a mean average precision (mAP) of 92.34%, and a geometric mean (G-mean) of 94.89%. Post-COVID disorder can be further classified as mild (structural abnormalities) or severe (fibrosis). For this severity classification, the model achieves an accuracy of 90.38% without Cycle-GAN and 94.35% when Cycle-GAN is applied to generate synthetic images that balance the dataset. The p-value of 0.0004 (p < 0.01) indicates that performance on the augmented dataset is significantly higher. 
This method helps radiologists diagnose chronic lung disease or post-COVID disorder, which enables them to provide appropriate and effective treatments.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111639"},"PeriodicalIF":7.5,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehensive-Detail Synergy with Multi-Level Dynamic Interaction for Enhanced Salient Object Detection","authors":"Bingfeng Li , Boxiang Lv , Qingshan Chen , Xinxin Duan , Xinwei Li","doi":"10.1016/j.engappai.2025.111579","DOIUrl":"10.1016/j.engappai.2025.111579","url":null,"abstract":"<div><div>Salient Object Detection involves identifying and segmenting the most visually distinctive objects in an image. A key challenge is distinguishing Salient objects from complex backgrounds while preserving global features and minimizing local detail loss. To address this issue, we introduce a Comprehensive-Detail Synergy with Multi-Level Dynamic Interaction for Enhanced Salient Object Detection aimed at enhancing salient object features. Initially, a Multi-Scale Pooling Self-Attention Module is introduced to capture global contextual information of salient objects by combining multi-scale max pooling across spatial dimensions with self-attention. Additionally, to better preserve local details, an Adaptive Channel Enhancement Block is proposed, utilizing an adaptive weighting strategy to prioritize salient channels and enhance the model’s ability to capture intricate local features. Furthermore, to enhance the interaction between features at different levels, a Multi-Level Diffusive Synergy Block is introduced. With the integration of the cross-attention and dynamic diffusion refinement mechanism, it enables deep features to guide shallow features in focusing on salient regions. To alleviate the loss of local details due to excessive deep feature guidance, a Dual-Domain Fusion Attention Module is proposed, which integrates global self-attention with locally enhanced feature extraction units, thereby balancing global context modeling and local detail preservation. 
Experiments on six challenging publicly available datasets demonstrate that the proposed method outperforms the state of the art, achieving improvements of 4.9%, 3.3%, 2.4%, 2.3%, 1.2%, and 7.2% in the Weighted Harmonic Mean of Precision and Recall. These results show that the method improves accuracy and boundary detail.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111579"},"PeriodicalIF":7.5,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144548786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}