Engineering Applications of Artificial Intelligence: Latest Articles

AutoMedTS: Automated modeling of physiological time series for surgical suturing action recognition
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-22 · DOI: 10.1016/j.engappai.2025.112880
Baobing Zhang, Paul Sullivan, Benjie Tang, Ghulam Nabi, Mustafa Suphi Erden
In laparoscopic surgical training and evaluation, real-time recognition of surgical actions with transparent outputs is crucial for automated, objective, and immediate instructional feedback that supports skill improvement. However, limited dataset sizes and variability in surgical environments pose challenges. This study presents AutoMedTS, an end-to-end automated machine learning framework customized for medical time-series data, enabling rapid deployment using surgical suturing trajectories collected from both expert and novice surgeons. The proposed method features two key improvements: (i) a novel temperature-scaled Softmax resampling technique that addresses severe class imbalance, and (ii) an uncertainty-aware ensemble selection mechanism that ensures robust predictions across surgeons with varying skill levels. Additionally, the approach emphasizes model transparency to meet the high standards of reliability and transparency required in medical applications. Compared with deep learning methods, traditional machine learning models not only facilitate efficient, rapid deployment but also offer significant transparency advantages. Experimental results demonstrate that the method provides fast, stable, and reliable real-time surgical action recognition in clinical training environments.
Code and data are publicly available at https://github.com/baobingzhang/AutoMedTS.
Volume 163, Article 112880.
Citations: 0
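The abstract does not spell out how the temperature-scaled Softmax resampling works. One plausible reading, sketched here purely as an assumption, converts class counts into target proportions with a softmax over log-frequencies divided by a temperature T: T = 1 keeps the original class distribution and larger T flattens it toward uniform. The function names and the sample-with-replacement step are hypothetical, not the AutoMedTS implementation (the linked repository is the authoritative source).

```python
import math
import random
from collections import Counter


def softmax_class_targets(labels, temperature=2.0, total=None):
    """Hypothetical: target count per class from a temperature-scaled softmax.

    softmax(log(n_c) / T) is proportional to n_c ** (1 / T), so T = 1
    reproduces the original class proportions and T -> infinity approaches
    a uniform (fully balanced) distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    counts = Counter(labels)
    total = len(labels) if total is None else total
    logits = {c: math.log(n) / temperature for c, n in counts.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    weights = {c: math.exp(z - peak) for c, z in logits.items()}
    norm = sum(weights.values())
    return {c: round(total * w / norm) for c, w in weights.items()}


def resample(samples, labels, targets, seed=0):
    """Draw each class with replacement until it reaches its target count."""
    rng = random.Random(seed)
    pools = {}
    for x, y in zip(samples, labels):
        pools.setdefault(y, []).append(x)
    out = []
    for c, k in targets.items():
        out.extend((rng.choice(pools[c]), c) for _ in range(k))
    return out
```

With 90 samples of one action and 10 of another, T = 2 gives targets of 75 and 25: a milder correction than full equalization, which avoids duplicating a tiny minority class many times over.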
Enhancing surrogate model accuracy in ship design optimization through intelligent constraint-aware sample selection
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-22 · DOI: 10.1016/j.engappai.2025.112716
Chang HaiChao, Hou Wenlong, Liu Zuyuan, Feng Baiwei, Zheng Qiang
For hull form optimization based on numerical simulation, a surrogate model is usually required to reduce computational cost and time. However, existing sample point selection methods do not consider how constraint conditions affect the sampling space, so their sample selection is not very effective; this is one of the main reasons why building a high-precision surrogate model is costly. This paper therefore proposes an improved sampling method that selects sample points effectively within the feasible region and makes surrogate model development more efficient. The method uses data mining to identify the potential mapping between optimization variables and constraint conditions, selects sample points in the space that satisfies the constraints, and then constructs a surrogate model in the feasible region for efficient hull form optimization. Applied to the hull form optimization of a 7500 deadweight tonnage (DWT) bulk carrier, the method significantly improves the surrogate model's prediction accuracy for the same sample size and yields optimization results similar to those of the traditional method, verifying the engineering applicability of the proposed intelligent sampling process.
The proposed intelligent sampling framework embeds data mining into the sampling process, reducing both the sampling space and the optimization space from the full space to the constraint subspace and thereby enabling intelligent ship optimization.
Volume 163, Article 112716.
Citations: 0
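The core idea, learning where the constraints are satisfied and then sampling only there, can be sketched as follows. This is a minimal illustration under stated assumptions: a k-nearest-neighbour vote over already-evaluated designs stands in for the paper's unspecified data-mining step, candidates come from a Latin hypercube, and every function name is hypothetical.

```python
import random


def latin_hypercube(n, bounds, rng):
    """n points in a box with exactly one point per stratum along each axis."""
    columns = []
    for lo, hi in bounds:
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)
        columns.append([lo + s * (hi - lo) for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def predicted_feasible(x, evaluated, k=3):
    """Majority vote of the k nearest evaluated designs (label True = feasible)."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], x))

    nearest = sorted(evaluated, key=sq_dist)[:k]
    return 2 * sum(1 for _, ok in nearest if ok) > len(nearest)


def constraint_aware_samples(n, bounds, evaluated, oversample=10, seed=0):
    """Oversample candidates, then keep only those predicted to satisfy the constraints."""
    rng = random.Random(seed)
    candidates = latin_hypercube(n * oversample, bounds, rng)
    feasible = [c for c in candidates if predicted_feasible(c, evaluated)]
    return feasible[:n]
```

For example, with evaluated designs on a 0.1 grid labelled by a toy constraint x + y ≤ 1, the returned samples concentrate in the lower-left triangle instead of wasting half the budget on infeasible hulls.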
An improved retrieval-augmented long-term grouting power prediction method: Rejecting low-similarity retrievals
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-22 · DOI: 10.1016/j.engappai.2025.112733
Baoxi Liu, Liangsi Xu, Bingyu Ren, Chengyu Yu, Hongling Yu, Xiangyu Chen, Xinyu Liu
Long-term prediction of grouting power helps regulate power output. Traditional long-term prediction methods require iterative updates with newly accumulated data during construction, which is time-consuming. Retrieval-augmented methods not only achieve higher prediction accuracy but also enable more efficient performance upgrades through database updates, avoiding the need to retrain models. However, conventional retrieval-augmented frameworks unconditionally incorporate retrieved sequences into the prediction process, even when their similarity to the query is low. This design choice can introduce noisy or irrelevant historical patterns, misleading the fusion mechanism and degrading overall performance. To address this issue, this study proposes a retrieval-augmented method for long-term grouting power prediction with a rejection-substitution mechanism. Compared with naive retrieval-augmented prediction, this mechanism fuses retrievals selectively by evaluating the similarity of each retrieved sequence before integration. If the similarity falls below a predefined threshold, the corresponding result is substituted with a prediction from the TimeXer model; otherwise, the retrieved result is retained. The processed results are then fused by a Gated Recurrent Unit (GRU) network to generate the final prediction. To validate the method, experiments were conducted on a grouting power dataset and a publicly accessible dataset.
The results indicate that incorporating the rejection-substitution mechanism improves prediction accuracy compared with the traditional retrieval-augmented prediction approach.
Volume 163, Article 112733.
Citations: 0
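The rejection-substitution step can be sketched as follows, under explicit assumptions: cosine similarity serves as the similarity score, a naive last-value forecast stands in for the TimeXer fallback, and an element-wise average stands in for the GRU fusion network. The threshold value and all names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def last_value_forecast(query, horizon):
    """Stand-in for the TimeXer fallback: repeat the last observed value."""
    return [query[-1]] * horizon


def reject_and_substitute(query, retrievals, horizon, threshold=0.8):
    """retrievals: list of (key_window, future_values) pairs from the database.

    Retrievals whose key is too dissimilar to the query are replaced by the
    fallback forecast instead of being passed on to the fusion step.
    """
    fallback = last_value_forecast(query, horizon)
    candidates = []
    for key, future in retrievals:
        if cosine_similarity(query, key) >= threshold:
            candidates.append(list(future[:horizon]))
        else:
            candidates.append(fallback)
    return candidates


def mean_fusion(candidates):
    """Stand-in for the GRU fusion network: element-wise average over candidates."""
    return [sum(step) / len(step) for step in zip(*candidates)]
```

Keeping the number of candidates fixed (substituting rather than dropping) means the downstream fusion model always receives an input of the same shape, which is one reason to prefer substitution over simple filtering.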
Mutual risk prompt learning with multi-objective optimization for collaborative tumor and peritumor segmentation
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-22 · DOI: 10.1016/j.engappai.2025.112824
Nuo Tong, Qingyang Meng, Chunsheng Xu, Changhao Liu, Shuiping Gou, Mei Shi, Mengbin Li
Early radical surgery, radiotherapy, and other treatments may offer curative effects for tumors. However, the proximity of the tumor to surrounding organs-at-risk (OARs) significantly influences both surgical outcome and prognosis. For benign tumors, the risk is primarily associated with the tumor's boundaries. For malignant tumors, in contrast, the main challenge lies in preserving the function of surrounding organs while minimizing the risk of tumor recurrence. Understanding the tumor's characteristics and its anatomical relationships with OARs is therefore essential. Most existing studies neglect the constrained interrelations and potential optimization conflicts between tumor and OARs, which can introduce risks and uncertainties into tumor treatment and OAR protection. Here, we propose ROJS-Net, a novel multi-objective segmentation network for tumor and OARs that incorporates mutual risk prompt learning and a multi-gate mixture of experts to achieve risk-optimized collaborative segmentation. A multi-task learning framework with a shared encoder and multiple expert decoders serves as the network backbone. A mutual risk prompt learning module obtains target-specific features and performs mutual risk recalibration between the features of different targets, enabling a comprehensive understanding of the anatomical environment. The risk-recalibrated features are then fed into task-specific gating networks that adaptively activate the most highly correlated expert decoders to generate the final segmentation results.
Extensive experiments on both benign and malignant tumor datasets demonstrate the effectiveness of ROJS-Net. These results show that ROJS-Net effectively resolves the optimization divergence, facilitating risk-controllable treatment planning in various clinical settings.
Volume 163, Article 112824.
Citations: 0
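The task-specific gating described above follows the general multi-gate mixture-of-experts pattern: each task has its own gate that turns shared features into softmax weights over the expert outputs, and each task receives its own weighted mixture. A minimal, framework-free sketch under that assumption; the gates are hypothetical linear layers with fixed weights, and the experts emit toy vectors rather than segmentation maps.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(features, gate_matrix):
    """One logit per expert from a linear gate, normalised with softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in gate_matrix]
    return softmax(logits)


def mmoe_mix(features, expert_outputs, gates):
    """gates: dict mapping task name -> gate matrix (one row per expert).

    Returns, for each task, the gate-weighted sum of the shared expert outputs.
    """
    mixed = {}
    for task, matrix in gates.items():
        weights = gate_weights(features, matrix)
        mixed[task] = [
            sum(w * out[d] for w, out in zip(weights, expert_outputs))
            for d in range(len(expert_outputs[0]))
        ]
    return mixed
```

Because every task owns its gate, one task (say, the tumor) can lean almost entirely on one expert while another (an OAR) blends several, which is how such designs reduce conflicts between objectives that share a backbone.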
An Adaptive Image Dehazing Network with Multi-Color Feature for Complex Real-World Hazy Scenes
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-21 · DOI: 10.1016/j.engappai.2025.112867
Zhiyu Lyu, Qi An, Yan Chen
Real-world hazy scenes can be broadly categorized into four types based on haze distribution and concentration: light homogeneous, dense homogeneous, light non-homogeneous, and dense non-homogeneous haze. However, many existing dehazing models are tailored to specific haze types and struggle to generalize across these diverse conditions. In addition, these models commonly extract feature information in the Red, Green, and Blue (RGB) color space, which makes it difficult to capture sufficient feature information in varied hazy scenes. To address these issues, we propose an Adaptive Network (AdaNet) for multiple hazy scenes. The network comprises two sub-networks: a color-guided feature extraction network and a scene reconstruction network. The color-guided feature extraction network captures color, detail, and other feature information in both the RGB and the Luminance, Chroma Red, Chroma Blue (YCrCb) color spaces. For light and dense non-homogeneous hazy scenes, we enhance the scene reconstruction network with Feature Selection Units (FSU) that filter out less relevant information, ensuring precise recovery of critical local details. For light and dense homogeneous hazy scenes, we integrate Feature Fusion Units (FFU) that combine multi-level features to improve overall feature utilization.
Extensive experiments on multiple datasets with diverse hazy scenes demonstrate that AdaNet outperforms state-of-the-art dehazing models, producing high-quality dehazed images in all four haze scenarios and supporting reliable high-level vision tasks in real-world hazy scenes.
Volume 163, Article 112867.
Citations: 0
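For reference, the standard 8-bit RGB-to-YCrCb conversion (BT.601 luma weights with chroma offset 128, the form OpenCV documents for its RGB-to-YCrCb conversion) is sketched below, together with a hypothetical helper that stacks both color spaces into a six-channel per-pixel input. How AdaNet actually combines the two color spaces is not specified in the abstract, so the stacking is an illustrative assumption.

```python
def rgb_to_ycrcb(r, g, b, delta=128.0):
    """BT.601 luma plus scaled colour differences, returned in (Y, Cr, Cb) order.

    Values are not clipped; 8-bit pipelines would saturate them to [0, 255].
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + delta
    cb = (b - y) * 0.564 + delta
    return y, cr, cb


def six_channel_image(rgb_image):
    """Hypothetical fusion input: every pixel becomes (R, G, B, Y, Cr, Cb)."""
    return [
        [(r, g, b) + rgb_to_ycrcb(r, g, b) for (r, g, b) in row]
        for row in rgb_image
    ]
```

Separating luminance from chroma this way lets a network treat brightness loss (a dominant effect of haze) somewhat independently of color shifts, which is a common motivation for working in YCrCb alongside RGB.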
Multimodal fusion in speech emotion recognition: A comprehensive review of methods and technologies
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-21 · DOI: 10.1016/j.engappai.2025.112624
Nhut Minh Nguyen, Thanh Trung Nguyen, Phuong-Nam Tran, Chee Peng Lim, Nhat Truong Pham, Duc Ngoc Minh Dang
Speech emotion recognition (SER) plays a crucial role in human–computer interaction, enhancing applications such as virtual assistants, healthcare monitoring, and customer support by identifying and interpreting emotions conveyed through spoken language. Unimodal SER systems are notably simple and computationally efficient and excel at extracting critical features such as vocal prosody and linguistic content, but their performance needs to improve under challenging conditions such as noisy environments, ambiguous expressions, and incomplete information. These challenges underscore the need to move to multimodal approaches, which integrate complementary data sources to achieve more robust and accurate emotion detection. With advances in artificial intelligence, especially neural networks and deep learning, many studies have employed advanced deep learning and feature fusion techniques to enhance SER performance. This review synthesizes publications from 2020 to 2024, exploring prominent multimodal fusion strategies, including early, late, deep, and hybrid fusion, while also examining data representation, data translation, attention mechanisms, and graph-based fusion. We assess the effectiveness of various fusion techniques on standard SER datasets, highlighting their performance across tasks and addressing challenges related to data alignment, noise management, and computational demands.
Furthermore, we highlight real-world applications of multimodal SER and outline critical research challenges that must be addressed for practical deployment, offering insights into optimal fusion strategies and guiding future developments in multimodal SER.
Volume 163, Article 112624.
Citations: 0
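The difference between the two simplest strategies the review covers can be shown in a few lines: early fusion combines modality features before a single classifier, while late fusion combines per-modality decisions. A minimal sketch, assuming one audio and one text feature vector per utterance and per-modality class probabilities from separate classifiers; the emotion label set and weights are hypothetical.

```python
EMOTIONS = ["neutral", "happy", "angry"]  # hypothetical label set


def early_fusion(audio_features, text_features):
    """Feature-level fusion: one joint vector for a single downstream classifier."""
    return list(audio_features) + list(text_features)


def late_fusion(modality_probs, weights=None):
    """Decision-level fusion: weighted average of per-modality class probabilities."""
    n = len(modality_probs)
    weights = [1.0 / n] * n if weights is None else weights
    return [
        sum(w * probs[k] for w, probs in zip(weights, modality_probs))
        for k in range(len(modality_probs[0]))
    ]


def predict_label(probs):
    """Return the emotion with the highest fused probability."""
    return EMOTIONS[max(range(len(probs)), key=probs.__getitem__)]
```

Early fusion lets one model learn cross-modal interactions but needs the modalities aligned and present together; late fusion is easier to run when a modality is missing, since each branch decides on its own, at the cost of modelling those interactions only through the weights.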
Fuzzy fixed-time multi-source fusion with attention-based neural networks for reliable navigation under satellite signal loss
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-21 · DOI: 10.1016/j.engappai.2025.112806
Elahe Sadat Abdolkarimi, Sadra Rafatnia
Accurate and reliable navigation is critical for autonomous systems, particularly in the presence of sensor errors, unknown dynamic variations, global navigation satellite system (GNSS) outages, and complex maneuvers. This paper presents a navigation framework that combines three elements, in both GNSS-available and GNSS-denied conditions: an adaptive observer with fixed-time convergence; a deep learning model based on an attention-equipped convolutional neural network (CNN) and a gated recurrent unit (GRU); and conditional fusion of air-data sensors. At the core of the system is a nonlinear observer with theoretical guarantees of fixed-time convergence, capable of estimating both the system state and perturbations independently of initial conditions. To enhance resilience against noise, sensor drift, and the limitations of low-cost inertial measurement units (IMUs), a fuzzy logic-based mechanism adaptively adjusts the observer gains using the normalized estimation error and its variance. During GNSS outages, the deep learning predictor (CNN and GRU with an attention mechanism) estimates horizontal position changes from IMU data. The convolutional layers extract local features, the GRU layers capture temporal dependencies, and the attention mechanism prioritizes informative segments, enabling accurate and reliable predictions. In addition, a conditional sensor fusion strategy uses pitot tube velocity and barometric altitude only during GNSS-denied phases, effectively mitigating the drift of the strap-down inertial navigation system (SINS).
Field experiments across complex trajectories, including several GNSS interruptions, demonstrate that the proposed hybrid artificial intelligence-based framework improves estimation accuracy and supports real-time autonomous navigation in challenging real-world environments.
Volume 163, Article 112806.
Citations: 0
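The abstract says a fuzzy-logic mechanism adjusts the observer gains from the normalized estimation error and its variance, but it does not give the rule base. The following is a minimal zero-order Sugeno sketch in which the membership functions, rules, and output multipliers are entirely hypothetical: a large, consistent error raises the gain for faster convergence, high variance (likely noise) lowers it, and a small, quiet error softens it.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzy_gain_multiplier(error, variance):
    """Hypothetical rule base; inputs are clipped to the normalised range [0, 1]."""
    e = min(max(error, 0.0), 1.0)
    v = min(max(variance, 0.0), 1.0)
    e_low, e_mid, e_high = tri(e, -0.5, 0.0, 0.5), tri(e, 0.0, 0.5, 1.0), tri(e, 0.5, 1.0, 1.5)
    v_low, v_high = tri(v, -0.5, 0.0, 0.5), tri(v, 0.5, 1.0, 1.5)
    # (firing strength, crisp output); min() implements fuzzy AND
    rules = [
        (min(e_low, v_low), 0.5),  # converged and quiet: soften the gain
        (e_mid, 1.0),  # moderate error: nominal gain
        (min(e_high, 1.0 - v_high), 2.0),  # large, consistent error: speed up
        (v_high, 0.7),  # noisy estimates: be cautious
    ]
    strength = sum(w for w, _ in rules)
    if strength == 0.0:
        return 1.0
    # Weighted average of rule outputs (zero-order Sugeno defuzzification)
    return sum(w * z for w, z in rules) / strength


def adapt_gain(base_gain, error, variance):
    return base_gain * fuzzy_gain_multiplier(error, variance)
```

Between the rule peaks the output interpolates smoothly (for instance, a normalized error of 0.25 with zero variance yields a multiplier of 0.75), which avoids abrupt gain switching in the observer.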
Explainable stacking-based hybrid machine learning for predicting uni-axial creep deformation in concrete
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-21 · DOI: 10.1016/j.engappai.2025.112843
Mahamadou Djibo Zakari, Jing Wu, Luqi Xie, Abdoul Razak Abdou Harouna
To address the complexity of modeling concrete creep behavior and the limitations of traditional models, this study proposes a data-driven hybrid machine learning model for accurate prediction of creep deformation. The Northwestern University creep database is preprocessed to identify the most influential factors, and a stacking-based hybrid model is developed by combining five tree-based ensemble algorithms with an artificial neural network. Bayesian optimization, implemented with the Hyperopt library, is used to tune hyperparameters and optimize model performance. A 10-fold cross-validation demonstrates the model's strong generalization capability. The hybrid model outperforms the standalone base estimators, achieving a coefficient of determination (R²) of 0.960 on the testing set. SHapley Additive exPlanations (SHAP) are used to interpret the model's predictions globally and locally, revealing factor importance consistent with experimental findings. A comparison with three widely used traditional models, the Comité Européen du Béton (CEB) Model Code 90–99, the Fédération Internationale du Béton (fib) Model Code 2010, and the B4 model, on selected testing subsets demonstrates the superiority of the proposed model across six evaluation metrics.
The predicted creep strains closely align with experimentally measured values, further validating the model's accuracy in predicting different types of creep deformation.
Volume 163, Article 112843.
Citations: 0
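Stacking trains a meta-learner on out-of-fold predictions of the base models, so the meta-learner never sees a prediction a base model made on its own training data. The following dependency-free sketch shows that pattern only. As an assumption for brevity, the paper's five tree-based ensembles and neural network are replaced by two toy one-dimensional regressors (a least-squares line and a k-nearest-neighbour mean), and the meta-learner is a two-weight least-squares fit without intercept. All names are hypothetical.

```python
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    points = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


BASE_LEARNERS = [fit_line, fit_knn]


def out_of_fold_predictions(xs, ys, folds=5):
    """Row i holds each base model's prediction for sample i, made by a model
    trained without sample i."""
    oof = [[0.0] * len(BASE_LEARNERS) for _ in xs]
    for f in range(folds):
        train = [i for i in range(len(xs)) if i % folds != f]
        held_out = [i for i in range(len(xs)) if i % folds == f]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit(tx, ty)
            for i in held_out:
                oof[i][j] = model(xs[i])
    return oof


def fit_meta_weights(oof, ys):
    """Least-squares weights for two base predictions (2x2 normal equations)."""
    s11 = sum(r[0] * r[0] for r in oof)
    s12 = sum(r[0] * r[1] for r in oof)
    s22 = sum(r[1] * r[1] for r in oof)
    t1 = sum(r[0] * y for r, y in zip(oof, ys))
    t2 = sum(r[1] * y for r, y in zip(oof, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        return 0.5, 0.5
    return (t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


def fit_stack(xs, ys, folds=5):
    w1, w2 = fit_meta_weights(out_of_fold_predictions(xs, ys, folds), ys)
    line, knn = fit_line(xs, ys), fit_knn(xs, ys)  # refit base models on all data
    return lambda x: w1 * line(x) + w2 * knn(x)
```

On exactly linear toy data the meta-learner assigns essentially all weight to the line model, which illustrates the point of stacking: the combination weights are learned from held-out behaviour rather than fixed by hand.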
Dual-domain attentions for unmanned aerial vehicle small object detection
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-21 · DOI: 10.1016/j.engappai.2025.112849
Chunmei Wang, Yunxiao Chang, Shan Xie, Xiaobao Yang, Yadong Tian, Wei Sun, Junyan Hu
Images captured by unmanned aerial vehicles (UAVs) often suffer from severe degradation in small-object quality and resolution due to environmental constraints, making it difficult to preserve the dual-domain characteristics of spatial details and frequency components. While large-scale models attempt to address this through complex architectures, aggressive down-sampling and successive convolution operations inevitably erase the fine-grained patterns that are essential for detecting small objects. To overcome these challenges, we propose a dual-domain attention mechanism for small object detection that operates in both the spatial and frequency domains. In the spatial domain, the proposed step-free triple-attention convolution (SFTAConv) reduces information loss during feature propagation by combining spatial–channel interactions with a lossless space-to-depth transform, enhancing subtle object patterns while suppressing background interference. In the frequency domain, frequency-domain hybrid attention (FD-HAT) jointly recalibrates high- and low-frequency components, moving beyond single-domain recalibration to recover discriminative representations of occluded or blurred small objects. In addition, a classification-assisted localization (CAL) branch with classification-guided localization further refines detection accuracy.
Extensive experiments on the Vision Meets Drone 2019 object detection (VisDrone2019Det), Dataset for Object Detection in Aerial Images (DOTA), and PASCAL Visual Object Classes (PASCAL VOC) datasets show that the model achieves gains of 2.2%, 1.7%, and 5.3% in AP_s (average precision on small objects) on the three datasets, respectively, while remaining competitive with state-of-the-art (SOTA) detectors.
Volume 163, Article 112849.
Citations: 0
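SFTAConv is described as using a lossless space-to-depth transform instead of strided down-sampling. The transform itself is standard: each block×block neighbourhood is moved into separate channels, so the spatial size shrinks by the block factor without discarding a single pixel, and the operation can be inverted exactly. A single-channel, pure-Python sketch follows; the attention components of SFTAConv are not reproduced here.

```python
def space_to_depth(image, block=2):
    """Rearrange each block x block patch of a 2-D grid into block**2 channels.

    Channel dy * block + dx holds the pixels at offset (dy, dx) inside every
    patch, so the spatial size shrinks by `block` and no pixel is dropped.
    """
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image size must be divisible by block")
    return [
        [[image[y + dy][x + dx] for x in range(0, width, block)]
         for y in range(0, height, block)]
        for dy in range(block)
        for dx in range(block)
    ]


def depth_to_space(channels, block=2):
    """Exact inverse of space_to_depth."""
    out_h, out_w = len(channels[0]), len(channels[0][0])
    image = [[None] * (out_w * block) for _ in range(out_h * block)]
    for index, channel in enumerate(channels):
        dy, dx = divmod(index, block)
        for y in range(out_h):
            for x in range(out_w):
                image[y * block + dy][x * block + dx] = channel[y][x]
    return image
```

A stride-2 convolution or pooling layer would keep only a summary of each 2×2 patch; space-to-depth keeps all four values as channels, which is why it suits tiny objects that may occupy only a few pixels.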
Enhancement of apple defect identification with semantic segmentation and label-efficient data generation
IF 8 · Zone 2 · Computer Science
Engineering Applications of Artificial Intelligence · Pub Date: 2025-10-21 · DOI: 10.1016/j.engappai.2025.112679
Jiwon Ryu, Sang-Yeon Kim, Chang-Hyup Lee, Gyumin Kim, Harin Jang, Taehyeong Kim, Suk-Ju Hong, Geon Hee Kim, Ghiseok Kim
The increasing demand for automated fruit-sorting systems has driven the development of machine vision and deep learning technologies for postharvest grading of fruit defects. Effective sorting requires identifying not only the presence of defects but also their types and severity, to avoid unnecessarily rejecting marketable fruit with minor defects. This study applied deep learning-based semantic segmentation models to Fuji apple images, focusing on four defect types: cracks, bruises, diseases, and scars. Model performance was evaluated for both defect classification and severity estimation. To further improve performance, a label-efficient approach using generative adversarial networks (GANs) was proposed to generate synthetic apple images and defect masks, enlarging the dataset while reducing the need for extensive manual labeling. Qualitative and quantitative analyses showed that the synthetic dataset successfully mimicked the biological characteristics of apples as well as the shape, position, and size of the defects. The synthetic dataset improved the segmentation model's ability to identify defects. The R² values for defect severity estimation increased to 0.82, 0.85, 0.75, and 0.92 for cracks, bruises, diseases, and scars, respectively, while F1-scores for defect classification reached 100%, 94.3%, 94.1%, and 89.7%. Per-sample classification also improved, with a binary F1-score of 95.9% for defect presence and a multi-label accuracy of 93.9% for defect types.
This study demonstrates that synthetic datasets generated with generative adversarial networks can substantially enhance both defect type classification and severity estimation in semantic segmentation-based apple defect identification models.
Volume 163, Article 112679.
Citations: 0
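The abstract reports severity estimates and per-sample multi-label predictions derived from segmentation output. One simple way to obtain both from a predicted label mask is sketched below. It assumes that severity means a defect's share of fruit pixels and that a defect counts as present when that share exceeds a threshold. The class ids, the min_fraction parameter, and this severity definition are all hypothetical and may not match the paper's.

```python
from collections import Counter

# Hypothetical class ids for a predicted label mask.
CLASS_NAMES = {0: "background", 1: "healthy", 2: "crack", 3: "bruise", 4: "disease", 5: "scar"}
DEFECT_IDS = (2, 3, 4, 5)


def defect_report(mask, min_fraction=0.0):
    """Severity of each defect as its share of fruit pixels (healthy + defects).

    Defects whose share is at or below `min_fraction` are treated as absent, so
    the keys of the result double as the per-sample multi-label prediction.
    """
    counts = Counter(label for row in mask for label in row)
    fruit_pixels = sum(n for label, n in counts.items() if label != 0)
    if fruit_pixels == 0:
        return {}
    report = {}
    for label in DEFECT_IDS:
        share = counts.get(label, 0) / fruit_pixels
        if share > min_fraction:
            report[CLASS_NAMES[label]] = share
    return report
```

Raising min_fraction suppresses a handful of stray pixels predicted as a defect, which trades a little sensitivity for fewer false rejections of marketable fruit.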
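Contact us: info@booksci.cn. Book学术 offers a free academic resource search service that helps scholars in China and abroad find Chinese- and English-language literature, and is committed to providing a convenient, high-quality service experience. Copyright © 2023 布克学术. All rights reserved.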