Expert Systems with Applications: Latest Articles

An iterated adaptive large neighborhood search algorithm for the large-scale communication satellite range scheduling problem
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-30, DOI: 10.1016/j.eswa.2025.127377
Zhehan Liu, Jinming Liu, Xiaolu Liu, Jungang Yan, Yuqing Cheng, Yingwu Chen
Abstract: The communication satellite range scheduling problem (CSRSP) is indispensable for the regular operation of the low earth orbit internet constellation. It involves scheduling tracking, telemetry and command (TT&C) tasks within their executable arcs to maximize the profit from the scheduled tasks. Unlike the traditional SRSP, the CSRSP takes inter-satellite links into account to facilitate the rapid completion of TT&C tasks. Moreover, the increasing number of satellites and the emergence of diverse types of associated TT&C tasks further escalate the complexity of this problem. Thus, we propose an iterated adaptive large neighborhood search algorithm (IALNS) to solve the CSRSP quickly and straightforwardly. In this algorithm, ALNS is employed to refine heuristic initial solutions. Frequent pattern mining, a popular data mining method, guides the search process as an iterative mechanism: on the one hand, the inferior structures in low-quality solutions are mined to assist the ALNS removal process; on the other hand, the superior structures in high-quality solutions are identified to guide the construction of new solutions. Experimental tests with different task scales demonstrate that IALNS effectively deals with the CSRSP, outperforming three state-of-the-art algorithms.
Volume 278, Article 127377
Citations: 0
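The adaptive large neighborhood search (ALNS) loop that the abstract above builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' IALNS: the task list, the time budget, the four operators, the scores (3.0, 1.0, 0.1) and the reaction factor are all hypothetical, and the frequent-pattern-mining step is not modelled.

```python
import random

# Hypothetical toy instance: (duration, profit) per task and a shared time budget.
TASKS = [(4, 10), (3, 7), (2, 6), (5, 9), (1, 3), (6, 13), (2, 4)]
BUDGET = 10

def profit(solution):
    return sum(TASKS[i][1] for i in solution)

def load(solution):
    return sum(TASKS[i][0] for i in solution)

def destroy_random(solution, rng):
    """Remove up to two scheduled tasks chosen at random."""
    items = sorted(solution)
    return solution - set(rng.sample(items, min(2, len(items))))

def destroy_worst(solution, rng):
    """Remove the scheduled task with the lowest profit per unit of time."""
    if not solution:
        return solution
    worst = min(solution, key=lambda i: TASKS[i][1] / TASKS[i][0])
    return solution - {worst}

def _insert(solution, order):
    """Add tasks in the given order whenever they still fit in the budget."""
    result = set(solution)
    for i in order:
        if i not in result and load(result) + TASKS[i][0] <= BUDGET:
            result.add(i)
    return frozenset(result)

def repair_greedy(solution, rng):
    order = sorted(range(len(TASKS)), key=lambda i: -TASKS[i][1] / TASKS[i][0])
    return _insert(solution, order)

def repair_random(solution, rng):
    order = list(range(len(TASKS)))
    rng.shuffle(order)
    return _insert(solution, order)

def pick(weights, rng):
    """Roulette-wheel selection proportional to the operator weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def alns(iterations=200, reaction=0.2, seed=0):
    rng = random.Random(seed)
    destroy_ops = [destroy_random, destroy_worst]
    repair_ops = [repair_greedy, repair_random]
    d_w = [1.0] * len(destroy_ops)
    r_w = [1.0] * len(repair_ops)
    current = best = frozenset()
    for _ in range(iterations):
        d, r = pick(d_w, rng), pick(r_w, rng)
        candidate = repair_ops[r](destroy_ops[d](current, rng), rng)
        if profit(candidate) > profit(best):
            best = current = candidate
            score = 3.0   # new best solution
        elif profit(candidate) >= profit(current):
            current = candidate
            score = 1.0   # accepted, but not a new best
        else:
            score = 0.1   # rejected
        # Adaptive step: smooth each used operator's weight toward its score.
        d_w[d] = (1 - reaction) * d_w[d] + reaction * score
        r_w[r] = (1 - reaction) * r_w[r] + reaction * score
    return best, d_w, r_w

best, destroy_weights, repair_weights = alns()
print(sorted(best), profit(best), load(best))
```

Every repair operator only inserts tasks that fit, so the returned schedule always respects the budget. The final weights show which operators the search came to favour on this toy instance.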
Improved DDPG based on enhancing decision evaluation for path planning in high-density environments
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-30, DOI: 10.1016/j.eswa.2025.127378
Junxiao Xue, Mengyang He, Jinpu Chen, Bowei Dong, Yuanxun Zheng
Abstract: Path planning, a crucial task in intelligent navigation for mobile devices, faces two key challenges: path optimization and obstacle avoidance. Traditional methods struggle in dynamic scenarios with dense obstacles. To address this, we propose a path planning method for high-density scenarios that leverages an improved Deep Deterministic Policy Gradient (DDPG), a reinforcement learning method, based on enhanced decision evaluation. Our innovation revolves around crafting decision evaluation models that reinforce reward mechanisms to facilitate precise appraisal of each choice the agent makes about the environment. Firstly, we utilize hypothetical trajectory calculations from historical data to speed up learning by identifying more promising decisions, optimizing obstacle avoidance strategies. Secondly, a strategic optimization of the agent’s observation range sharpens the focus on local obstacles, enabling the agent to make more informed decisions that help shorten the overall path length. Furthermore, leveraging observations of the target, we optimize path smoothness and achieve more effective target localization and superior navigation trajectories in real-time scenarios. Ultimately, experimental results demonstrate that our method attains a higher success rate and better paths than analogous algorithms, showing superior compatibility across scenarios of varying complexity.
Volume 279, Article 127378
Citations: 0
Deep neighbor-coherence hashing with discriminative sample mining for supervised cross-modal retrieval
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-30, DOI: 10.1016/j.eswa.2025.127365
Congcong Zhu, Qibing Qin, Wenfeng Zhang, Lei Huang
Abstract: Deep supervised cross-modal hashing has attracted extensive attention because of its low cost and high retrieval efficiency. Although existing deep supervised cross-modal hashing methods have made great progress, they still suffer from two shortcomings in preserving semantic relations between heterogeneous modalities. (1) Most available deep supervised cross-modal hashing methods learn hash functions by employing either a pair-wise/multi-wise loss to explore the point-to-point relation or a class center loss to explore the point-to-class relation, ignoring collaborative semantic relations. (2) Compared with the large proportion of simple samples, the small proportion of hard pairs could provide more valuable information for model training. Nevertheless, most deep hashing methods treat all samples equally in the learning process and overlook the positive contribution of hard samples, impeding hash function learning. To address these challenges, by considering both point-to-point and point-to-class relations, the novel Deep Neighbor-coherence Hashing (DNcH) framework is proposed to preserve the consistency of neighbor relations and generate high-quality binary codes with intra-class compactness and inter-class separability. Specifically, by jointly exploring the point-to-point and point-to-class relations between heterogeneous data, the neighbor-aware constraint is proposed to project the heterogeneous data into a unified Hamming space, where each anchor is close to all similar samples and the corresponding class center, and far away from dissimilar samples and their class centers. The hard pairs containing valuable information are effectively mined by introducing a multi-similarity measurement strategy between heterogeneous modalities to construct informative and representative training batches. Besides, to gradually capture discriminant information from multi-modal hard pairs, a self-paced learning mechanism is introduced to assign dynamic weights to multi-modal pairs, which enables the deep cross-modal hashing to gradually concentrate on hard pairs while jointly learning universal patterns from the entire set of multi-modal pairs. Extensive experiments on three benchmark datasets show that our DNcH framework performs better than the most advanced cross-modal hashing methods. The source code for the DNcH framework is available at https://github.com/QinLab-WFU/DNcH.
Volume 279, Article 127365
Citations: 0
Exact and heuristic algorithms for team orienteering problem with fuzzy travel times
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-30, DOI: 10.1016/j.eswa.2025.127369
Xinrui Liu, Xiaojuan Jiang, Xinggang Luo, Zhongliang Zhang, Pengli Ji
Abstract: The Team Orienteering Problem (TOP) is a combinatorial optimization challenge that aims to determine a set of routes to maximize the total collected profit. In real-world scenarios, uncertainties in customer travel times frequently arise due to various factors such as weather conditions, traffic congestion, and peak hours. This study addresses the uncertainty by modeling travel time with trapezoidal fuzzy variables, and subsequently proposes a new variant of TOP, which is referred to as the Team Orienteering Problem with Fuzzy Travel Time. To solve this problem, a chance-constrained programming model is developed, and two solution approaches are proposed: a Branch-and-Price (B&P) exact algorithm and a Hybrid Adaptive Large Neighborhood Search (HALNS) heuristic algorithm. Numerical experiments are conducted to evaluate the performance of both algorithms. The results demonstrate the effectiveness of the B&P algorithm, which solves most instances to optimality within a maximum computational time of 120 min. Moreover, the HALNS algorithm proves to be highly efficient, solving all instances within a short running time while maintaining only minimal profit gaps compared to the B&P algorithm.
Volume 278, Article 127369
Citations: 0
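A chance constraint on a route with trapezoidal fuzzy travel times, as in the abstract above, can be illustrated with a short sketch. The abstract does not say which fuzzy measure the model uses; the sketch assumes the credibility measure from credibility theory, and the leg values, time limit and confidence level are hypothetical.

```python
def credibility_leq(x, a, b, c, d):
    """Credibility that a trapezoidal fuzzy variable (a, b, c, d) is <= x.

    Assumes a <= b <= c <= d. The branch order avoids division by zero
    when a == b or c == d.
    """
    if x < a:
        return 0.0
    if x < b:
        return (x - a) / (2 * (b - a))
    if x < c:
        return 0.5
    if x < d:
        return (x - 2 * c + d) / (2 * (d - c))
    return 1.0

def route_travel_time(legs):
    """Sum trapezoidal fuzzy travel times component by component."""
    return tuple(sum(leg[k] for leg in legs) for k in range(4))

def route_is_feasible(legs, time_limit, alpha):
    """Chance constraint: Cr{total travel time <= time_limit} >= alpha."""
    return credibility_leq(time_limit, *route_travel_time(legs)) >= alpha

# Hypothetical two-leg route.
legs = [(2, 3, 4, 5), (1, 2, 3, 4)]
print(route_travel_time(legs))
print(credibility_leq(8, *route_travel_time(legs)))
print(route_is_feasible(legs, time_limit=8, alpha=0.7))
```

Raising `alpha` makes the constraint stricter: the same route that passes at 0.7 fails at 0.9, which is the trade-off between profit and reliability that a chance-constrained model exposes.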
Does user interest matter? Exploring the impact of ignoring user interests in recommendations
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-29, DOI: 10.1016/j.eswa.2025.127373
Lijia Chen, Chang Sun, Yan-Li Lee, Qingsong Pu, Xinru Chen, Jia Liu, Yajun Du, Wen-Bo Xie
Abstract: User interests have long been a critical factor in recommender systems, serving as a key criterion for recommendations. Existing interest-based recommendation methods prioritize matching items to users’ interests, often overlooking the importance of the intrinsic characteristics of items. This can lead to recommendations that, while aligned with user interests, fail to address the user’s specific preferences for item intrinsic characteristics, reducing satisfaction and trust in the system. In this paper, we propose an interest disentangling recommendation algorithm (IDG). During the model-training phase, user interactions with items are disentangled into user preference for items’ intrinsic characteristics and interest groups associated with those items. During the prediction phase, the algorithm downplays the user’s preferences for items’ interest groups and focuses more on their preferences for the items’ intrinsic characteristics. Extensive experiments show that, on average, IDG outperforms the ten baselines by 15.9%, 34.2%, 25% and 32.8% in terms of HR@20, NDCG@20, PRE@20, and ILS@20, respectively, across three real datasets. Further experiments show that items recommended by the IDG algorithm are more concentrated within the same interest groups. However, IDG effectively enhances the diversity of items within the recommendation list, quantified by item similarity.
Volume 279, Article 127373
Citations: 0
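Three of the top-K metrics named in the abstract above (HR@K, NDCG@K and PRE@K) can be computed as in the sketch below. It uses the common binary-relevance definitions; the paper's exact evaluation protocol is not given in the abstract, so treat these as assumptions. The ranked list and relevant set are hypothetical.

```python
import math

def hit_ratio_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k positions occupied by relevant items."""
    return sum(item in relevant for item in ranked[:k]) / k

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)          # pos is 0-based, so rank = pos + 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical recommendation list for one user.
ranked = ["a", "b", "c", "d"]
relevant = {"b"}
print(hit_ratio_at_k(ranked, relevant, 2))
print(precision_at_k(ranked, relevant, 2))
print(ndcg_at_k(ranked, relevant, 2))
```

In practice each metric is averaged over all test users. ILS@K (intra-list similarity) additionally needs an item-item similarity function, which is omitted here.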
Hyperspectral anomaly detection via Cascaded convolutional autoencoders with adaptive pixel-level attention
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-29, DOI: 10.1016/j.eswa.2025.127366
Zhe Zhao, Jiangluqi Song, Mingtao You, Pei Xiang, Dong Zhao, Jiajia Zhang, Huixin Zhou, Dabao Wang, Xiaofang Wang
Abstract: In recent years, autoencoder (AE)-based hyperspectral anomaly detection (HAD) methods have received much attention; the residual between the original hyperspectral image (HSI) and the reconstructed image is used to identify anomalies. However, due to the generalization ability of AEs, abnormal pixels can also be reconstructed well. Conversely, the variation in background spectra can lead to poor reconstruction of complex background samples. As a result, most reconstruction error-based methods tend to generate a high false alarm rate or suffer from missed detections. To solve these problems, this paper proposes two Cascaded AEs with an adaptive Pixel-level attention module (CAPNet) for HAD. The proposed CAPNet extracts the common features of the same materials through cascaded AEs, thus alleviating the differences caused by spectral variation. Specifically, the input HSI is passed into the first AE to extract shallow features, and then the first reconstructed HSI is fed into the second AE while the abnormal information is retained by skip connections. Through this structure, CAPNet can effectively extract discriminative high-level semantic features and enhance the differences between anomalies and background. In addition, to reduce the redundancy of feature fusion and enhance the representation of background, multiple adaptive pixel-level attention modules are embedded into the decoder to guide the decoding process, so that the reconstructed data has a greater difference between anomalies and background. Finally, we use Mahalanobis distance instead of reconstruction errors for anomaly detection on the second reconstructed HSI. Experiments, including comparison and ablation studies on five datasets, demonstrate the effectiveness and competitiveness of the proposed CAPNet.
Volume 279, Article 127366
Citations: 0
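The final step of the abstract above, scoring pixels by Mahalanobis distance rather than by reconstruction error, can be sketched in pure Python. This is only an illustration of the distance itself: it estimates a single global mean and covariance from a handful of hypothetical two-band pixels, and says nothing about how CAPNet produces the reconstructed image or which statistics the authors use.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, mu):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for x in rows:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in row] for row in cov]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))

# Hypothetical background pixels with two spectral bands each.
background = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu = mean_vector(background)
cov = covariance(background, mu)
for pixel in ([1.0, 1.0], [3.0, 1.0], [6.0, 6.0]):
    print(pixel, mahalanobis_sq(pixel, mu, cov))
```

Pixels with a large score lie far from the background distribution in a covariance-aware sense, so thresholding the score map yields the anomaly mask. Real hyperspectral data has many more bands, where a linear algebra library would normally replace the hand-written solver.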
EHW-Font: A handwriting enhancement approach mimicking human writing processes
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-29, DOI: 10.1016/j.eswa.2025.127278
Lei Wang, Cunrui Wang, Yu Liu
Abstract: Balancing personalized style mimicry and legibility in handwritten font generation is particularly challenging for complex, multi-stroke characters like Chinese. Most existing approaches rely on a single modality (either pixel-based or sequence-based modeling) and employ random style reference selection during training, which often undermines both readability and stylistic consistency. In this paper, we introduce EHW-Font, a novel dual-modal framework that refines handwritten font generation by replicating the user’s writing style and process. Our approach fully exploits component-level, fine-grained style information from content and style characters. It employs a dual-modal fusion strategy to adaptively integrate the global visual features from handwritten stroke images with the dynamic process captured by stroke sequences. To mitigate style redundancy, we propose a quantization strategy that represents the style feature vector as the Cartesian product of one-dimensional variable sets, compressing redundant features while preserving essential stylistic details. Experiments show that our approach exhibits the best performance in qualitative, quantitative, and user studies. Moreover, our method is an equally effective means of data augmentation.
Volume 278, Article 127278
Citations: 0
ME-WARD: A multimodal ergonomic analysis tool for musculoskeletal risk assessment from inertial and video data in working places
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-29, DOI: 10.1016/j.eswa.2025.127212
Javier González-Alonso, Paula Martín-Tapia, David González-Ortega, Míriam Antón-Rodríguez, Francisco Javier Díaz-Pernas, Mario Martínez-Zarzuela
Abstract: This study presents ME-WARD (Multimodal Ergonomic Workplace Assessment and Risk from Data), a novel system for ergonomic assessment and musculoskeletal risk evaluation that implements the Rapid Upper Limb Assessment (RULA) method. ME-WARD is designed to process joint angle data from motion capture systems, including inertial measurement unit (IMU)-based setups, and deep learning human body pose tracking models. The tool’s flexibility enables ergonomic risk assessment using any system capable of reliably measuring joint angles, extending the applicability of RULA beyond proprietary setups. To validate its performance, the tool was tested in an industrial setting during the assembly of conveyor belts, which involved high-risk tasks such as inserting rods and pushing conveyor belt components. The experiments leveraged gold standard IMU systems alongside a state-of-the-art monocular 3D pose estimation system. The results confirmed that ME-WARD produces reliable RULA scores that closely align with IMU-derived metrics for flexion-dominated movements and comparable performance with the monocular system, despite limitations in tracking lateral and rotational motions. This work highlights the potential of integrating multiple motion capture technologies into a unified and accessible ergonomic assessment pipeline. By supporting diverse input sources, including low-cost video-based systems, the proposed multimodal approach offers a scalable, cost-effective solution for ergonomic assessments, paving the way for broader adoption in resource-constrained industrial environments.
Volume 278, Article 127212
Citations: 0
Multimodal depression recognition based on gait and rating scale
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-29, DOI: 10.1016/j.eswa.2025.127285
Xiaotong Liu, Min Ren, Xuecai Hu, Qiong Li, Yongzhen Huang
Abstract: Recently, depression recognition has garnered significant attention. Given its ease of acquisition from a distance, gait-based depression analysis emerges as a valuable tool for assisting in the diagnosis and assessment of depression. However, current research on gait-based depression recognition often uses scale results as labels but neglects the rich semantic information within the scales, which reflects the emotional, lifestyle, and physical states of participants and provides more personalized depression characteristics. To enhance the reliability and accuracy of depression analysis, we propose a text-guided depression recognition method based on gait. Firstly, we utilize silhouette-based modeling for depression recognition to capture relevant gait features. Secondly, we design the GT-CLIP module to leverage text information from scales as an auxiliary branch to guide feature learning within the gait recognition framework, enabling the model to effectively extract corresponding gait features based on this depression-related text information. Then, we devise a text-guided attention mechanism to capture variations across different body parts. On the D-Gait dataset, which includes 92 depressed subjects and 200 normal controls, our proposed text-guided depression recognition model achieves an F1-score of 59.85, outperforming existing state-of-the-art methods.
Volume 278, Article 127285
Citations: 0
I3D-AE-LSTM: Combining action representations using a 2-stream autoencoder for Action Quality Assessment
IF 7.5, Zone 1, Computer Science
Expert Systems with Applications, Pub Date: 2025-03-29, DOI: 10.1016/j.eswa.2025.127368
Tevin Moodley, Dustin van der Haar
Abstract: Systems dedicated to Action Quality Assessment (AQA) have seen a notable surge in interest from both scholars and industry experts. Such systems seek to provide an objective measure of the quality of athletes’ physical movements, offering perspectives that were once exclusive to skilled human evaluators. Our research builds upon the observation that previous studies have primarily considered spatio-temporal features and pose estimation keypoints as distinct elements within action analysis. We introduce an innovative two-stream methodology that merges these representations of an action. Our approach leverages the combined capabilities of various techniques: utilising Inflated 3D ConvNet (I3D) for the extraction of spatial-temporal characteristics from video content, employing OpenPose for detailed pose estimation keypoints that furnish an Autoencoder (AE) with intricate information to concentrate on key aspects of action, and employing Long Short Term Memory (LSTM) networks. The integration of the proposed two-stream methodology allows for a more comprehensive analysis of athletic movements. By capturing both spatial and temporal dynamics together, we can better understand the nuances of an action. Traditional methods often treat these aspects separately, leading to a fragmented understanding. In addition, incorporating detailed pose estimation through OpenPose allows the Autoencoder to focus on key aspects of the action. This targeted focus ensures that the evaluation is not only about how movements look in a general sense but also how they align with optimal performance criteria. By combining the representations, we enhance the model’s ability to recognise and evaluate complex movements more accurately. Furthermore, we propose a new multi-variate scoring system designed to assess action quality based on the scores from individual judges. Multi-variate scoring introduces a richer dataset for assessing action quality, enabling more sophisticated analyses and decision-making. Multi-variate scoring can better highlight discrepancies or consensus among judges, leading to more reliable assessments. The method has an average Spearman Rank Correlation of 97.35%, which outperforms current state-of-the-art methods and underscores the effectiveness of merging spatio-temporal and pose estimation keypoints into a unified action representation. Finally, training the model to predict the scores assigned by each judge reveals additional advantages. In sports involving multiple scoring criteria, the proposed approach enables the extraction of more detailed insights, marking a significant advancement that allows for more precise and detailed predictions of scores.
Volume 278, Article 127368
Citations: 0
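The Spearman rank correlation used as the headline metric in the abstract above measures how well predicted scores preserve the ordering of the judges' scores. A minimal pure-Python sketch follows; it ranks both lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The score lists are hypothetical and are not taken from the paper.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1             # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical judge scores and model predictions for four performances.
judge_scores = [62.5, 71.0, 88.0, 93.5]
predicted = [60.0, 75.0, 95.0, 90.0]
print(spearman(judge_scores, predicted))
```

Because only ranks matter, the metric rewards a model that orders performances correctly even if its absolute scores are offset or rescaled, which is why it is often reported alongside error-based measures.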