{"title":"Adaptive multimodal prompt for human-object interaction with local feature enhanced transformer","authors":"Kejun Xue, Yongbin Gao, Zhijun Fang, Xiaoyan Jiang, Wenjun Yu, Mingxuan Chen, Chenmou Wu","doi":"10.1007/s10489-024-05774-7","DOIUrl":"10.1007/s10489-024-05774-7","url":null,"abstract":"<div><p>Human-object interaction (HOI) detection is an important computer vision task for recognizing the interaction between humans and surrounding objects in an image or video. HOI datasets suffer from a serious long-tailed data distribution because it is challenging to build a dataset that contains all potential interactions. Many HOI detectors have addressed this issue by utilizing visual-language models. However, due to the calculation mechanism of the Transformer, visual-language models are not good at extracting the local features of input samples. Therefore, we propose a novel local feature enhanced Transformer that encourages the encoders to extract more informative multi-modal features. Moreover, the application of prompt learning in HOI detection is still in its preliminary stages. Consequently, we propose a multi-modal adaptive prompt module, which uses an adaptive learning strategy to facilitate the interaction of language and visual prompts. On the HICO-DET and SWIG-HOI datasets, the proposed model achieves 24.21% mAP and 14.29% mAP, respectively, on the full set of interactions. 
Our code is available at https://github.com/small-code-cat/AMP-HOI.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12492 - 12504"},"PeriodicalIF":3.4,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved KD-tree based imbalanced big data classification and oversampling for MapReduce platforms","authors":"William C. Sleeman IV, Martha Roseberry, Preetam Ghosh, Alberto Cano, Bartosz Krawczyk","doi":"10.1007/s10489-024-05763-w","DOIUrl":"10.1007/s10489-024-05763-w","url":null,"abstract":"<div><p>In the era of big data, it is necessary to provide novel and efficient platforms for training machine learning models over large volumes of data. The MapReduce approach and its Apache Spark implementation are among the most popular methods that provide high-performance computing for classification algorithms. However, they require dedicated implementations that will take advantage of such architectures. Additionally, many real-world big data problems are plagued by class imbalance, posing challenges to the classifier training step. Existing solutions for alleviating skewed distributions do not work well in the MapReduce environment. In this paper, we propose a novel KD-tree based classifier, together with a variation of the SMOTE algorithm dedicated to the Spark platform. Our algorithms offer excellent predictive power and can work simultaneously with binary and multi-class imbalanced data. Exhaustive experiments conducted using the Amazon Web Service platform showcase the high efficiency and flexibility of our proposed algorithms.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12558 - 12575"},"PeriodicalIF":3.4,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-objective optimization enabling CFRP energy-efficient milling based on deep reinforcement learning","authors":"Meihang Zhang, Hua Zhang, Wei Yan, Lin Zhang, Zhigang Jiang","doi":"10.1007/s10489-024-05800-8","DOIUrl":"10.1007/s10489-024-05800-8","url":null,"abstract":"<div><p>The expanding application of Carbon Fiber Reinforced Polymer (CFRP) in industry is drawing increasing attention to improving energy efficiency and reducing cost during secondary processing, particularly milling. Machining parameter optimization is a practical and economical way to achieve this goal. However, the unclear milling mechanism and dynamic machining conditions of CFRP make this challenging. To fill this gap, this paper proposes a DRL-based approach that integrates physics-guided Transformer networks with Twin Delayed Deep Deterministic Policy Gradient (PGTTD3) to optimize CFRP milling parameters for multiple objectives. Firstly, a PG-Transformer-based CFRP milling energy consumption model is proposed, which modifies the existing De-stationary Attention module by integrating external physical variables to enhance modeling accuracy and efficiency. Secondly, a multi-objective optimization model considering energy consumption, milling time and machining cost for CFRP milling is formulated and mapped to a Markov Decision Process, and a reward function is designed. Thirdly, a PGTTD3 approach is proposed for dynamic parameter decision-making, incorporating a time difference strategy to enhance agent training stability and online adjustment reliability. The experimental results show that the proposed method reduces energy consumption, milling time and machining cost in CFRP milling by 10.98%, 3.012%, and 14.56%, respectively, compared to the actual averages. 
The proposed algorithm exhibits excellent performance metrics when compared to state-of-the-art optimization algorithms, with an average improvement in optimization efficiency of over 20% and a maximum enhancement of 88.66%.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12531 - 12557"},"PeriodicalIF":3.4,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GaitLRDF: gait recognition via local relevant feature representation and discriminative feature learning","authors":"Xiaoying Pan, Hewei Xie, Nijuan Zhang, Shoukun Li","doi":"10.1007/s10489-024-05837-9","DOIUrl":"10.1007/s10489-024-05837-9","url":null,"abstract":"<div><p>As an emerging biometric recognition technology, gait recognition has the advantages of being contactless, working at long distance, and being difficult to imitate. Existing methods perform gait recognition using features extracted from the overall appearance or local regions of humans. However, the detailed features extracted by current methods based on local regions of the human body lose the overall relevance of the image and the edge information of those local regions. In addition, methods based on local regions do not focus on the body parts that are less affected by clothing occlusion. To solve the above problems, this paper proposes a new gait recognition network framework, GaitLRDF, which improves the accuracy and robustness of gait recognition through Local Relation Convolutional layers (LRConv) and a Human Body Focusing module (HBF). LRConv can simultaneously use the global and local information of the human body, and the local detail features extracted in the module can retain the edge information of the human body. HBF can focus on the gait parts that are less affected by clothing occlusion, and obtain more discriminative gait detail features. The experimental results show that under the three CASIA-B conditions NM, BG and CL, GaitLRDF is 0.40%, 0.10% and 1.10% higher, respectively, than the current most advanced method. 
The recognition accuracy on OU-MVLP dataset reaches 91.40%.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12476 - 12491"},"PeriodicalIF":3.4,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tri-channel visualised malicious code classification based on improved ResNet","authors":"Sicong Li, Jian Wang, Yafei Song, Shuo Wang","doi":"10.1007/s10489-024-05707-4","DOIUrl":"10.1007/s10489-024-05707-4","url":null,"abstract":"<div><p>As malicious code attacks continue to evolve, attackers leverage techniques like packing and code obfuscation to generate numerous variants, challenging traditional detection methods. Addressing the limitations of current deep learning-based malicious code classification approaches in feature extraction and accuracy, this paper introduces an innovative RGB visualization detection method based on a hybrid multi-head attention mechanism. Initially, a feature representation method utilizing RGB images is introduced. This approach focuses on semantic relationships between a malware’s binary information, assembly details, and API data, generating images with richer textural information. This technique effectively uncovers the deep dependencies between the original and variant versions of malicious code, providing stronger support for subsequent classification tasks. Furthermore, to tackle the issues of malware encryption and obfuscation, a deep neural network framework is adopted, incorporating a modular design philosophy and integrating a multi-head attention mechanism. This design not only enhances the expressiveness of critical features but also helps the model better focus on key aspects of the malicious code, thereby improving classification accuracy. Through comparative experiments and in-depth analysis, the effectiveness and superiority of the proposed RGB visualization method and MSA-ResNet model in the field of malicious code variant classification are validated. The accuracy rates achieved on the Kaggle and DataCon datasets are 99.49% and 97.70%, respectively, representing significant improvements over other methods. 
This approach demonstrates strong generalization capabilities and resistance to obfuscation, offering a new and effective tool for malicious code detection.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12453 - 12475"},"PeriodicalIF":3.4,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-024-05707-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly supervised point cloud semantic segmentation based on scene consistency","authors":"Yingchun Niu, Jianqin Yin, Chao Qi, Liang Geng","doi":"10.1007/s10489-024-05822-2","DOIUrl":"10.1007/s10489-024-05822-2","url":null,"abstract":"<div><p>Weakly supervised point cloud segmentation has garnered considerable interest recently, primarily due to its ability to diminish labor-intensive manual labeling costs. The effectiveness of such methods hinges on their ability to implicitly augment the supervision signals available for training. However, we found that most approaches tend to be implemented through complex modeling, which is not conducive to deployment and implementation in resource-poor scenarios. Our study introduces a novel scene consistency modeling approach that significantly enhances weakly supervised point cloud segmentation in this context. By synergistically modeling both complete and incomplete scenes, our method can improve the quality of the supervision signal, save resources, and ease deployment in practical applications. To achieve this, we first generate the corresponding incomplete scene for the whole scene using windowing techniques. Next, we input the complete and incomplete scenes into a network encoder and obtain prediction results for each scene through two decoders. We enforce semantic consistency between the labeled and unlabeled data in the two scenes by employing cross-entropy and KL loss. This consistent modeling method enables the network to focus more on the same areas in both scenes, capturing local details and effectively increasing the supervision signals. One of the advantages of the proposed method is its simplicity and cost-effectiveness: because we rely solely on variance and KL loss to model scene consistency, the computations are straightforward. 
Our experimental evaluations on S3DIS, ScanNet, and Semantic3D datasets provide further evidence that our method can effectively leverage sparsely labeled data and abundant unlabeled data to enhance supervision signals and improve the overall model performance.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12439 - 12452"},"PeriodicalIF":3.4,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HDML: hybrid data-driven multi-task learning for China’s stock price forecast","authors":"Weiqiang Xu, Yang Liu, Wenjie Liu, Huakang Li, Guozi Sun","doi":"10.1007/s10489-024-05838-8","DOIUrl":"10.1007/s10489-024-05838-8","url":null,"abstract":"<div><p>Recent years have witnessed the rapid development of China’s stock market, but investment risks have also emerged. Stock prices are unstable and non-linear, affected not only by historical transaction data but also by national policies, news, and other data. Stock price and textual data are beginning to be employed in the prediction process. However, the challenge lies in effectively integrating feature information derived from stock price and textual information. To address this problem, this paper proposes a <b>H</b>ybrid <b>D</b>ata-driven <b>M</b>ulti-task <b>L</b>earning (<b>HDML</b>) framework to predict stock price. HDML adopts hybrid data as model input, mining the transaction and capital flow data information in the stock market and considering the impact of investors’ emotions on the stock market. In addition, we incorporate multi-task learning, which predicts the closing price range of a stock based on structured data and then corrects the prediction results through investors’ comment text data. HDML effectively captures the relationship between different modal data through multi-task learning and achieves improvements on both tasks. The experimental results show that, compared with previous work, HDML reduces the RMSE on the evaluation set by 12.14% and improves the F1 score by an average of 13.64%. 
Moreover, value at risk (VaR), together with the HDML model, can help investors weigh the potential gains against the associated risks.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12420 - 12438"},"PeriodicalIF":3.4,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integration of AHP and fuzzy inference systems for empowering transformative journeys in organizations: Assessing the implementation of Industry 4.0 in SMEs","authors":"Isabel Fernández, Javier Puente, Borja Ponte, Alberto Gómez","doi":"10.1007/s10489-024-05816-0","DOIUrl":"10.1007/s10489-024-05816-0","url":null,"abstract":"<div><p>The combined use of the Analytical Hierarchy Process (AHP) and Fuzzy Inference Systems (FISs) can significantly enhance the effectiveness of transformative projects in organizations by better managing their complexities and uncertainties. This work develops a novel multicriteria model that integrates both methodologies to assist organizations in these projects. To demonstrate the value of the proposed approach, we present an illustrative example focused on the implementation of Industry 4.0 in SMEs. First, through a review of relevant literature, we identify the key barriers to improving SMEs' capability to implement Industry 4.0 effectively. Subsequently, the AHP, enhanced through Dong and Saaty’s methodology, establishes a consensus-based assessment of the importance of these barriers, using the judgments of five experts. Next, a FIS is utilized, with rule bases automatically derived from the preceding weights, eliminating the need for another round of expert input. 
This paper shows and discusses how SMEs can use this model to self-assess their adaptability to the Industry 4.0 landscape and formulate improvement strategies to achieve deeper alignment with this transformative paradigm.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12357 - 12377"},"PeriodicalIF":3.4,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-024-05816-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Affinity adaptive sparse subspace clustering via constrained Laplacian rank","authors":"Ting Yang, Shuisheng Zhou, Zhuan Zhang","doi":"10.1007/s10489-024-05812-4","DOIUrl":"10.1007/s10489-024-05812-4","url":null,"abstract":"<div><p>Subspace clustering typically clusters data by applying spectral clustering to an affinity matrix constructed in some deterministic way from a self-representation coefficient matrix. Therefore, the quality of the affinity matrix is vital to its performance. However, traditional deterministic constructions provide only a feasible affinity matrix, not necessarily the one best suited to revealing the data structure. Besides, the post-processing commonly applied to the coefficient matrix also affects the affinity matrix’s quality. Furthermore, constructing the affinity matrix is separate from optimizing the coefficient matrix and performing spectral clustering, which cannot guarantee an optimal overall result. To this end, we propose a new method, affinity adaptive sparse subspace clustering (AASSC), which adds a Laplacian rank constraint to a subspace sparse-representation model to adaptively learn a high-quality affinity matrix having accurate <i>p</i>-connected components from a sparse coefficient matrix without post-processing, where <i>p</i> is the number of categories. In addition, by relaxing the Laplacian rank constraint into a trace minimization, AASSC naturally combines the operations of the coefficient matrix, affinity matrix, and spectral clustering into a unified optimization, guaranteeing the overall optimal result. 
Extensive experimental results verify the proposed method to be effective and superior.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12378 - 12390"},"PeriodicalIF":3.4,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolving routing policies for electric vehicles by means of genetic programming","authors":"Francisco J. Gil-Gala, Marko Đurasević, Domagoj Jakobović","doi":"10.1007/s10489-024-05803-5","DOIUrl":"10.1007/s10489-024-05803-5","url":null,"abstract":"<div><p>In recent years, the growing interest in environmental sustainability has led to Electric Vehicle Routing Problems (EVRPs) attracting more and more attention. EVRPs involve the use of electric vehicles, which have additional constraints, such as range and recharging time, compared to conventional Vehicle Routing Problems (VRPs). The complexity and dynamic nature of solving VRPs often lead to the introduction of Routing Policies (RPs), simple heuristics that incrementally build routes. However, manually designing efficient RPs proves to be a challenging and time-consuming task. Therefore, there is a pressing need to explore the application of hyper-heuristics, in particular Genetic Programming (GP), to automatically generate new RPs. Since this method has not yet been investigated in the literature in the context of EVRPs, this study explores the applicability of GP to automatically generate new RPs for EVRP. To this end, three RP variants (serial, semiparallel, and parallel) are introduced in this study, along with a set of domain-specific terminal nodes to optimise three criteria: the number of vehicles, energy consumption, and total tardiness. The experimental analysis shows that the serial variant performs best in terms of energy consumption and number of vehicles, while the parallel variant is most effective in minimising the total tardiness. A comprehensive analysis of the proposed method is conducted to determine its convergence properties and the impact of the proposed terminal nodes on performance and to describe several generated RPs. 
The results show that the automatically generated RPs perform commendably compared to traditional methods such as metaheuristics and exact methods, which usually require significantly more runtime. More specifically, depending on the scenario in which they are used, the generated RPs achieve results that are about 20%-37% worse than the best known results for the number of vehicles, but do so in almost negligible time, just a few milliseconds.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 23","pages":"12391 - 12419"},"PeriodicalIF":3.4,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-024-05803-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}