{"title":"Robust and explainable multi-objective dynamic scheduling of an ANFIS-driven PSO using IT2FS","authors":"Yu-Cheng Wang","doi":"10.1016/j.eswa.2026.131553","DOIUrl":"10.1016/j.eswa.2026.131553","url":null,"abstract":"<div><div>Manufacturing scheduling increasingly operates under dynamic conditions where processing times and machine availability are uncertain and subject to frequent disruptions. While particle swarm optimization (PSO) performs strongly in flexible job shop scheduling, its solutions often rely on deterministic assumptions and provide limited transparency, which restricts their reliability and adoption in real shop-floor environments. This paper presents a fuzzy-neural explainable PSO (FNE-PSO) framework for the multi-objective dynamic flexible job shop scheduling problem (DFJSP). The framework integrates interval type-2 fuzzy sets (IT2FS) to model dual uncertainty in processing times and machine availability, an adaptive neuro-fuzzy inference system (ANFIS) to regulate PSO parameters based on swarm-state indicators, and a hybrid SHAP decision-tree explanation module to interpret particle movement and search behavior. It also introduces dynamic contribution radar (DCR) visualization to support understanding of multi-objective trade-offs and solution evolution. Experiments on synthetic dynamic scenarios and adapted benchmark instances demonstrate that the proposed approach achieves more robust scheduling performance than representative baselines, particularly in terms of makespan and tardiness stability under uncertainty. 
Beyond optimization results, explainability evaluations indicate that the proposed framework enhances transparency and interpretability of scheduling decisions, supporting more trustworthy and accountable industrial decision-making.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131553"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rewarding fine-grained image captioning with keyword group contrastive","authors":"Kailiang Ye , Zheng Lu , Linlin Shen , Tianxiang Cui","doi":"10.1016/j.eswa.2026.131405","DOIUrl":"10.1016/j.eswa.2026.131405","url":null,"abstract":"<div><div>Fine-grained image captioning aims to automatically generate a description with detailed information from a given image. The task poses significant challenges, as it requires image captioning models to accurately capture fine-grained details, effectively differentiate between visually similar yet distinct elements within an image, and generate detailed captions that comprehensively describe the image content. In this paper, we propose a novel framework for fine-grained image captioning that combines reinforcement learning and contrastive learning with specifically designed loss and rewards. Specifically, three image captioning objectives are devised: 1) a novel Keyword Group Contrastive loss for token representation learning by leveraging different groups of keywords matched by visual information; 2) a CLIP Contrastive reward encouraging the generated caption to be more similar to its input image and dissimilar to the other images; 3) a Fine-grained Grammar reward using the grammar ELECTRA discriminator for high-quality caption generation with good grammar. 
We evaluate the performance of our framework on the FineCapEval benchmark dataset and show that it significantly outperforms the existing state-of-the-art methods in terms of describing fine-grained information from its input images.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131405"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The importance of morphology-aware subword tokenization for NLP tasks in Slovak language modeling","authors":"Dávid Držík , Jozef Kapusta","doi":"10.1016/j.eswa.2026.131492","DOIUrl":"10.1016/j.eswa.2026.131492","url":null,"abstract":"<div><div>To effectively train large language models (LLMs) for morphologically rich and low-resource languages such as Slovak, high-quality tokenization is essential. Traditional approaches like Byte-Pair Encoding (BPE) overlook linguistic structure, often fragmenting root morphemes and causing semantic loss. This study examines whether morphology-aware tokenization can improve model performance across various NLP tasks. We introduce the SlovaK Morphological Tokenizer (SKMT), which incorporates root morpheme information into the tokenization process, and compare it against a standard BPE tokenizer. Both tokenizers were used to preprocess a Slovak corpus for pretraining two RoBERTa-based models (SK_Morph_BLM and SK_BPE_BLM), which were then fine-tuned on token classification, sequence classification, question answering, and semantic textual similarity tasks. Experimental results show that SK_Morph_BLM achieved slightly higher performance overall, with statistically significant gains in semantic similarity (up to +12.49%) and question answering (up to +3.23%). Complementary quantitative and qualitative analyses further revealed that morphology-aware tokenization is most effective for shorter, morphologically regular texts and improves grammatical and semantic consistency. 
These findings demonstrate that incorporating morphological information into tokenization can enhance model robustness and semantic understanding for morphologically rich languages.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131492"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CATS-RAG: Contextual Augmented Triplet Synthesis for RAG in Technical QA","authors":"Changwook Chu , Yongtae Jeong , Hansam Cho , Jaehoon Kim , Jungmin Lee , Byungwoo Bang , Junyeon Lee , Uiseok Song , Seoung Bum Kim","doi":"10.1016/j.eswa.2026.131491","DOIUrl":"10.1016/j.eswa.2026.131491","url":null,"abstract":"<div><div>Large language models (LLMs) are widely used for building question-answering (QA) systems, with retrieval-augmented generation (RAG) commonly applied to improve the accuracy of the answers. However, in specific domains, achieving high performance often requires fine-tuning components within the RAG pipeline, such as the retriever and generator, because prompt or index engineering alone may not sufficiently capture domain-specific knowledge. Moreover, obtaining document-question-answer triplets for such tuning is particularly challenging in technical domains. This paper presents contextual augmented triplet synthesis for RAG in technical QA (CATS-RAG), a framework designed to expand domain-relevant data and improve RAG accuracy in technical domains with limited available data. CATS-RAG includes two core components: QA dataset generation and fine-tuning of the retriever and generator components within the RAG pipeline. For data generation, a chain-of-thought prompting approach enables LLMs to generate triplets solely from the provided documents. In the fine-tuning phase, using highly similar documents from the retrieved sets as hard distractors, together with probabilistic omission of golden documents, improves answer robustness even when irrelevant documents are retrieved. Experiments on TechQA and Microsoft QA show that CATS-RAG consistently improves both retrieval and generation. On average, CATS-RAG increases retriever performance by 4.3% on TechQA and 6.5% on Microsoft QA, and generator performance by 1.7% and 11.9%, respectively. 
These results demonstrate that CATS-RAG provides reliable performance gains in specialized QA settings with limited supervision.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131491"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PDDNet: An end-to-end object detection framework for real-world plant leaf disease diagnosis","authors":"Fenglei Yang , Weiyi Ma , Qiaochuan Chen , Yan Sun , Yuexing Han","doi":"10.1016/j.eswa.2026.131294","DOIUrl":"10.1016/j.eswa.2026.131294","url":null,"abstract":"<div><div>Accurate detection of plant leaf diseases in complex agricultural fields remains a critical challenge, primarily stemming from cluttered natural backgrounds, multi-scale lesion variations (ranging from tiny spots to large patches), and subtle visual distinctions among disease classes. To address these issues, we present PDDNet, an end-to-end plant disease detection framework that integrates fine-grained lesion features with global contextual information via a cascade encoder-decoder architecture. In the encoder, an Enhanced Attention-based Multi-scale Aggregation (EAMA) module is developed to capture multi-scale lesion features through dual-branch spatial-channel attention fusion, enabling cross-layer interaction and contextual enhancement. The decoder incorporates a Prior-Guided Self-Attention (PGSA) mechanism, which merges positional encodings with IoU-based geometric priors to dynamically weight attention, prioritizing lesion boundaries and morphological structures. To resolve the inherent conflict between classification and localization tasks, a Multi-task Feature Decoupling Module (MFDM) is proposed to generate task-specific dynamic masks, explicitly segregating semantic features (for classification) and spatial features (for regression). Experimental results validate the superiority of PDDNet: it achieves 43.6% AP on the PlantDoc dataset (outperforming AlignDETR by 0.3%) and 81.6% AP on the Tomato Leaf Disease dataset (outperforming the state-of-the-art by 0.2%). 
With its high accuracy and cross-scenario robustness, PDDNet offers a practical solution for precision agriculture, facilitating automated field-level disease diagnosis and supporting data-driven crop protection strategies.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131294"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D-MolGL: A multimodal framework for integrating 3D molecular graphs into language models","authors":"Huizhi Li , Dagang Li , Jinglin Zhang , Yuhui Zheng , Cong Bai","doi":"10.1016/j.eswa.2026.131437","DOIUrl":"10.1016/j.eswa.2026.131437","url":null,"abstract":"<div><div>Large Language Models (LLMs) have exhibited remarkable capabilities in natural language generation and have been extensively applied to diverse tasks such as text generation and medical literature analysis, demonstrating robust proficiency in structured data processing and knowledge extraction. However, these models generally overlook the crucial three-dimensional (3D) molecular conformations, which are vital for understanding key chemical properties. This oversight significantly limits the potential of LLMs in the biomolecular field, particularly in complex tasks like drug structure discovery. To address this, we propose a Multimodal Framework for Integrating 3D Molecular Graphs into Language Models (<strong>3D-MolGL</strong>). It employs a Physics-Informed Equivariant Graph Neural Network (PI-EGNN) incorporating physically meaningful edge-level priors and physics-based regularization, aligning learned representations with empirical data and physical laws. Our approach incorporates an Iterative Cross-Modal Fusion module to reinforce structural and linguistic information, enabling the model to capture complex dependencies and improve the alignment between molecular data and natural language. Moreover, the Region-Phrase Semantic Grounding module enables fine-grained alignment between molecular substructures and linguistic tokens, thereby reinforcing the connection between molecular semantics and their textual representation. Additionally, the Best-of-N sampling strategy enhances output reliability. 
Notably, 3D-MolGL achieves competitive or state-of-the-art performance in molecule captioning and 3D-aware question answering tasks, while utilizing approximately 75% fewer parameters than existing large-scale multimodal architectures. This demonstrates that robust molecular reasoning capabilities can be achieved with more compact models, providing a promising new perspective for interpretable AI in chemistry.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131437"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MGPNet: A multi-modal geo-physical production network for reservoir yield forecasting","authors":"Qianlin Qiao , Ying Qiao , Xin Sun , Qiaomu Wen , Jian Lei","doi":"10.1016/j.eswa.2026.131407","DOIUrl":"10.1016/j.eswa.2026.131407","url":null,"abstract":"<div><div>Existing reservoir production prediction methods often depend on a single data source or simplified physical assumptions, limiting their ability to fully capture complex reservoir heterogeneity and multi-physical coupling processes. To address these challenges, this paper proposes MGPNet, a multi-modal reservoir production prediction network that integrates seismic images, logging curves, and historical production sequences. The network incorporates the GeoFE-AE module, which uses adversarial feature enhancement and semantic alignment mechanisms to achieve deep feature extraction and collaborative representation across modalities. Furthermore, it introduces a Multi-scale Cross-Attention Mechanism (MCAM) to enable effective feature fusion across different modalities at multiple semantic granularities, and employs a bidirectional Transformer decoding and prediction module (BiAD) to accurately forecast future production sequences. Using a high-fidelity synthetic multi-modal dataset, experimental results demonstrate that MGPNet significantly outperforms existing mainstream methods across several key metrics: Mean Absolute Error (MAE) of 2.12, Root Mean Square Error (RMSE) of 5.87, Coefficient of Determination (R<sup>2</sup>) of 0.987, and Explained Variance Score (EVS) of 0.971. These results validate the model’s comprehensive strengths in accuracy, stability, and robustness to noise. In addition, transfer-learning evaluations on real wells from the Volve oilfield confirm the model’s practical applicability and strong cross-well generalization capability. 
This research offers a promising technical approach for deep fusion modeling of multi-source reservoir data, with substantial potential for practical engineering applications and further academic exploration.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131407"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DNPR: Zero-shot industrial anomaly detection via dynamic normal prototype refinement","authors":"Shuyun Li , Zhi Li , Weidong Wang , Long Zheng , Yu Lu","doi":"10.1016/j.eswa.2026.131331","DOIUrl":"10.1016/j.eswa.2026.131331","url":null,"abstract":"<div><div>Zero-Shot Industrial Anomaly Detection (ZS-IAD) aims to identify anomalies without access to target-domain training samples, making it a critical task in real-world manufacturing. Existing methods often rely on pre-trained Vision-Language Models (VLMs) guided by extra prompts, which limits fine-grained precision, or depend on offline statistics requiring full test-set access, restricting practical deployment. To overcome these limitations, we propose DNPR, a transductive zero-shot framework based on Dynamic Normal Prototype Refinement. First, to mitigate feature shifts caused by in-plane geometric variations in certain categories, we introduce a Progressive Masked Geometric Registration (PMGR) module for image alignment. Second, we propose a Neighborhood Mutual Enhancement (NME) module, which performs neighborhood-based mutual scoring in pre-trained patch feature spaces, coupled with a dynamic dual-memory mechanism that incrementally consolidates normal patterns while suppressing anomalous disturbances. Third, we propose a Texture-aware Prototype Calibration (TPC) module to refine anomaly scores by using adaptive weights derived from texture prototypes, thereby suppressing false positives in texture-rich regions. Experiments on four real-world industrial datasets demonstrate that DNPR significantly outperforms existing state-of-the-art methods under zero-shot settings. On MVTec AD, DNPR attains 96.5% image-AUROC and 96.3% pixel-AUROC, surpassing previous methods by 4.6% and 3.8%, respectively, without full test-set access. Moreover, DNPR remains competitive in few-shot settings and serves as a plug-and-play enhancement for VLM-based approaches. 
Code is available at: https://github.com/shuyunli23/DNPR-main.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131331"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-performance clustering and routing for industrial wireless sensor networks: An innovative heuristic optimization approach","authors":"Jing Xiao , Mengfei Wang , Tao Luo , Zhigang Li","doi":"10.1016/j.eswa.2026.131408","DOIUrl":"10.1016/j.eswa.2026.131408","url":null,"abstract":"<div><div>Industrial wireless sensor networks (IWSNs) play a critical role in enabling reliable data acquisition in industrial environments, where the routing scheme directly affects network lifetime and communication reliability. This paper presents a novel clustering routing model that integrates node residual energy, clustering distance, and link quality to better reflect practical network conditions. Based on this model, a high-performance cluster routing protocol leveraging a quantum chaotic immune clone algorithm (HCR-QCICA) is proposed. The protocol employs a new chaotic initialization strategy to avoid local optima and a novel quantum optimization strategy to enhance global search capability. Additionally, a type-differentiated inter-cluster multi-hop routing method is developed to address diverse data transmission requirements. Experimental results in simulated industrial scenarios show that HCR-QCICA reduces delay and packet loss rate by at least 4.08% and 9.16%, respectively, while increasing network lifetime and throughput by more than 14.07% and 24.08%, respectively, compared with the LEACH, TEEN, CHEABC-QCRP, MOGWO, and GAPSO-H protocols. 
These findings demonstrate the effectiveness of the proposed approach in improving IWSN performance.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131408"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying expert speech: A comprehensive analysis of instructional discourses","authors":"Mariano Albaladejo-González , Manuel J. Gomez , Óscar Cánovas , Félix Gómez Mármol , José A. Ruipérez-Valiente","doi":"10.1016/j.eswa.2026.131515","DOIUrl":"10.1016/j.eswa.2026.131515","url":null,"abstract":"<div><div>Oral communication is a crucial skill in modern society. Developing it, however, requires sustained practice and constructive feedback. Consequently, several studies have explored the development of oral communication trainers powered by Artificial Intelligence (AI). However, what characterizes expert speech remains unclear, especially given the need to adapt speech to contextual factors. In instructional environments, the speaker’s communication proficiency is a key determinant of audience learning outcomes. For this reason, we analyzed 1250 speeches from five types of instructional discourses: in-person college classes (<em>Lectures</em>), online learning lessons (<em>Online Courses</em>), instructional animations (<em>Animated Lessons</em>), supplementary materials for school and high school (<em>Supplementary Lessons</em>), and public presentations (<em>Public Talks</em>). We extracted 16 speech metrics, including six additional multiple-participant metrics for <em>Lectures</em>. We obtained 250 videos of each discourse type, each at least five minutes long. Our analysis revealed expert values for each speech metric and showed how speech metrics vary across discourse types. We also developed an AI speech classifier that achieved an F1 score of 0.78. The model struggled to identify <em>Online Courses</em>, which is consistent with the Uniform Manifold Approximation and Projection (UMAP) analysis showing that <em>Online Courses</em> overlap closely with the speech of other instructional discourses. 
Furthermore, we identified distinct speech profiles in <em>Lectures, Public Talks</em>, and <em>Online Courses</em>, highlighting variations in speaking styles. This research provides valuable insights into expert speech in instructional discourses by offering reference values that can help speakers refine their delivery and support researchers in developing more effective speech training systems.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131515"},"PeriodicalIF":7.5,"publicationDate":"2026-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}