{"title":"An MDA approach for robotic-based real-time business intelligence applications","authors":"Houssam Bazza , Sandro Bimonte , Zakaria Gourti , Stefano Rizzi , Hassan Badir","doi":"10.1016/j.datak.2025.102418","DOIUrl":"10.1016/j.datak.2025.102418","url":null,"abstract":"<div><div>Industry 4.0, the fourth industrial revolution, has emerged from the convergence of robotics, automation, and the Internet of Things (IoT), transforming industrial processes with intelligent systems and digital integration. This revolution also brings with it Business Intelligence (BI) systems that enable the analysis of IoT and robotic data. The data architectures employed for BI in Industry 4.0 contexts are often intricate, typically comprising robots software, DBMSs, message brokers, and data stream management systems. Consequently, designing BI data-centric applications for Industry 4.0 presents a significant challenge. Inspired by the absence of modeling approaches for this type of application and by the well-established advantages of Model-Driven Architecture (MDA), this paper introduces a novel UML profile for real-time robotic data-driven BI applications. Our profile enables the representation of robotic and transactional data within a unified and consistent framework, enabling continuous queries over these streams. Additionally, we propose an automated method to implement UML class diagrams onto a technological stack featuring ROS, Apache Kafka, PostgreSQL, and Apache Flink. An experimental evaluation in the agricultural application domain confirms the merits of our approach.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102418"},"PeriodicalIF":2.7,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143471503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aspect-level recommendation fused with review and rating representations","authors":"Heng-Ru Zhang , Ling Lin , Fan Min","doi":"10.1016/j.datak.2025.102417","DOIUrl":"10.1016/j.datak.2025.102417","url":null,"abstract":"<div><div>Review contains user opinions about different aspects of an item, which is essential data for aspect-level recommendation. Most existing aspect-level recommendation algorithms are concerned with the degree to which user and item aspects match. However, even if an item is extremely popular due to its high quality, it may only partially match the aspects of a user. A tolerant user may like the item, whereas a strict user may dislike it. This implies that these works disregard the personalized behavior patterns of the user. In this paper, we propose a new <strong>A</strong>spect-level <strong>R</strong>ecommendation model fused with <strong>R</strong>eview and <strong>R</strong>ating, namely <strong>ARRR</strong>, to address the recommendation bias. First, we introduce rating to explore user behavior patterns and item quality. Then, we present a personalized attention mechanism that generates a set of aspect-level user or item representations from reviews. Finally, we obtain comprehensive user or item representations by combining rating- and review-based representations. In the experiments, the proposed model is compared with seven state-of-the-art recommendation algorithms on seven datasets. The results show that our model outperforms on seven metrics. The source code of ARRR is available at <span><span>https://github.com/alinn00/ARRR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102417"},"PeriodicalIF":2.7,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143445354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How well can a large language model explain business processes as perceived by users?","authors":"Dirk Fahland , Fabiana Fournier , Lior Limonad , Inna Skarbovsky , Ava J.E. Swevels","doi":"10.1016/j.datak.2025.102416","DOIUrl":"10.1016/j.datak.2025.102416","url":null,"abstract":"<div><div>Large Language Models (LLMs) are trained on a vast amount of text to interpret and generate human-like textual content. They are becoming a vital vehicle in realizing the vision of the autonomous enterprise, with organizations today actively adopting LLMs to automate many aspects of their operations. LLMs are likely to play a prominent role in future AI-augmented business process management systems (ABPMSs) catering functionalities across all system lifecycle stages. One such system’s functionality is Situation-Aware eXplainability (SAX), which relates to generating causally sound and yet human-interpretable explanations that take into account the process context in which the explained condition occurred.</div><div>In this paper, we present the SAX4BPM framework developed to generate SAX explanations. The SAX4BPM suite consists of a set of services and a central knowledge repository. The functionality of these services is to elicit the various knowledge ingredients that underlie SAX explanations. A key innovative component among these ingredients is the causal process execution view. In this work, we integrate the framework with an LLM to leverage its power to synthesize the various input ingredients for the sake of improved SAX explanations.</div><div>Since the use of LLMs for SAX is also accompanied by a certain degree of doubt related to its capacity to adequately fulfill SAX along with its tendency for hallucination and lack of inherent capacity to reason, we pursued a methodological evaluation of the perceived quality of the generated explanations. To this aim, we developed a designated scale and conducted a rigorous user study. Our findings show that the input presented to the LLMs aided with the guard-railing of its performance, yielding SAX explanations having better-perceived fidelity. This improvement is moderated by the perception of trust and curiosity. More so, this improvement comes at the cost of the perceived interpretability of the explanation.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102416"},"PeriodicalIF":2.7,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143445243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating diabetes dataset for knowledge graph embedding based link prediction","authors":"Sushmita Singh, Manvi Siwach","doi":"10.1016/j.datak.2025.102414","DOIUrl":"10.1016/j.datak.2025.102414","url":null,"abstract":"<div><div>For doing any accurate analysis or prediction on data, a complete and well-populated dataset is required. Medical based data for any disease like diabetes is highly coupled and heterogeneous in nature, with numerous interconnections. This inherently complex data cannot be analysed by simple relational databases making knowledge graphs an ideal tool for its representation which can efficiently handle intricate relationships. Thus, knowledge graphs can be leveraged to analyse diabetes data, enhancing both the accuracy and efficiency of data-driven decision-making processes. Although substantial data exists on diabetes in various formats, the availability of organized and complete datasets is limited, highlighting the critical need for creation of a well-populated knowledge graph. Moreover while developing the knowledge graph, an inevitable problem of incompleteness is present due to missing links or relationships, necessitating the use of knowledge graph completion tasks to fill in this absent information which involves predicting missing data with various Link Prediction (LP) techniques. Among various link prediction methods, approaches based on knowledge graph embeddings have demonstrated superior performance and effectiveness. These knowledge graphs can support in-depth analysis and enhance the prediction of diabetes-associated risks in this field. This paper introduces a dataset specifically designed for performing link prediction on a diabetes knowledge graph, so that it can be used to fill the information gaps further contributing in the domain of risk analysis in diabetes. The accuracy of the dataset is assessed through validation with state-of-the-art embedding-based link prediction methods.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102414"},"PeriodicalIF":2.7,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143419097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling and solving industrial production tasks as planning-scheduling tasks","authors":"Andrii Nyporko , Lukáš Chrpa","doi":"10.1016/j.datak.2025.102415","DOIUrl":"10.1016/j.datak.2025.102415","url":null,"abstract":"<div><div>Industrial production planning or manufacturing concerns the selection of activities that can produce a desired product and scheduling them on resources that perform these activities. To deal with such problems techniques in the fields of <em>Automated Planning</em> and <em>Scheduling</em> might be leveraged, which are usually pursued separately even though they are (very) complementary. In manufacturing, the activities represent elementary steps in the production and each activity requires a specific input in order to produce a desired output. From that perspective, activities resemble actions in planning as they can capture such a requirement. Selecting proper activities including their (partial) ordering can be understood as a planning task while allocating the activities to the resources can be understood as a scheduling task.</div><div>This paper formalises the concept of “combined” planning and scheduling tasks by defining <em>planning-scheduling tasks</em> that are suitable for problems concerning industrial production or manufacturing. In particular, we define two types of activities – <em>production</em> and <em>maintenance</em> activities – where the former describes elementary production tasks while the latter modifies attributes of the resources (e.g. changing the configuration of reconfigurable machines). We introduce an extension of Planning Domain Definition Language (PDDL), a well-known language for describing planning tasks, to support modelling of planning-scheduling tasks. To tackle planning-scheduling tasks we propose two compilation schemes, one into temporal planning (in PDDL 2.1) and one into classical planning. We evaluated our approaches in three use cases of industrial production planning — Reconfigurable Machines, Woodworking, and Tube Factory domains. The results showed that solving planning-scheduling tasks by compiling them into planning tasks in order to use off-the-shelf planning engines is suitable as it scales reasonably well with the size of the actual tasks (although the resulting solutions are suboptimal).</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102415"},"PeriodicalIF":2.7,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143437007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static and dynamic techniques for iterative test-driven modelling of Dynamic Condition Response Graphs","authors":"Axel K.F. Christfort , Vlad Paul Cosma , Søren Debois , Thomas T. Hildebrandt , Tijs Slaats","doi":"10.1016/j.datak.2025.102413","DOIUrl":"10.1016/j.datak.2025.102413","url":null,"abstract":"<div><div>Test-driven declarative process modelling combines process models with test traces and has been introduced as a means to achieve both the flexibility provided by the declarative approach and the comprehensibility of the imperative approach. Open test-driven modelling adds a notion of context to tests, specifying the activities of concern in the model, and has been introduced as a means to support both iterative test-driven modelling, where the model can be extended without having to change all tests, and unit testing, where tests can define desired properties of parts of the process without needing to reason about the details of the whole process. The openness however makes checking a test more demanding, since actions outside the context are allowed at any point in the test execution and therefore many different traces may validate or invalidate an open test. In this paper we combine previously developed static techniques for effective open test-driven modelling for Dynamic Condition Response Graphs with a novel efficient implementation of dynamic checking of open tests based on alignment checking. We illustrate the static techniques on an example based on a real-life cross-organizational case management system and benchmark the dynamic checking on models and tests of varying size.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102413"},"PeriodicalIF":2.7,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reinforcement learning for optimizing responses in care processes","authors":"Olusanmi A. Hundogan , Bart J. Verhoef , Patrick Theeven , Hajo A. Reijers , Xixi Lu","doi":"10.1016/j.datak.2025.102412","DOIUrl":"10.1016/j.datak.2025.102412","url":null,"abstract":"<div><div>Prescriptive process monitoring aims to derive recommendations for optimizing complex processes. While previous studies have successfully used reinforcement learning techniques to derive actionable policies in business processes, care processes present unique challenges due to their dynamic and multifaceted nature. For example, at any stage of a care process, a multitude of actions is possible. In this study, we follow the Reinforcement Learning (RL) approach and present a general approach that uses event data to build and train Markov decision processes. We proposed three algorithms including one that takes the elapsed time into account when transforming an event log into a semi-Markov decision process. We evaluated the RL approach using an aggression incident data set. Specifically, the goal is to optimize staff member actions when clients are displaying different types of aggressive behavior. The Q-learning and SARSA are used to find optimal policies. Our results showed that the derived policies align closely with current practices while offering alternative options in specific situations. By employing RL in the context of care processes, we contribute to the ongoing efforts to enhance decision-making and efficiency in dynamic and complex environments.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102412"},"PeriodicalIF":2.7,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143372710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Symmetric non negative matrices factorization applied to the detection of communities in graphs and forensic image analysis","authors":"Gaël Marec , Nédra Mellouli","doi":"10.1016/j.datak.2025.102411","DOIUrl":"10.1016/j.datak.2025.102411","url":null,"abstract":"<div><div>With the proliferation of data, particularly on social networks, the accuracy of the information becomes uncertain. In this context, a major challenge lies in detecting image manipulations, where alterations are made to deceive observers. Aligning with the anomaly detection issue, recent methods approach the detection of image transformations as a community detection problem within graphs associated with the images. In this study, we propose using a community clustering method based on non-negative symmetric matrix factorization. By examining several experiments detecting alterations in manipulated images, we assess the method’s robustness and discuss potential enhancements. We also present a process for automatically generating visually and semantically coherent forged images. Additionally, we provide a web application to demonstrate this process.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102411"},"PeriodicalIF":2.7,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143346792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"REDIRE: Extreme REduction DImension for extRactivE Summarization","authors":"Christophe Rodrigues , Marius Ortega , Aurélien Bossard , Nédra Mellouli","doi":"10.1016/j.datak.2025.102407","DOIUrl":"10.1016/j.datak.2025.102407","url":null,"abstract":"<div><div>This paper presents an automatic unsupervised summarization model capable of extracting the most important sentences from a corpus. The unsupervised aspect makes it possible to do away with large corpora, made up of documents and their reference summaries, and to directly process documents potentially made up of several thousand words. To extract sentences in a summary, we use pre-entrained word embeddings to represent the documents. From this thick cloud of word vectors, we apply an extreme dimension reduction to identify important words, which we group by proximity. Sentences are extracted using linear constraint solving to maximize the information present in the summary. We evaluate the approach on large documents and present very encouraging initial results.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102407"},"PeriodicalIF":2.7,"publicationDate":"2025-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143135194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Logic-infused knowledge graph QA: Enhancing large language models for specialized domains through Prolog integration","authors":"Aneesa Bashir, Rong Peng, Yongchang Ding","doi":"10.1016/j.datak.2025.102406","DOIUrl":"10.1016/j.datak.2025.102406","url":null,"abstract":"<div><div>Efficiently answering questions over complex, domain-specific knowledge graphs remain a substantial challenge, as large language models (LLMs) often lack the logical reasoning abilities and particular knowledge required for such tasks. This paper presents a novel framework integrating LLMs with logical programming languages like Prolog for Logic-Infused Knowledge Graph Question Answering (KGQA) in specialized domains. The proposed methodology uses a transformer-based encoder–decoder architecture. An encoder reads the question, and a named entity recognition (NER) module connects entities to the knowledge graph. The extracted entities are fed into a grammar-guided decoder, producing a logical form (Prolog query) that captures the semantic constraints and relationships. The Prolog query is executed over the knowledge graph to perform symbolic reasoning and retrieve relevant answer entities. Comprehensive experiments on the MetaQA benchmark dataset demonstrate the superior performance of this logic-infused method in accurately identifying correct answer entities from the knowledge graph. Even when trained on a limited subset of annotated data, it outperforms state-of-the-art baselines, achieving 89.60 % and F1-scores of up to 89.61 %, showcasing its effectiveness in enhancing large language models with symbolic reasoning capabilities for specialized question-answering tasks. The seamless integration of LLMs and logical programming enables the proposed framework to reason effectively over complex, domain-specific knowledge graphs, overcoming a key limitation of existing KGQA systems. In specialized domains, the interpretability provided by representing questions such as Prologue queries is a valuable asset.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102406"},"PeriodicalIF":2.7,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143135551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}