Ana Julia Dal Forno , Graziela Piccoli Richetti , Vinícius Heinz Knaesel
{"title":"Fake news detection algorithms – A systematic literature review","authors":"Ana Julia Dal Forno , Graziela Piccoli Richetti , Vinícius Heinz Knaesel","doi":"10.1016/j.datak.2025.102441","DOIUrl":"10.1016/j.datak.2025.102441","url":null,"abstract":"<div><div>Social media and news platforms make available to their users, in real-time and simultaneously, access to a significant amount of content that may be true or false. It is remarkable that, with the evolution of Industry 4.0 technologies, the production and dissemination of fake news also increased in recent years. Some content quickly reaches considerable popularity because it is accessed and shared on a large scale, especially in social networks, thus having a potential for going viral. Thus, this study aimed to identify the algorithms and software used for fake news detection. The choice for this combination is justified because in Brazil this process is carried out manually by verification agencies and thus, based on the mapping of the algorithms identified in the literature, an architecture proposal will be developed using artificial intelligence. As a methodology, a systematic literature review (SLR) was conducted in the Science Direct and Scopus databases using the keywords \"fake news\" and \"machine learning\" to locate reviews and research articles published in Engineering fields from 2018 to 2023. A total of 24 articles were analyzed, and the results pointed out that Facebook and X<span><span><sup>1</sup></span></span> were the social networks most used to disseminate fake news. Moreover, the main topics addressed were the COVID-19 pandemic and the United States presidential elections of 2016 and 2020. As for the most used algorithms, a predominance of neural networks was observed. The contribution of this study is in mapping the most used algorithms and their degree of assertiveness, as well as identifying the themes, countries and related researchers that help in the evolution of the fake news theme.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102441"},"PeriodicalIF":2.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143683664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lingkai Yang , Sally McClean , Kevin Burke , Mark Donnelly , Kashaf Khan
{"title":"Modelling process durations with gamma mixtures for right-censored data: Applications in customer clustering, pattern recognition, drift detection, and rationalisation","authors":"Lingkai Yang , Sally McClean , Kevin Burke , Mark Donnelly , Kashaf Khan","doi":"10.1016/j.datak.2025.102430","DOIUrl":"10.1016/j.datak.2025.102430","url":null,"abstract":"<div><div>Customer modelling, particularly concerning length of stay or process duration, is vital for identifying customer patterns and optimising business processes. Recent advancements in computing and database technologies have revolutionised statistics and business process analytics by producing heterogeneous data that reflects diverse customer behaviours. Different models should be employed for distinct customer categories, culminating in an overall mixture model. Furthermore, some customers may remain “alive” at the conclusion of the observation period, meaning their journeys are incomplete, resulting in right-censored (RC) duration data. This combination of heterogeneous and right-censored data introduces complexity to process duration modelling and analysis. This paper presents a general approach to modelling process duration data using a gamma mixture model, where each gamma distribution represents a specific customer pattern. The model is adapted to account for RC data by modifying the likelihood function during model fitting. The paper explores three key application scenarios: (1) offline pattern clustering, which categorises customers who have completed their journeys; (2) online pattern tracking, which monitors and predicts customer behaviours in real-time; and (3) concept drift detection and rationalisation, which identifies shifts in customer patterns and explains their underlying causes. The proposed method has been validated using synthetically generated data and real-world data from a hospital billing process. In all instances, the fitted models effectively represented the data and demonstrated strong performance across the three application scenarios.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102430"},"PeriodicalIF":2.7,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accessibility in conceptual modeling—A systematic literature review, a keyboard-only UML modeling tool, and a research roadmap","authors":"Aylin Sarioğlu, Haydar Metin, Dominik Bork","doi":"10.1016/j.datak.2025.102423","DOIUrl":"10.1016/j.datak.2025.102423","url":null,"abstract":"<div><div>The reports on Disability by the World Health Organization show that the number of people with disabilities is increasing. Consequently, accessibility should play an essential role in information systems engineering research. While there is an increasingly rich set of available web accessibility guidelines, testing frameworks, and generally accessibility features in modern web-based software systems, software development frameworks, and Integrated Development Environments, this paper shows, based on a systematic review of the literature and current modeling tools, that accessibility is, so far, only scarcely focused in conceptual modeling research. With this paper, we assess the state of the art of accessibility in conceptual modeling, we identify current research gaps, and we delineate a vision toward more accessible conceptual modeling methods and tools. As a concrete step forward toward this vision, we present a generic concept of a keyboard-only modeling tool interaction that is implemented as a new module for the Graphical Language Server Platform (GLSP) framework. We show—using a currently developed UML modeling tool—how efficiently this module allows GLSP-based tool developers to introduce accessibility features into their modeling tools, thereby engaging physically disabled users in conceptual modeling.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102423"},"PeriodicalIF":2.7,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143579543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohamed Gaith Ayadi , Haithem Mezni , Hela Elmannai , Reem Ibrahim Alkanhel
{"title":"Privacy-preserving cross-network service recommendation via federated learning of unified user representations","authors":"Mohamed Gaith Ayadi , Haithem Mezni , Hela Elmannai , Reem Ibrahim Alkanhel","doi":"10.1016/j.datak.2025.102422","DOIUrl":"10.1016/j.datak.2025.102422","url":null,"abstract":"<div><div>With the emergence of cloud computing, the Internet of Things, and other large-scale environments, recommender systems have been faced with several issues, mainly (i) the distribution of user–item data across multiple information networks, (ii) privacy restrictions and the partial profiling of users and items caused by this distribution, (iii) the heterogeneity of user–item knowledge in different information networks. Furthermore, most approaches perform recommendations based on a single source of information, and do not handle the partial representation of users’ and items’ information in a federated way. Such isolated and non-collaborative behavior, in multi-source and cross-network information settings, often results in inaccurate and low-quality recommendations. To address these issues, we exploit the strengths of network representation learning and federated learning to propose a service recommendation approach in smart service networks. While NRL is employed to learn rich representations of entities (e.g., users, services, IoT objects), federated learning helps collaboratively infer a unified profile of users and items, based on the concept of <em>anchor user</em>, which are bridge entities connecting multiple information networks. These unified profiles are, finally, fed into a federated recommendation algorithm to select the top-rated services. Using a scenario from the smart healthcare context, the proposed approach was developed and validated on a multiplex information network built from real-world electronic medical records (157 diseases, 491 symptoms, 273 174 patients, treatments and anchors data). Experimental results under varied federated settings demonstrated the utility of cross-client knowledge (i.e. anchor links) and the collaborative reconstruction of composite embeddings (i.e. user representations) for improving recommendation accuracy. In terms of RMSE@K and MAE@K, our approach achieved an improvement of 54.41% compared to traditional single-network recommendation, as long as the federation and communication scale increased. Moreover, the gap with four federated approaches has reached 19.83 %, highlighting our approach’s ability to map local embeddings (i.e. user’s partial representations) into a complete view.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102422"},"PeriodicalIF":2.7,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143551137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A graph theoretic approach to assess quality of data for classification task","authors":"Payel Sadhukhan , Samrat Gupta","doi":"10.1016/j.datak.2025.102421","DOIUrl":"10.1016/j.datak.2025.102421","url":null,"abstract":"<div><div>The correctness of predictions rendered by an AI/ML model is key to its acceptability. To foster researchers’ and practitioners’ confidence in the model, it is necessary to render an intuitive understanding of the workings of a model. In this work, we attempt to explain a model’s working by providing some insights into the quality of data. While doing this, it is essential to consider that revealing the training data to the users is not feasible for logistical and security reasons. However, sharing some interpretable parameters of the training data and correlating them with the model’s performance can be helpful in this regard. To this end, we propose a new measure based on Euclidean Minimum Spanning Tree (EMST) for quantifying the intrinsic separation (or overlaps) between the data classes. For experiments, we use datasets from diverse domains such as finance, medical, and marketing. We use state-of-the-art measure known as <em>Davies Bouldin Index (DBI)</em> to validate our approach on four different datasets from aforementioned domains. The experimental results of this study establish the viability of the proposed approach in explaining the working and efficiency of a classifier. Firstly, the proposed measure of class-overlap quantification has shown a better correlation with the classification performance as compared to DBI scores. Secondly, the results on multi-class datasets demonstrate that the proposed measure can be used to determine the feature importance so as to learn a better classification model.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102421"},"PeriodicalIF":2.7,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143591647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mirna El Ghosh , Hala Naja , Habib Abdulrab , Mohamad Khalil
{"title":"CriMOnto: A generalized domain-specific ontology for modeling procedural norms of the Lebanese criminal law","authors":"Mirna El Ghosh , Hala Naja , Habib Abdulrab , Mohamad Khalil","doi":"10.1016/j.datak.2025.102419","DOIUrl":"10.1016/j.datak.2025.102419","url":null,"abstract":"<div><div>Criminal (or penal) law regulates offenses, offenders, and legal punishments. Modeling criminal law is gaining much attention in the ontology engineering community. However, a significant aspect is neglected: the explicit representation of procedural knowledge. Procedural norms, such as regulative norms, are addressed to agents in the normative system. They govern the different interactions among these agents. In this study, we propose a formal and faithful representation of the procedural aspect of legal norms in the context of the Lebanese Criminal Code. A modular domain-specific ontology named CriMOnto is developed for this purpose. CriMOnto is grounded in the Unified Foundational Ontology (UFO) and the legal core ontology UFO-L by applying the Ontology-Driven Conceptual Modeling (ODCM) process. Conceptual Ontology Patterns (COPs) are reused from UFO and UFO-L to build the hierarchical and procedural content of the ontology. CriMOnto is validated as a formal ontology and evaluated using a dual evaluation approach. The potential use of CriMOnto for lightweight rule-based decision support is discussed in this study.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102419"},"PeriodicalIF":2.7,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143529128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An MDA approach for robotic-based real-time business intelligence applications","authors":"Houssam Bazza , Sandro Bimonte , Zakaria Gourti , Stefano Rizzi , Hassan Badir","doi":"10.1016/j.datak.2025.102418","DOIUrl":"10.1016/j.datak.2025.102418","url":null,"abstract":"<div><div>Industry 4.0, the fourth industrial revolution, has emerged from the convergence of robotics, automation, and the Internet of Things (IoT), transforming industrial processes with intelligent systems and digital integration. This revolution also brings with it Business Intelligence (BI) systems that enable the analysis of IoT and robotic data. The data architectures employed for BI in Industry 4.0 contexts are often intricate, typically comprising robots software, DBMSs, message brokers, and data stream management systems. Consequently, designing BI data-centric applications for Industry 4.0 presents a significant challenge. Inspired by the absence of modeling approaches for this type of application and by the well-established advantages of Model-Driven Architecture (MDA), this paper introduces a novel UML profile for real-time robotic data-driven BI applications. Our profile enables the representation of robotic and transactional data within a unified and consistent framework, enabling continuous queries over these streams. Additionally, we propose an automated method to implement UML class diagrams onto a technological stack featuring ROS, Apache Kafka, PostgreSQL, and Apache Flink. An experimental evaluation in the agricultural application domain confirms the merits of our approach.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102418"},"PeriodicalIF":2.7,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143471503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aspect-level recommendation fused with review and rating representations","authors":"Heng-Ru Zhang , Ling Lin , Fan Min","doi":"10.1016/j.datak.2025.102417","DOIUrl":"10.1016/j.datak.2025.102417","url":null,"abstract":"<div><div>Review contains user opinions about different aspects of an item, which is essential data for aspect-level recommendation. Most existing aspect-level recommendation algorithms are concerned with the degree to which user and item aspects match. However, even if an item is extremely popular due to its high quality, it may only partially match the aspects of a user. A tolerant user may like the item, whereas a strict user may dislike it. This implies that these works disregard the personalized behavior patterns of the user. In this paper, we propose a new <strong>A</strong>spect-level <strong>R</strong>ecommendation model fused with <strong>R</strong>eview and <strong>R</strong>ating, namely <strong>ARRR</strong>, to address the recommendation bias. First, we introduce rating to explore user behavior patterns and item quality. Then, we present a personalized attention mechanism that generates a set of aspect-level user or item representations from reviews. Finally, we obtain comprehensive user or item representations by combining rating- and review-based representations. In the experiments, the proposed model is compared with seven state-of-the-art recommendation algorithms on seven datasets. The results show that our model outperforms on seven metrics. The source code of ARRR is available at <span><span>https://github.com/alinn00/ARRR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102417"},"PeriodicalIF":2.7,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143445354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating diabetes dataset for knowledge graph embedding based link prediction","authors":"Sushmita Singh, Manvi Siwach","doi":"10.1016/j.datak.2025.102414","DOIUrl":"10.1016/j.datak.2025.102414","url":null,"abstract":"<div><div>For doing any accurate analysis or prediction on data, a complete and well-populated dataset is required. Medical based data for any disease like diabetes is highly coupled and heterogeneous in nature, with numerous interconnections. This inherently complex data cannot be analysed by simple relational databases making knowledge graphs an ideal tool for its representation which can efficiently handle intricate relationships. Thus, knowledge graphs can be leveraged to analyse diabetes data, enhancing both the accuracy and efficiency of data-driven decision-making processes. Although substantial data exists on diabetes in various formats, the availability of organized and complete datasets is limited, highlighting the critical need for creation of a well-populated knowledge graph. Moreover while developing the knowledge graph, an inevitable problem of incompleteness is present due to missing links or relationships, necessitating the use of knowledge graph completion tasks to fill in this absent information which involves predicting missing data with various Link Prediction (LP) techniques. Among various link prediction methods, approaches based on knowledge graph embeddings have demonstrated superior performance and effectiveness. These knowledge graphs can support in-depth analysis and enhance the prediction of diabetes-associated risks in this field. This paper introduces a dataset specifically designed for performing link prediction on a diabetes knowledge graph, so that it can be used to fill the information gaps further contributing in the domain of risk analysis in diabetes. The accuracy of the dataset is assessed through validation with state-of-the-art embedding-based link prediction methods.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102414"},"PeriodicalIF":2.7,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143419097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}