{"title":"Derived multi-objective function for latency sensitive-based cloud object storage system using hybrid heuristic algorithm","authors":"N Nataraj , RV Nataraj","doi":"10.1016/j.datak.2025.102448","DOIUrl":"10.1016/j.datak.2025.102448","url":null,"abstract":"<div><div>A Cloud Object Storage System (COSS) stores and retrieves vast amounts of unstructured data items, called objects, and acts as a core cloud service for contemporary web-based applications. When data is shared among different parties, privacy preservation becomes challenging. <em>Research Problem:</em> A high volume of requests is served daily, which leads to latency issues. In a cloud storage system, adopting a holistic approach helps the user identify sensitive information and analyze unwanted files/data. Evolving Internet of Things (IoT) applications are latency-sensitive and do not function well with the platforms available today. <em>Overall Purpose of the Study:</em> Therefore, a novel latency-aware COSS is implemented with the aid of multi-objective functionalities to allocate and reallocate data efficiently and thereby sustain the storage process in the cloud environment. <em>Design of the Study:</em> This goal is accomplished by implementing a hybrid meta-heuristic approach that integrates the Mother Optimization Algorithm (MOA) with the Dolphin Swarm Optimization (DSO) algorithm. The resulting hybrid optimization algorithm is called the Hybrid Dolphin Swarm-based Mother Optimization Algorithm (HDS-MOA). During data allocation, HDS-MOA optimizes an objective function subject to constraints such as throughput, latency, resource usage, and active servers. During data reallocation, it likewise considers multi-objective constraints such as cost, makespan, and energy. Diverse experimental tests are conducted to prove its effectiveness by comparing it with other existing methods for storing data efficiently across cloud networks. <em>Major findings of results:</em> In configuration 3, the proposed HDS-MOA attains 31.11 %, 55.71 %, 55.71 %, and 68.21 % improvements over OSSperf, queuing theory, the scheduling technique, and Monte Carlo-PSO, respectively, in the latency analysis. <em>Overview of Interpretations and Conclusions:</em> The developed HDS-MOA ensures that data is preserved in optimal locations with appropriate access times and low latency, which is essential for cloud object storage. This enhances the overall user experience by speeding up data retrieval. <em>Limitations of this Study with Solutions:</em> The proposed algorithm's ability to balance multiple objectives such as performance, cost, and fault tolerance in real time still needs improvement, so that the system remains efficient and responsive under dynamic variations in demand.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102448"},"PeriodicalIF":2.7,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECS-KG: An event-centric semantic knowledge graph for event-related news articles","authors":"MVPT Lakshika, HA Caldera, TNK De Zoysa","doi":"10.1016/j.datak.2025.102451","DOIUrl":"10.1016/j.datak.2025.102451","url":null,"abstract":"<div><div>Recent advances in deep learning techniques and contextual understanding render Knowledge Graphs (KGs) valuable tools for enhancing accessibility and news comprehension. Conventional and news-specific KGs frequently lack the specificity needed for efficient news-related tasks, leading to limited relevance and static data representation. To fill this gap, this study proposes an Event-Centric Semantic Knowledge Graph (ECS-KG) model that combines deep learning approaches with contextual embeddings to improve the procedural and dynamic knowledge representation observed in news articles. The ECS-KG incorporates several information extraction techniques, a temporal Graph Neural Network (GNN), and a Graph Attention Network (GAT), yielding significant improvements in news representation. Evaluations on several gold-standard datasets, comprising CNN/Daily Mail, TB-Dense, and ACE 2005, revealed that the proposed model outperforms the most advanced models. By integrating temporal reasoning and semantic insights, ECS-KG not only enhances user understanding of news significance but also meets the evolving demands of news consumers. This model advances the field of event-centric semantic KGs and provides valuable resources for applications in news information processing.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102451"},"PeriodicalIF":2.7,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143828580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overcoming the hurdle of legal expertise: A reusable model for smartwatch privacy policies","authors":"Constantin Buschhaus , Arvid Butting , Judith Michael , Verena Nitsch , Sebastian Pütz , Bernhard Rumpe , Carolin Stellmacher , Sabine Theis","doi":"10.1016/j.datak.2025.102443","DOIUrl":"10.1016/j.datak.2025.102443","url":null,"abstract":"<div><div>Regulations for privacy protection aim to protect individuals from the unauthorized storage, processing, and transfer of their personal data but often fail to provide helpful support for understanding these regulations. To better communicate privacy policies for smartwatches, we need an in-depth understanding of their concepts and better ways to enable developers to integrate them when engineering systems. Up to now, no conceptual model covering privacy statements from different smartwatch manufacturers has existed that is reusable for developers. This paper introduces such a conceptual model for privacy policies of smartwatches and shows its use in a model-driven software engineering approach to create a platform for data visualization of wearable privacy policies from different smartwatch manufacturers. We have analyzed the privacy policies of various manufacturers and extracted the relevant concepts. Moreover, we have checked the model with lawyers for its correctness, instantiated it with concrete data, and used it in a model-driven software engineering approach to create a platform for data visualization. This reusable privacy policy model can enable developers to easily represent privacy policies in their systems. It provides a foundation for more structured and understandable privacy policies which, in the long run, can increase the data sovereignty of application users.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102443"},"PeriodicalIF":2.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143817727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial preface to the special issue on research challenges in information science (RCIS’2023)","authors":"Selmin Nurcan, Andreas L. Opdahl","doi":"10.1016/j.datak.2025.102446","DOIUrl":"10.1016/j.datak.2025.102446","url":null,"abstract":"","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102446"},"PeriodicalIF":2.7,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143911765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customized long short-term memory architecture for multi-document summarization with improved text feature set","authors":"Satya Deo , Debajyoty Banik , Prasant Kumar Pattnaik","doi":"10.1016/j.datak.2025.102440","DOIUrl":"10.1016/j.datak.2025.102440","url":null,"abstract":"<div><div>One among the most crucial concerns in the domain of Natural Language Processing (NLP) is Multi-Document Summarization (MDS), and in recent decades the focus on this issue has risen massively. Hence, it is vital for the NLP community to provide effective and reliable MDS methods. Current deep learning-based MDS techniques rely on the extraordinary capacity of neural networks to extract distinctive features. Motivated by this fact, we introduce a novel MDS technique, named Customized Long Short-Term Memory-based Multi-Document Summarization using IBi-GRU (CLSTM-MDS+IBi-GRU), which involves the following working processes. First, the input data is converted into tokens by the BERT (Bidirectional Encoder Representations from Transformers) tokenizer. Features such as Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), thematic features, and an improved aspect term-based feature are then extracted. Finally, summarization takes place by utilizing the concatenation of a Customized Long Short-Term Memory (CLSTM) with a pre-eminent layer. An accurate, high-quality summary is produced by introducing this layer into the LSTM module together with the Bi-GRU-based Inception module (IBi-GRU), which can capture long-range dependencies through parallel convolution. The outcomes of this work demonstrate the superiority of our CLSTM-MDS in the Multi-Document Summarization task.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102440"},"PeriodicalIF":2.7,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143800459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of digital shadows on different levels in the automation pyramid","authors":"Malte Heithoff , Christian Hopmann , Thilo Köbel , Judith Michael , Bernhard Rumpe , Patrick Sapel","doi":"10.1016/j.datak.2025.102442","DOIUrl":"10.1016/j.datak.2025.102442","url":null,"abstract":"<div><div>The concept of digital shadows helps to move from handling large amounts of heterogeneous data in production to handling task- and context-dependent aggregated data sets that support a specific purpose. Current research lacks investigation of the characteristics digital shadows may have when applied to different levels of the automation pyramid. In this paper, we describe the application of the digital shadow concept to two use cases in injection molding, namely geometry-dependent process configuration and optimal production planning of jobs on an injection molding machine. In detail, we describe the creation process of digital shadows, the relevant data needs for the specific purpose, and the relevant models. Based on their usage, we describe the specifics of their characteristics and discuss commonalities and differences. These aspects can be taken into account when creating digital shadows for further use cases.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102442"},"PeriodicalIF":2.7,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fake news detection algorithms – A systematic literature review","authors":"Ana Julia Dal Forno , Graziela Piccoli Richetti , Vinícius Heinz Knaesel","doi":"10.1016/j.datak.2025.102441","DOIUrl":"10.1016/j.datak.2025.102441","url":null,"abstract":"<div><div>Social media and news platforms give their users real-time, simultaneous access to a significant amount of content that may be true or false. Notably, with the evolution of Industry 4.0 technologies, the production and dissemination of fake news has also increased in recent years. Some content quickly reaches considerable popularity because it is accessed and shared on a large scale, especially on social networks, giving it the potential to go viral. This study therefore aimed to identify the algorithms and software used for fake news detection. This combination was chosen because in Brazil this process is carried out manually by verification agencies; thus, based on the mapping of the algorithms identified in the literature, an architecture proposal will be developed using artificial intelligence. As the methodology, a systematic literature review (SLR) was conducted in the Science Direct and Scopus databases using the keywords \"fake news\" and \"machine learning\" to locate reviews and research articles published in Engineering fields from 2018 to 2023. A total of 24 articles were analyzed, and the results showed that Facebook and X were the social networks most used to disseminate fake news. Moreover, the main topics addressed were the COVID-19 pandemic and the United States presidential elections of 2016 and 2020. Among the algorithms used, a predominance of neural networks was observed. The contribution of this study lies in mapping the most used algorithms and their accuracy, as well as identifying the themes, countries, and researchers that contribute to the evolution of fake news research.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102441"},"PeriodicalIF":2.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143683664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling process durations with gamma mixtures for right-censored data: Applications in customer clustering, pattern recognition, drift detection, and rationalisation","authors":"Lingkai Yang , Sally McClean , Kevin Burke , Mark Donnelly , Kashaf Khan","doi":"10.1016/j.datak.2025.102430","DOIUrl":"10.1016/j.datak.2025.102430","url":null,"abstract":"<div><div>Customer modelling, particularly concerning length of stay or process duration, is vital for identifying customer patterns and optimising business processes. Recent advancements in computing and database technologies have revolutionised statistics and business process analytics by producing heterogeneous data that reflects diverse customer behaviours. Different models should be employed for distinct customer categories, culminating in an overall mixture model. Furthermore, some customers may remain “alive” at the conclusion of the observation period, meaning their journeys are incomplete, resulting in right-censored (RC) duration data. This combination of heterogeneous and right-censored data introduces complexity to process duration modelling and analysis. This paper presents a general approach to modelling process duration data using a gamma mixture model, where each gamma distribution represents a specific customer pattern. The model is adapted to account for RC data by modifying the likelihood function during model fitting. The paper explores three key application scenarios: (1) offline pattern clustering, which categorises customers who have completed their journeys; (2) online pattern tracking, which monitors and predicts customer behaviours in real-time; and (3) concept drift detection and rationalisation, which identifies shifts in customer patterns and explains their underlying causes. The proposed method has been validated using synthetically generated data and real-world data from a hospital billing process. In all instances, the fitted models effectively represented the data and demonstrated strong performance across the three application scenarios.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102430"},"PeriodicalIF":2.7,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accessibility in conceptual modeling—A systematic literature review, a keyboard-only UML modeling tool, and a research roadmap","authors":"Aylin Sarioğlu, Haydar Metin, Dominik Bork","doi":"10.1016/j.datak.2025.102423","DOIUrl":"10.1016/j.datak.2025.102423","url":null,"abstract":"<div><div>Reports on disability by the World Health Organization show that the number of people with disabilities is increasing. Consequently, accessibility should play an essential role in information systems engineering research. While there is an increasingly rich set of available web accessibility guidelines, testing frameworks, and accessibility features in modern web-based software systems, software development frameworks, and Integrated Development Environments, this paper shows, based on a systematic review of the literature and current modeling tools, that accessibility has so far received only scarce attention in conceptual modeling research. With this paper, we assess the state of the art of accessibility in conceptual modeling, identify current research gaps, and delineate a vision toward more accessible conceptual modeling methods and tools. As a concrete step toward this vision, we present a generic concept of keyboard-only modeling tool interaction, implemented as a new module for the Graphical Language Server Platform (GLSP) framework. We show, using a currently developed UML modeling tool, how efficiently this module allows GLSP-based tool developers to introduce accessibility features into their modeling tools, thereby engaging physically disabled users in conceptual modeling.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102423"},"PeriodicalIF":2.7,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143579543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-preserving cross-network service recommendation via federated learning of unified user representations","authors":"Mohamed Gaith Ayadi , Haithem Mezni , Hela Elmannai , Reem Ibrahim Alkanhel","doi":"10.1016/j.datak.2025.102422","DOIUrl":"10.1016/j.datak.2025.102422","url":null,"abstract":"<div><div>With the emergence of cloud computing, the Internet of Things, and other large-scale environments, recommender systems have faced several issues, mainly (i) the distribution of user–item data across multiple information networks, (ii) privacy restrictions and the partial profiling of users and items caused by this distribution, and (iii) the heterogeneity of user–item knowledge in different information networks. Furthermore, most approaches perform recommendations based on a single source of information and do not handle the partial representation of users’ and items’ information in a federated way. Such isolated and non-collaborative behavior, in multi-source and cross-network information settings, often results in inaccurate and low-quality recommendations. To address these issues, we exploit the strengths of Network Representation Learning (NRL) and federated learning to propose a service recommendation approach for smart service networks. While NRL is employed to learn rich representations of entities (e.g., users, services, IoT objects), federated learning helps collaboratively infer a unified profile of users and items, based on the concept of <em>anchor users</em>, which are bridge entities connecting multiple information networks. These unified profiles are, finally, fed into a federated recommendation algorithm to select the top-rated services. Using a scenario from the smart healthcare context, the proposed approach was developed and validated on a multiplex information network built from real-world electronic medical records (157 diseases, 491 symptoms, 273 174 patients, plus treatments and anchors data). Experimental results under varied federated settings demonstrated the utility of cross-client knowledge (i.e., anchor links) and the collaborative reconstruction of composite embeddings (i.e., user representations) for improving recommendation accuracy. In terms of RMSE@K and MAE@K, our approach achieved an improvement of 54.41 % compared to traditional single-network recommendation as the federation and communication scale increased. Moreover, the gap with four federated approaches reached 19.83 %, highlighting our approach’s ability to map local embeddings (i.e., users’ partial representations) into a complete view.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102422"},"PeriodicalIF":2.7,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143551137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}