{"title":"Time-Aware Complex Question Answering over Temporal Knowledge Graph","authors":"Luyi Bai, Tongyue Zhang, Guangchen Feng","doi":"10.1016/j.datak.2025.102503","DOIUrl":"10.1016/j.datak.2025.102503","url":null,"abstract":"Knowledge Graph Question Answering (KGQA) is a crucial topic in Knowledge Graphs (KGs), with the objective of retrieving the corresponding facts from KGs to answer given questions. In practical applications, facts in KGs usually have time constraints; thus, question answering on Temporal Knowledge Graphs (TKGs) has attracted extensive attention. Existing Temporal Knowledge Graph Question Answering (TKGQA) methods focus on complex questions involving multiple facts and mainly face two challenges. First, these methods only match questions with facts in TKGs to identify the answer, ignoring the temporal order between different facts, which makes it difficult to answer questions involving temporal order. Second, they usually focus on the representation of the question text while neglecting the rich semantic information within the questions, which limits their understanding of the questions. To address these challenges, this research proposes a model named Time-Aware Complex Question Answering (TA-CQA). Specifically, we extend the Temporal Knowledge Graph Embedding (TKGE) model by incorporating temporal order information into the embedding vectors, ensuring that the model can distinguish the temporal order of different facts. To enhance the semantic representation of the question, we integrate question information using an attention mechanism and a learnable encoder. Unlike previous TKGQA methods, we propose a time relevance measurement that further improves the accuracy of answer prediction by better capturing the correlation between question information and time information. Multiple sets of experiments on CronQuestions and TimeQuestions demonstrate our model’s superior performance across all question types. In particular, for complex questions involving multiple facts, the hit@1 values increase by 3.2% and 3.5%, respectively.","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"161 ","pages":"Article 102503"},"PeriodicalIF":2.7,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144865507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
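The TA-CQA abstract reports its gains in hit@1 (often written Hits@1). As a minimal sketch, assuming the usual definition in which a question counts as a hit when its top-ranked prediction is one of its gold answers, the metric can be computed as follows. The function name `hits_at_k` and the toy data are hypothetical illustrations, not code or results from the paper.

```python
from typing import Hashable, Sequence, Set


def hits_at_k(ranked_predictions: Sequence[Sequence[Hashable]],
              gold_answers: Sequence[Set[Hashable]],
              k: int = 1) -> float:
    """Fraction of questions with at least one gold answer among the top-k predictions."""
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("need one gold answer set per question")
    if not ranked_predictions:
        return 0.0
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if any(p in gold for p in preds[:k])  # only the k best-ranked candidates count
    )
    return hits / len(ranked_predictions)


# Hypothetical toy data: three questions, each with a ranked list of candidate answers.
predictions = [["1998", "2001"], ["Obama", "Bush"], ["2010", "2012"]]
gold = [{"1998"}, {"Bush"}, {"2010", "2011"}]
print(hits_at_k(predictions, gold, k=1))  # 0.6666666666666666
print(hits_at_k(predictions, gold, k=2))  # 1.0
```

With k=1 only two of the three toy questions are hits; raising k to 2 also credits the second question, whose gold answer is ranked second.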
{"title":"Conceptual modeling of user perspectives — From data warehouses to alliance-driven data ecosystems","authors":"Sandra Geisler , Christoph Quix , István Koren , Matthias Jarke","doi":"10.1016/j.datak.2025.102502","DOIUrl":"10.1016/j.datak.2025.102502","url":null,"abstract":"The increasing complexity of modern information systems has highlighted the need for advanced conceptual modeling techniques that incorporate multi-perspective and view-based approaches. This paper explores the role of multi-perspective modeling and view modeling in designing distributed, heterogeneous systems while addressing diverse user requirements and ensuring semantic consistency. These methods enable the representation of multiple viewpoints, traceability, and dynamic integration across different levels of abstraction. Key advancements in schema mapping, view maintenance, and semantic metadata management are examined, illustrating how they support query optimization, data quality, and interoperability. We discuss how data management architectures, such as data ecosystems, data warehouses, and data lakes, leverage these innovations to enable flexible and sustainable data sharing. By integrating user-centric and goal-oriented modeling frameworks, the paper emphasizes aligning technical design with organizational and social requirements. Future challenges include the need for enhanced reasoning capabilities and collaborative tools to manage the growing complexity of interconnected systems while maintaining adaptability and trust.","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"161 ","pages":"Article 102502"},"PeriodicalIF":2.7,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144988186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to “Unraveling the foundations and the evolution of conceptual modeling—Intellectual structure, current themes, and trajectories” [Knowledge and Data Engineering 154, 2024, 102351]","authors":"Jacky Akoka , Isabelle Comyn-Wattiau , Nicolas Prat , Veda C. Storey","doi":"10.1016/j.datak.2025.102498","DOIUrl":"10.1016/j.datak.2025.102498","url":null,"abstract":"","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102498"},"PeriodicalIF":2.7,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145117782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conceptual modeling: A large language model assistant for characterizing research contributions","authors":"Stephen W. Liddle , Heinrich C. Mayr , Oscar Pastor , Veda C. Storey , Bernhard Thalheim","doi":"10.1016/j.datak.2025.102497","DOIUrl":"10.1016/j.datak.2025.102497","url":null,"abstract":"The body of conceptual modeling research publications is vast and diverse, making it challenging for a single researcher or research group to fully comprehend the field’s overall development. Although some approaches have been proposed to help organize these research contributions, it is still unrealistic to expect human experts to manually comprehend and characterize all of this research. However, as generative AI tools based on large language models, such as ChatGPT, become increasingly sophisticated, it may be possible to replace or augment tedious, manual work with semi-automated approaches. In this research, we present a customized version of ChatGPT that is tuned to the task of characterizing conceptual modeling research. Experiments with this AI tool demonstrate that it is feasible to create a usable knowledge survey for the continually evolving body of conceptual modeling research contributions.","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"161 ","pages":"Article 102497"},"PeriodicalIF":2.7,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144840695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic-aware query answering with Large Language Models","authors":"Paolo Atzeni , Teodoro Baldazzi , Luigi Bellomarini , Eleonora Laurenza , Emanuel Sallinger","doi":"10.1016/j.datak.2025.102494","DOIUrl":"10.1016/j.datak.2025.102494","url":null,"abstract":"In the modern data-driven world, answering queries over heterogeneous and semantically inconsistent data remains a significant challenge. Modern datasets originate from diverse sources, such as relational databases, semi-structured repositories, and unstructured documents, leading to substantial variability in schemas, terminologies, and data formats. Traditional systems, constrained by rigid syntactic matching and strict data binding, struggle to capture critical semantic connections and schema ambiguities, failing to meet the growing demand among data scientists for advanced forms of flexibility and context-awareness in query answering. In parallel, the advent of Large Language Models (LLMs) has introduced new capabilities in natural language interpretation, making them highly promising for addressing such challenges. However, LLMs alone lack the systematic rigor and explainability required for robust query processing and decision-making in high-stakes domains. In this paper, we propose Soft Query Answering (Soft QA), a novel hybrid approach that integrates LLMs as an intermediate semantic layer within the query processing pipeline. Soft QA enhances query answering adaptability and flexibility by injecting semantic understanding through context-aware, schema-informed prompts, and leverages LLMs to semantically link entities, resolve ambiguities, and deliver accurate query results in complex settings. We demonstrate its practical effectiveness through real-world examples, highlighting its ability to resolve semantic mismatches and improve query outcomes without requiring extensive data cleaning or restructuring.","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"161 ","pages":"Article 102494"},"PeriodicalIF":2.7,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144830326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Granularity History Graph Network for temporal knowledge graph reasoning","authors":"Jun Zhu , Yan Fu , Junlin Zhou , Duanbing Chen","doi":"10.1016/j.datak.2025.102496","DOIUrl":"10.1016/j.datak.2025.102496","url":null,"abstract":"Reasoning on knowledge graphs (KGs) falls into two main categories: predicting missing facts and predicting unknown future facts. For future prediction, it becomes crucial to incorporate temporal information and add timestamps to KGs, thereby forming temporal knowledge graphs (TKGs). The key aspect of reasoning lies in treating a TKG as a sequence of static KGs in order to effectively grasp temporal information. It is equally important to consider the evolution of facts from various perspectives. Existing models tend to replicate the original time granularity of the data when modeling TKGs, often disregarding the impact of the minimum time period in the evolution process. Furthermore, historical information is typically treated as a single sequence of facts, with little diversity in strategies (e.g., modeling sequences with varying granularities or lengths) to capture complex temporal dynamics. This uniform approach may lose critical information during modeling. However, historical evolution often exhibits complex, shifting periodic characteristics, and associated events do not necessarily follow a fixed time period. Therefore, a single granularity is insufficient to model periodic events whose dynamics change over history. Consequently, we propose the Multi-Granularity History Graph Network (MGHGN), an innovative model for TKG reasoning. MGHGN dynamically models various event evolution periods by constructing representations at multiple time granularities, and integrates various modeling methods to infer potential future facts. Our model captures valuable insights from multi-granularity history and employs diverse approaches to model historical information. Experimental results on six benchmark datasets demonstrate that MGHGN outperforms state-of-the-art methods.","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102496"},"PeriodicalIF":2.7,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144771633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
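MGHGN is described as constructing representations at multiple time granularities. The sketch below illustrates only the data-preparation idea this implies, under two assumptions the abstract does not state: facts are (subject, relation, object, timestamp) quadruples, and timestamps are non-negative integer offsets in the dataset's base unit (e.g. days). The same facts are regrouped into coarser snapshot sequences, one per granularity. It is not the MGHGN model, and every name in it is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed fact layout: (subject, relation, object, integer timestamp in base units).
Fact = Tuple[str, str, str, int]
Triple = Tuple[str, str, str]


def snapshots_at_granularity(facts: Iterable[Fact], granularity: int) -> Dict[int, List[Triple]]:
    """Group facts into snapshots that each span `granularity` base time units."""
    if granularity <= 0:
        raise ValueError("granularity must be a positive integer")
    buckets: Dict[int, List[Triple]] = defaultdict(list)
    for s, r, o, t in facts:
        buckets[t // granularity].append((s, r, o))
    # Return the snapshots in chronological order of their bucket index.
    return dict(sorted(buckets.items()))


def multi_granularity_history(facts: List[Fact],
                              granularities: Iterable[int]) -> Dict[int, Dict[int, List[Triple]]]:
    """Build one snapshot sequence per granularity from the same list of facts."""
    return {g: snapshots_at_granularity(facts, g) for g in granularities}


# Hypothetical toy facts with day-level timestamps.
facts = [
    ("A", "meets", "B", 0),
    ("A", "visits", "C", 1),
    ("B", "criticizes", "C", 3),
    ("A", "meets", "B", 7),
    ("C", "visits", "A", 8),
]
history = multi_granularity_history(facts, granularities=[1, 7])
print(len(history[1]))  # 5 daily snapshots
print(len(history[7]))  # 2 weekly snapshots
```

Each sequence could then be encoded separately; a coarse granularity packs loosely periodic events into the same snapshot, while the base granularity preserves their exact order.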
{"title":"Advancing credit risk assessment in the retail banking industry: A hybrid approach using time series and supervised learning models","authors":"Sebastian H. Goldmann, Marcos R. Machado, Joerg R. Osterrieder","doi":"10.1016/j.datak.2025.102490","DOIUrl":"10.1016/j.datak.2025.102490","url":null,"abstract":"Credit risk assessment remains a central challenge in retail banking, with conventional models often falling short in predictive accuracy and adaptability to granular customer behavior. This study explores the potential of Time Series Classification (TSC) algorithms to enhance credit risk modeling by analyzing customers’ historical end-of-day balance data. We compare traditional Machine Learning (ML) models – including Logistic Regression and XGBoost – with advanced TSC methods such as Shapelets, Long Short-Term Memory (LSTM) networks, and Canonical Interval Forests (CIF). Our results show that TSC algorithms, particularly CIF and Shapelet-based methods, significantly outperform traditional approaches. When using CIF-derived Probability of Default (PD) estimates as additional features in an XGBoost model, predictive performance improved notably: the combined model achieved an Area under the Curve (AUC) of 0.81, compared to 0.79 for CIF alone and 0.77 for XGBoost without the CIF input. These findings underscore the value of integrating temporal features into credit risk assessment frameworks. Moreover, the complementary strengths of the TSC and XGBoost models across different Receiver Operating Characteristic (ROC) curve regions demonstrate the practical benefits of model stacking. However, performance dropped when using aggregated monthly data, highlighting the importance of preserving high-frequency behavioral signals. This research contributes to more accurate, interpretable, and robust credit risk models and offers a pathway for banks to leverage time series data for improved risk forecasting.","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102490"},"PeriodicalIF":2.7,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144711634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
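The credit-risk abstract compares models by Area under the ROC Curve (AUC) and feeds CIF-derived Probability of Default (PD) estimates into XGBoost as an extra feature. A minimal, dependency-free sketch of both ideas follows: AUC computed as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter (ties count one half), and a hypothetical helper `stack_pd_feature` that appends a first-stage PD column to a feature table. The names and toy numbers are illustrative assumptions, not the paper's code or data.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairwise comparison; fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stack_pd_feature(rows: Sequence[Sequence[float]],
                     pd_estimates: Sequence[float]) -> List[List[float]]:
    """Hypothetical stacking step: append a first-stage PD estimate to each feature row."""
    if len(rows) != len(pd_estimates):
        raise ValueError("one PD estimate is required per row")
    return [list(row) + [pd] for row, pd in zip(rows, pd_estimates)]


# Toy example: 1 = default, 0 = no default.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly

features = [[1200.0, 3.0], [50.0, 1.0]]
print(stack_pd_feature(features, [0.05, 0.62]))  # [[1200.0, 3.0, 0.05], [50.0, 1.0, 0.62]]
```

In typical stacking practice, the first-stage PDs used to train the second-stage model are produced out-of-fold, so the second stage never learns from predictions made on its own training rows.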
{"title":"TEDA-driven adaptive stream clustering for concept drift detection","authors":"Zahra Rezaei , Hedieh Sajedi","doi":"10.1016/j.datak.2025.102484","DOIUrl":"10.1016/j.datak.2025.102484","url":null,"abstract":"The rapid growth of data-driven applications has underlined the need for robust methods to analyze and cluster streaming data. Data stream clustering aims to uncover interesting knowledge concealed within data streams, which are typically fast and evolve in structure and pattern. However, most current methods struggle with significant challenges: detecting arbitrarily shaped clusters, handling outliers, adapting to concept drift, and reducing dependence on predefined parameters. To tackle these challenges, we propose a novel Typicality and Eccentricity Data Analysis (TEDA)-based concept drift detection stream clustering algorithm (TEDA-CDD), which divides the clustering problem into two subproblems: micro-clusters and macro-clusters. Our methodology uses a TEDA-based concept drift detection approach to enhance data stream clustering. Our method employs two models to monitor the data stream, keeping the information of a previous concept while tracking the emergence of a new concept. The models represent two distinct concepts when the intersection of their data samples is significantly low, as measured by the Jaccard Index. TEDA-CDD is compared with known methods from the literature in experiments using synthetic and real-world datasets simulating real-world applications. By dynamically updating clusters through model reuse or creation, our algorithm ensures adaptability to real-time changes in data distributions. The proposed algorithm was comprehensively evaluated on the KDDCup-99 dataset, an intrusion detection benchmark, under diverse scenarios, including concept drifts, evolving data distributions, varying cluster sizes, and outlier conditions. Empirical results demonstrated the algorithm’s superiority over baseline approaches such as DenStream, DStream, ClusTree, and DGStream, achieving perfect performance metrics. These findings emphasize the effectiveness of our algorithm in addressing real-world streaming data challenges, combining high sensitivity to concept drift with computational efficiency, adaptability, and robust clustering capabilities.","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102484"},"PeriodicalIF":2.7,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144712898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
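Two building blocks named in the TEDA-CDD abstract can be sketched independently of the full algorithm. The first is the recursive eccentricity computation commonly used in the TEDA literature: with running mean mu_k and population variance sigma_k^2, the eccentricity of the newest sample is xi_k = 1/k + ||x_k - mu_k||^2 / (k * sigma_k^2), and a sample is flagged as atypical when the normalized eccentricity xi_k / 2 exceeds (m^2 + 1) / (2k), a Chebyshev-style bound with m usually set to 3. The second is the Jaccard index |A & B| / |A | B| between the sample sets of two models. The class and function names and the 0.2 distinct-concept threshold are hypothetical; the abstract gives no threshold value, and this is not the authors' implementation.

```python
from typing import Hashable, Iterable, List, Optional, Sequence, Tuple


class TEDAEccentricity:
    """Recursive TEDA mean, variance and eccentricity for a single data stream."""

    def __init__(self, m: float = 3.0) -> None:
        self.m = m  # Chebyshev-style multiplier; 3 is a common choice
        self.k = 0
        self.mean: List[float] = []
        self.var = 0.0  # population variance of the samples seen so far

    def update(self, x: Sequence[float]) -> Tuple[Optional[float], bool]:
        """Absorb sample x; return (eccentricity, is_atypical). Eccentricity is None while undefined."""
        point = [float(v) for v in x]
        self.k += 1
        k = self.k
        if k == 1:
            self.mean, self.var = point, 0.0
            return None, False
        # Recursive mean, then squared distance of the new sample to the updated mean.
        self.mean = [(k - 1) / k * mu + xi / k for mu, xi in zip(self.mean, point)]
        dist2 = sum((xi - mu) ** 2 for xi, mu in zip(point, self.mean))
        # Exact recursive update of the population variance.
        self.var = (k - 1) / k * self.var + dist2 / (k - 1)
        if self.var == 0.0:  # all samples identical so far: eccentricity undefined
            return None, False
        ecc = 1.0 / k + dist2 / (k * self.var)
        zeta = ecc / 2.0  # normalized eccentricity
        return ecc, zeta > (self.m ** 2 + 1) / (2 * k)


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard index |A & B| / |A | B|; defined here as 1.0 for two empty sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


DISTINCT_CONCEPT_THRESHOLD = 0.2  # hypothetical value; not given in the abstract


def represent_distinct_concepts(samples_a: Iterable[Hashable], samples_b: Iterable[Hashable],
                                threshold: float = DISTINCT_CONCEPT_THRESHOLD) -> bool:
    """Treat two models as distinct concepts when their sample overlap is low."""
    return jaccard(samples_a, samples_b) < threshold


# Toy stream: 20 samples alternating 0 and 1, then one far-away sample.
detector = TEDAEccentricity(m=3.0)
flags = [detector.update([float(i % 2)])[1] for i in range(20)]
ecc, atypical = detector.update([10.0])
print(any(flags), atypical)  # False True
print(jaccard({1, 2, 3}, {3, 4}))  # 0.25
print(represent_distinct_concepts({1, 2, 3}, {3, 4}))  # False (0.25 is not below 0.2)
```

The recursive form keeps only k, the mean and the variance per model, which is what makes eccentricity practical on an unbounded stream.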