{"title":"Large language models for conceptual modeling: Assessment and application potential","authors":"Veda C. Storey , Oscar Pastor , Giancarlo Guizzardi , Stephen W. Liddle , Wolfgang Maaß , Jeffrey Parsons , Jolita Ralyté , Maribel Yasmina Santos","doi":"10.1016/j.datak.2025.102480","DOIUrl":"10.1016/j.datak.2025.102480","url":null,"abstract":"<div><div>Large Language Models (LLMs) are being rapidly adopted for many activities in organizations, business, and education. Their applications include capabilities to generate text, code, and models, which raises questions about their potential role in the conceptual modeling part of information systems development. This paper reports on a panel presented at the <em>43rd International Conference on Conceptual Modeling</em>, where researchers discussed the current and potential role of LLMs in conceptual modeling. The panelists discussed applications and interest levels and expressed both optimism and caution about the adoption of LLMs. They suggest that the conceptual modeling community needs much continued research on LLM development and on the role of LLMs in research and teaching.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102480"},"PeriodicalIF":2.7,"publicationDate":"2025-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144517377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable artificial intelligence for natural language processing: A survey","authors":"Md. Mehedi Hassan , Anindya Nag , Riya Biswas , Md Shahin Ali , Sadika Zaman , Anupam Kumar Bairagi , Chetna Kaushal","doi":"10.1016/j.datak.2025.102470","DOIUrl":"10.1016/j.datak.2025.102470","url":null,"abstract":"<div><div>Recently, artificial intelligence (AI) has gained considerable momentum and is predicted to surpass expectations across a range of industries. However, explainability is a major challenge due to sub-symbolic techniques such as Deep Neural Networks and ensembles, which were absent during earlier booms of AI. This lack of explainability greatly undermines the practical application of AI in numerous areas. To counter the opacity of AI-based systems, Explainable AI (XAI) aims to increase the transparency and human comprehension of black-box AI models. A variety of XAI strategies have been proposed to address the explainability problem; however, given the complexity of the search space, it can be difficult for ML developers and data scientists to construct XAI applications and choose the optimal XAI algorithms. To aid developers, this paper surveys the frameworks, operations, and explainability methodologies currently available for producing reasoning for the predictions of Natural Language Processing (NLP) models. Additionally, a thorough analysis of current work in explainable NLP and AI is undertaken, providing researchers worldwide with opportunities for exploration, insight, and idea development. Finally, the authors highlight gaps in the literature and offer ideas for future research in this area.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102470"},"PeriodicalIF":2.7,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unveiling cancellation dynamics: A two-stage model for predictive analytics","authors":"Soumyadeep Kundu , Soumya Roy , Archit Shukla , Arqum Mateen","doi":"10.1016/j.datak.2025.102467","DOIUrl":"10.1016/j.datak.2025.102467","url":null,"abstract":"<div><div>Booking cancellations have an adverse impact on the performance of firms in the hospitality industry. Most studies in this domain have considered whether a booking will be cancelled (if). While useful, given the nature of the industry, it is also important to understand the timing of cancellation (when). Addressing the inter-temporal nature of the question would help hotels devise appropriate strategies to accommodate cancellations. In our study, we propose a novel two-stage model that predicts both the likelihood (if) and the timing (when) of cancellation, using various statistical and machine learning techniques. We find that significant predictors include the average daily rate (an indicator of the average rental revenue earned per occupied room per day), month of arrival, day of arrival, and lead time. Our insights can help hotels design bespoke cancellation policies and offer personalised services and interventions for guests.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102467"},"PeriodicalIF":2.7,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144279321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-feature classification for fake news detection using multiscale and atrous convolution-based adaptive temporal convolution network","authors":"Rashmi Rane , R. Subhashini","doi":"10.1016/j.datak.2025.102469","DOIUrl":"10.1016/j.datak.2025.102469","url":null,"abstract":"<div><div>With the exponential growth of social media, platforms such as Facebook, Twitter, YouTube, and Instagram have become primary sources of news and information about anything, anywhere. However, fake information uploaded by particular users can spread quickly, affecting how people consume media. In this research work, a novel deep learning-based framework is proposed to detect fake news effectively and enhance the trust of social media users. First, the required text data is gathered from benchmark resources and passed to the preprocessing stage. The preprocessed data is then fed into the feature extraction phase, where Bidirectional Encoder Representations from Transformers (BERT), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN) mechanisms are utilized to extract meaningful information from the data and improve accuracy. This phase generates three sets of features (BERT, temporal, and spatial), which are then given to the detection phase. Here, the Multiscale and Atrous Convolution-based Adaptive Temporal Convolution Network (MAC-ATCN) is used to identify and categorize false information, ensuring more reliable outcomes and decision-making. Additionally, the Modified Osprey Optimization Algorithm (MOOA) is employed to fine-tune the parameters and prevent overfitting when dealing with larger data; it also helps address imbalanced dataset issues by varying the hyperparameters during training. Finally, the overall detection performance is validated with various performance measures and compared with existing works. The developed method achieved accuracy values of 93.74 % on dataset 1 and 92.82 % on dataset 2. Effectively identifying fake news on social media can help users make timely, informed decisions, preventing the spread of misinformation and protecting individuals from harmful consequences.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102469"},"PeriodicalIF":2.7,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144271234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic query expansion for enhancing document retrieval system in healthcare application using GAN based embedding and hyper-tuned DAEBERT algorithm","authors":"Deepak Vishwakarma , Suresh Kumar","doi":"10.1016/j.datak.2025.102468","DOIUrl":"10.1016/j.datak.2025.102468","url":null,"abstract":"<div><div>Query expansion is a useful technique for improving the dependability and performance of document retrieval systems. Search engines frequently employ query expansion strategies to improve Information Retrieval (IR) performance and elucidate users' information requirements. Although there are several methods for automatically expanding queries, the list of returned documents can be lengthy and contain much useless information, particularly when searching the Web. As the volume of medical documents grows, automatic query expansion can also struggle with efficiency and real-time application. Thus, a Hyper-Tuned Dual Attention Enhanced Bi-directional Encoder Representation from Transformers (HT-DAEBERT) model with an automatic ranking-based query expansion system is created to enhance medical document retrieval. Initially, the user's query over the medical corpus is collected and augmented using a Generative Adversarial Network (GAN) approach. The augmented text is then pre-processed to improve its quality through tokenization, acronym expansion, stemming, stop word removal, hyperlink removal, and spell correction. After that, keywords are extracted from the pre-processed text using the Proximity-based Keyword Extraction (PKE) technique. Afterwards, the words are converted into vector form using the HT-DAEBERT model, in which key parameters such as dropout rate and weight decay are optimally selected using the Election Optimization Algorithm (EOA). Finally, a ranking-based query expansion approach is employed to enhance the document retrieval system. The proposed method achieves an accuracy of 97.60 %, a Hit Rate of 98.30 %, a PPV of 93.40 %, an F1-Score of 95.79 %, and an NPV of 97.50 %. This approach improves the accuracy and relevance of document retrieval in healthcare, potentially leading to better patient care and enhanced clinical outcomes.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102468"},"PeriodicalIF":2.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144306983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ROSI: A hybrid solution for omni-channel feature integration in E-commerce","authors":"Luyi Ma , Shengwei Tang , Anjana Ganesh, Jiao Chen, Aashika Padmanabhan, Malay Patel, Jianpeng Xu, Jason Cho, Evren Korpeoglu, Sushant Kumar, Kannan Achan","doi":"10.1016/j.datak.2025.102465","DOIUrl":"10.1016/j.datak.2025.102465","url":null,"abstract":"<div><div>Efficient integration of customer behavior data across multiple channels, including online and in-store interactions, is essential for developing recommendation systems that enhance customer experiences and maintain a competitive edge in e-commerce. However, the integration process faces several challenges, including data synchronization and discrepancies in data schemas. In this study, we introduce a hybrid data pipeline, <span>ROSI</span> (Retail Online-Store Integration), designed to integrate real-time streaming data from online platforms with batch data from in-store interactions. <span>ROSI</span> employs scalable, fault-tolerant streaming systems for online data and periodic batch processing for offline data, ensuring effective synchronization despite variations in data volume, update frequency, and schema. Our approach incorporates in-memory storage, sliding time windows, and feature registries to support applications such as machine learning model training and real-time inference in recommendation systems. Experimental results on real-world retail data demonstrate that <span>ROSI</span> is highly robust, with a reduced growth rate of overall latency as data size increases linearly. Additionally, sequential recommendation systems built on the integrated dataset show a 6.25% improvement in ranking metrics. Overall, the proposed hybrid pipeline facilitates more personalized, omnichannel customer experiences while enhancing operational efficiency.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102465"},"PeriodicalIF":2.7,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144365621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CIAGELP: Clustering Inspired Augmented Graph Embedding based Link Prediction in dynamic networks","authors":"Nisha Singh , Mukesh Kumar , Siddharth Kumar , Bhaskar Biswas","doi":"10.1016/j.datak.2025.102464","DOIUrl":"10.1016/j.datak.2025.102464","url":null,"abstract":"<div><div>Numerous methods have long been explored for the crucial and intricate task of link prediction. Among the most effective are approaches that generate embeddings from various graph components such as nodes, edges, and groups. These representations project the vertex space into a lower-dimensional space, ensuring that vertices and edges with similar contexts are represented closely. While random walk-based embedding (RWE) methods have shown significant improvements, their performance tends to be limited on dynamic networks. To address this, we introduce CIAGELP (Clustering Inspired Augmented Graph Embedding-based Link Prediction), a distinctive approach that utilizes an augmented graph to generate more promising paths and, consequently, more efficient embeddings. The graph is augmented using a customized pairwise clustering coefficient, which not only captures the local structural context but also strongly influences the strength of connections between pairs of nodes. Additionally, to address the drawbacks of previous RWE approaches on dynamic networks, such as inferior accuracy and high computational cost, our approach employs an enhanced RWE mechanism that considers only the differential graph between subsequent snapshots and generates embeddings efficiently at low cost. Through comprehensive comparisons with different machine learning methods, various augmentation ratios, and state-of-the-art methods based on random walk embeddings, we demonstrate the superiority of our CIAGELP approach. By leveraging augmented graphs with cluster similarity and considering differential network dynamics for embedding generation in dynamic networks, our approach substantially outperforms previous random walk methods.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102464"},"PeriodicalIF":2.7,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144212426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RankT: Ranking-Triplets-based adversarial learning for knowledge graph link prediction","authors":"Jinlei Zhu, Xin Zhang, Xin Ding","doi":"10.1016/j.datak.2025.102463","DOIUrl":"10.1016/j.datak.2025.102463","url":null,"abstract":"<div><div>Many state-of-the-art models have been proposed to predict links in knowledge graphs, aiming to complete the missing edges between entities. These models mainly focus on predicting the link score between source and target entities under certain relations, but ignore the similarities or differences in the overall meanings of triplets in different subgraphs. However, triplets interact with each other in different ways, and link prediction models may fail to capture this interaction. In other words, link prediction is superimposed with potential triplet uncertainties. To address this issue, we propose a Ranking-Triplet-based uncertainty adversarial learning (RankT) framework to improve the embedding representation of triplets for link prediction. First, the proposed model calculates node and edge embeddings through node-level and edge-level neighborhood aggregation, respectively, and then fuses the embeddings with a self-attention transformer to obtain an interactive embedding of the triplet. Second, to reduce the uncertainty of the probability distribution of predicted links, a ranking-triplet-based adversarial loss function is designed, based on the confrontation between the highest-certainty and highest-uncertainty links. Lastly, to strengthen the stability of the adversarial learning, a ranking-triplet-based consistency loss is designed to make the probabilities of the highest positive links converge in the same direction. Ablation studies show the effectiveness of each part of the proposed model, and experimental comparisons show that our model significantly outperforms state-of-the-art models. In conclusion, the proposed model improves link prediction performance while discovering the similar or different meanings of triplets.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102463"},"PeriodicalIF":2.7,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144138084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CredBERT: Credibility-aware BERT model for fake news detection","authors":"Anju R., Nargis Pervin","doi":"10.1016/j.datak.2025.102461","DOIUrl":"10.1016/j.datak.2025.102461","url":null,"abstract":"<div><div>The spread of fake news on social media poses significant challenges, especially in distinguishing credible sources from unreliable ones. Existing methods primarily rely on text analysis, often neglecting user credibility, a key factor in enhancing detection accuracy. To address this, we propose CredBERT, a framework that combines credibility scores derived from user interactions and domain expertise with BERT-based text embeddings. CredBERT employs a multi-classifier ensemble, integrating Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN), BiLSTM, Logistic Regression, and k-Nearest Neighbors, with predictions aggregated using majority voting, ensuring robust performance across both balanced and imbalanced class datasets. This approach effectively merges user credibility with content-based features, improving prediction accuracy and reducing biases. Compared to the state-of-the-art baselines FakeBERT and BiLSTM, CredBERT achieves 6.45% and 4.21% higher accuracy, respectively. By evaluating user credibility and content features, our model not only enhances fake news detection but also helps mitigate misinformation by identifying unreliable sources.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102461"},"PeriodicalIF":2.7,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144134719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Philosophical reflections on conceptual modeling as communication","authors":"Mattia Fumagalli , Giancarlo Guizzardi","doi":"10.1016/j.datak.2025.102453","DOIUrl":"10.1016/j.datak.2025.102453","url":null,"abstract":"<div><div>Conceptual modeling is a complex and demanding task. It is centered around the challenge of representing a portion of the world in a way that is objective, understandable, shareable, and reusable by a community of practitioners, who rely on models to design and implement software or to clarify the concepts within a given domain. The difficulty of conceptual modeling stems from the inherent limitations of human representation abilities, which cannot fully capture the infinite richness and diversity of the world, nor the endless possibilities for description enabled by language. Significant effort has been invested in addressing these challenges, particularly in the creation of effective and reusable conceptual models, which has presented numerous difficulties. This paper explores conceptual modeling from a philosophical standpoint, proposing that conceptual models should not be viewed merely as the static representational output of an a priori activity, subject to modification only during a preliminary design phase. Instead, they should be seen as dynamic artifacts that require continuous design, adaptation, and evolution from their inception to their application, and that may serve multiple purposes. The paper seeks to highlight the importance of understanding conceptual modeling primarily as an act of communication, rather than just a process of information transmission. It also aims to clarify the distinction between these two aspects and to examine the potential implications of adopting a <em>communicative approach to modeling</em>. These implications extend not only to the tools and methodologies used in modeling but also to the ethical considerations that arise from such an approach.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102453"},"PeriodicalIF":2.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}