Data & Knowledge Engineering: Latest Articles

Reinforcement learning for optimizing responses in care processes
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-02-03 DOI: 10.1016/j.datak.2025.102412
Olusanmi A. Hundogan, Bart J. Verhoef, Patrick Theeven, Hajo A. Reijers, Xixi Lu
Abstract: Prescriptive process monitoring aims to derive recommendations for optimizing complex processes. While previous studies have successfully used reinforcement learning techniques to derive actionable policies in business processes, care processes present unique challenges due to their dynamic and multifaceted nature. For example, at any stage of a care process, a multitude of actions is possible. In this study, we follow the reinforcement learning (RL) paradigm and present a general approach that uses event data to build and train Markov decision processes. We propose three algorithms, including one that takes the elapsed time into account when transforming an event log into a semi-Markov decision process. We evaluated the RL approach using an aggression incident data set. Specifically, the goal is to optimize staff member actions when clients are displaying different types of aggressive behavior. Q-learning and SARSA are used to find optimal policies. Our results showed that the derived policies align closely with current practices while offering alternative options in specific situations. By employing RL in the context of care processes, we contribute to the ongoing efforts to enhance decision-making and efficiency in dynamic and complex environments.
Volume 157, Article 102412
Citations: 0
Symmetric non negative matrices factorization applied to the detection of communities in graphs and forensic image analysis
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-01-31 DOI: 10.1016/j.datak.2025.102411
Gaël Marec, Nédra Mellouli
Abstract: With the proliferation of data, particularly on social networks, the accuracy of the information becomes uncertain. In this context, a major challenge lies in detecting image manipulations, where alterations are made to deceive observers. Aligning with the anomaly detection issue, recent methods approach the detection of image transformations as a community detection problem within graphs associated with the images. In this study, we propose using a community clustering method based on non-negative symmetric matrix factorization. By examining several experiments detecting alterations in manipulated images, we assess the method’s robustness and discuss potential enhancements. We also present a process for automatically generating visually and semantically coherent forged images. Additionally, we provide a web application to demonstrate this process.
Volume 157, Article 102411
Citations: 0
REDIRE: Extreme REduction DImension for extRactivE Summarization
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-01-26 DOI: 10.1016/j.datak.2025.102407
Christophe Rodrigues, Marius Ortega, Aurélien Bossard, Nédra Mellouli
Abstract: This paper presents an automatic unsupervised summarization model capable of extracting the most important sentences from a corpus. The unsupervised aspect makes it possible to do away with large corpora, made up of documents and their reference summaries, and to directly process documents potentially made up of several thousand words. To extract sentences in a summary, we use pre-trained word embeddings to represent the documents. From this thick cloud of word vectors, we apply an extreme dimension reduction to identify important words, which we group by proximity. Sentences are extracted using linear constraint solving to maximize the information present in the summary. We evaluate the approach on large documents and present very encouraging initial results.
Volume 157, Article 102407
Citations: 0
Logic-infused knowledge graph QA: Enhancing large language models for specialized domains through Prolog integration
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-01-24 DOI: 10.1016/j.datak.2025.102406
Aneesa Bashir, Rong Peng, Yongchang Ding
Abstract: Efficiently answering questions over complex, domain-specific knowledge graphs remains a substantial challenge, as large language models (LLMs) often lack the logical reasoning abilities and particular knowledge required for such tasks. This paper presents a novel framework integrating LLMs with logical programming languages like Prolog for Logic-Infused Knowledge Graph Question Answering (KGQA) in specialized domains. The proposed methodology uses a transformer-based encoder–decoder architecture. An encoder reads the question, and a named entity recognition (NER) module connects entities to the knowledge graph. The extracted entities are fed into a grammar-guided decoder, producing a logical form (Prolog query) that captures the semantic constraints and relationships. The Prolog query is executed over the knowledge graph to perform symbolic reasoning and retrieve relevant answer entities. Comprehensive experiments on the MetaQA benchmark dataset demonstrate the superior performance of this logic-infused method in accurately identifying correct answer entities from the knowledge graph. Even when trained on a limited subset of annotated data, it outperforms state-of-the-art baselines, achieving 89.60% and F1-scores of up to 89.61%, showcasing its effectiveness in enhancing large language models with symbolic reasoning capabilities for specialized question-answering tasks. The seamless integration of LLMs and logical programming enables the proposed framework to reason effectively over complex, domain-specific knowledge graphs, overcoming a key limitation of existing KGQA systems. In specialized domains, the interpretability provided by representing questions as Prolog queries is a valuable asset.
Volume 157, Article 102406
Citations: 0
A methodology for the systematic design of storytelling dashboards applied to Industry 4.0
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-01-22 DOI: 10.1016/j.datak.2025.102410
Ana Lavalle, Alejandro Maté, Maribel Yasmina Santos, Pedro Guimarães, Juan Trujillo, Antonina Santos
Abstract: Dashboards are popular tools for presenting key insights to decision-makers by translating large volumes of data into clear information. However, while individual visualizations may effectively answer specific questions, they often fail to connect in a way that conveys the overall narrative, leaving decision-makers without a cohesive understanding of the area under analysis.
This paper presents a novel methodology for the systematic design of holistic dashboards, moving from analytical requirements to storytelling dashboards. Our approach ensures that all visualizations are aligned with the analytical goals of decision-makers. It includes several key steps: capturing analytical requirements through the i* framework; structuring and refining these requirements into a tree model to reflect the decision-maker’s mental analysis; identifying and preparing relevant data; capturing the key concepts and relationships for the composition of the cohesive storytelling dashboard through a novel storytelling conceptual model; and finally, implementing and integrating the visualizations into the dashboard, ensuring coherence and alignment with the decision-maker’s needs. Our methodology has been applied in real-world industrial environments. We evaluated its impact through a controlled experiment. The findings show that storytelling dashboards significantly improve data interpretation, reduce misinterpretations, and enhance the overall user experience compared to traditional dashboards.
Volume 156, Article 102410
Citations: 0
Ensuring safety in digital spaces: Detecting code-mixed hate speech in social media posts
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-01-18 DOI: 10.1016/j.datak.2025.102409
Pradeep Kumar Roy, Abhinav Kumar
Abstract: Social networks strive to offer positive content to users, yet a considerable amount of inappropriate material, such as rumors, fake news, and hate speech, persists. Despite significant efforts to detect and prevent hate speech early, it remains widespread due to issues like misspellings and mixed language in posts. To address these challenges, this research utilizes advanced algorithms like CNN, LSTM, and BERT to develop an automated system for detecting hate speech in Telugu-English code-mixed posts. Additionally, it evaluates the effectiveness of data translation and transliteration approaches for detecting hate speech in mixed-language posts. Results indicate that the transliteration approach achieves the highest accuracy, 75%, surpassing raw and translated data by 1% and 3%, respectively. The proposed system may effectively minimize hate speech and offensive content on social media platforms, resulting in an enhanced user experience. From a managerial perspective, this research presents numerous benefits, such as improved content moderation, optimized resource allocation, data-driven decision-making, enhanced user satisfaction, strengthened reputation management, and greater scalability. These advancements underscore the potential of utilizing advanced technologies to address complex challenges in social media management.
Volume 156, Article 102409
Citations: 0
A survey on big data classification
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-01-11 DOI: 10.1016/j.datak.2025.102408
Keerthana G, Sherly Puspha Annabel L
Abstract: Big data refers to vast volumes of structured and unstructured data that are too large or complex for traditional data-processing methods to handle efficiently. The importance of big data lies in its ability to provide actionable insights and drive decision-making across various industries, such as healthcare, finance, marketing, and government, by enabling more accurate predictions and personalized services. However, traditional classification approaches often struggle with big data's complexity: they fail to manage high dimensionality, deal with non-linearity, or process data in real time. Effective big data classification requires robust computing infrastructure, scalable storage solutions, and advanced algorithms. This survey provides a thorough assessment of 50 research papers on big data classification, identifying the difficulties that current techniques face in processing and classifying data efficiently without substantial computational resources. The analysis covers a variety of scenarios and key points. The survey groups the techniques used for big data classification into rule-based, deep learning-based, optimization-based, machine learning-based, and other categories. Furthermore, the analysis considers the classification of techniques, the software tools used, the year of publication, and the performance metrics. Finally, the research gaps and technical problems of these techniques are presented as motivation for creating an efficient model for big data classification.
Volume 156, Article 102408
Citations: 0
Textual data augmentation using generative approaches - Impact on named entity recognition tasks
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-01-10 DOI: 10.1016/j.datak.2024.102403
Danrun Cao, Nicolas Béchet, Pierre-François Marteau, Oussama Ahmia
Abstract: Industrial applications of Named Entity Recognition (NER) are usually confronted with small and imbalanced corpora. This could harm the performance of trained and fine-tuned recognition models, especially when they encounter unknown data. In this study we develop three generation-based data enrichment approaches in order to increase the number of examples of underrepresented entities. We compare the impact of enriched corpora on NER models, using both non-contextual (fastText) and contextual (Bert-like) embedding models to provide discriminant features to a biLSTM-CRF used as an entity classifier. The approach is evaluated on a contract renewal detection task applied to a corpus of calls for tenders. The results show that the proposed data enrichment procedure effectively improves the NER model’s effectiveness when applied to both known and unknown data.
Volume 156, Article 102403
Citations: 0
Automated mapping between SDG indicators and open data: An LLM-augmented knowledge graph approach
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2025-01-03 DOI: 10.1016/j.datak.2024.102405
Wissal Benjira, Faten Atigui, Bénédicte Bucher, Malika Grim-Yefsah, Nicolas Travers
Abstract: Meeting the Sustainable Development Goals (SDGs) presents a large-scale challenge for all countries. SDGs established by the United Nations provide a comprehensive framework for addressing global issues. To monitor progress towards these goals, we need to develop key performance indicators and integrate and analyze heterogeneous datasets. The definition of these indicators requires the use of existing data and metadata. However, the diversity of data sources and formats raises major issues in terms of structuring and integration. Despite the abundance of open data and metadata, its exploitation remains limited, leaving untapped potential for guiding urban policies towards sustainability. Thus, this paper introduces a novel approach for SDG indicator computation, leveraging the capabilities of Large Language Models (LLMs) and Knowledge Graphs (KGs). We propose a method that combines rule-based filtering with LLM-powered schema mapping to establish semantic correspondences between diverse data sources and SDG indicators, including disaggregation. Our approach integrates these mappings into a KG, which enables indicator computation by querying the graph’s topology. We evaluate our method through a case study focusing on SDG Indicator 11.7.1, which concerns the accessibility of public open spaces. Our experimental results show significant improvements in accuracy, precision, recall, and F1-score compared to traditional schema mapping techniques.
Volume 156, Article 102405
Citations: 0
Augmenting post-hoc explanations for predictive process monitoring with uncertainty quantification via conformalized Monte Carlo dropout
IF 2.7 | Zone 3 | Computer Science
Data & Knowledge Engineering Pub Date: 2024-12-28 DOI: 10.1016/j.datak.2024.102402
Nijat Mehdiyev, Maxim Majlatow, Peter Fettke
Abstract: This study presents a novel approach to improve the transparency and reliability of deep learning models in predictive process monitoring (PPM) by integrating uncertainty quantification (UQ) and explainable artificial intelligence (XAI) techniques. We introduce the conformalized Monte Carlo dropout method, which combines Monte Carlo dropout for uncertainty estimation with conformal prediction (CP) to generate reliable prediction intervals. Additionally, we enhance post-hoc explanation techniques such as individual conditional expectation (ICE) plots and partial dependence plots (PDP) with uncertainty information, including credible and conformal predictive intervals. Our empirical evaluation in the manufacturing industry demonstrates the effectiveness of these approaches in refining strategic and operational decisions. This research contributes to advancing PPM and machine learning by bridging the gap between model transparency and high-stakes decision-making scenarios.
Volume 156, Article 102402
Citations: 0