Data & Knowledge Engineering: Latest Articles

Coupling MDL and Markov chain Monte Carlo to sample diverse pattern sets
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-12-20 DOI: 10.1016/j.datak.2024.102393
François Camelin, Samir Loudni, Gilles Pesant, Charlotte Truchet
Volume 156, Article 102393
Abstract: Exhaustive methods of pattern extraction in a database face real obstacles to speed and output control of patterns: a large number of patterns are extracted, many of which are redundant. Pattern extraction methods through sampling, which allow for controlling the size of the outputs while ensuring fast response times, provide a solution to these two problems. However, these methods do not provide high-quality patterns: they return patterns that are very infrequent in the database. Furthermore, they do not scale. To ensure more frequent and diversified patterns in the output, we propose integrating compression methods into sampling to select the most representative patterns from the sampled transactions. We demonstrate that our approach improves the state of the art in terms of diversity of produced patterns.
Citations: 0
HiBenchLLM: Historical Inquiry Benchmarking for Large Language Models
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-12-15 DOI: 10.1016/j.datak.2024.102383
Mathieu Chartier, Nabil Dakkoune, Guillaume Bourgeois, Stéphane Jean
Volume 156, Article 102383
Abstract: Large Language Models (LLMs) such as ChatGPT or Bard have significantly transformed information retrieval and captured the public’s attention with their ability to generate customized responses across various topics. In this paper, we analyze the capabilities of different LLMs to generate responses related to historical facts in French. Our objective is to evaluate their reliability, comprehensiveness, and relevance for direct usability or extraction. To accomplish this, we propose a benchmark consisting of numerous historical questions covering various types, themes, and difficulty levels. Our evaluation of responses provided by 14 selected LLMs reveals several limitations in both content and structure. In addition to an overall insufficient precision rate, we observe uneven treatment of the French language, along with issues related to verbosity and inconsistency in the responses generated by LLMs.
Citations: 0
Clustering of timed sequences – Application to the analysis of care pathways
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-12-10 DOI: 10.1016/j.datak.2024.102401
Thomas Guyet, Pierre Pinson, Enoal Gesny
Volume 156, Article 102401
Abstract: Improving the future of healthcare starts by better understanding the current actual practices in . This motivates the objective of discovering typical care pathways from patient data. Revealing care pathways can be achieved through clustering. The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically appropriate metric and clustering algorithms.
In this article, we adapt two methods developed for time series to the clustering of timed sequences: the drop-DTW metric and the DBA approach for the construction of averaged time sequences. These methods are then applied in clustering algorithms to propose original and sound clustering algorithms for timed sequences. This approach is experimented with and evaluated on synthetic and .
Citations: 0
Semantic vs. LLM-based approach: A case study of KOnPoTe vs. Claude for ontology population from French advertisements
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-12-04 DOI: 10.1016/j.datak.2024.102392
Aya Sahbi, Céline Alec, Pierre Beust
Volume 156, Article 102392
Abstract: Automatic ontology population is the process of identifying, extracting, and integrating relevant information from diverse sources to instantiate the classes and properties specified in an ontology, thereby creating a Knowledge Graph (KG) for a particular domain. In this study, we evaluate two approaches for ontology population from text: KOnPoTe, a semantic technique that employs textual and domain knowledge analysis, and a generative AI method leveraging Claude, a Large Language Model (LLM). We conduct comparative experiments on three French advertisement domains (real estate, boats, and restaurants) to assess the performance of these techniques. Our analysis highlights the respective strengths and limitations of the semantic approach and the LLM-based one in the context of the ontology population process.
Citations: 0
Curvature constrained MPNNs: Improving message passing with local structural properties
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-11-29 DOI: 10.1016/j.datak.2024.102382
Hugo Attali, Davide Buscaldi, Nathalie Pernelle
Volume 156, Article 102382
Abstract: Graph neural networks operate through an iterative process that involves updating node representations by aggregating information from neighboring nodes, a concept commonly referred to as the message passing paradigm. Despite their widespread usage, a recognized issue with these networks is the tendency to over-squash, leading to diminished efficiency. Recent studies have highlighted that this bottleneck phenomenon is often associated with specific regions within graphs that can be identified through a measure of edge curvature. In this paper, we present a novel framework designed for any Message Passing Neural Network (MPNN) architecture, wherein information distribution is guided by the curvature of the graph’s edges. Our approach aims to address the over-squashing problem by strategically considering the geometric properties of the underlying graph. The experiments carried out show that our method demonstrates significant improvements in mitigating over-squashing, surpassing the performance of existing graph rewiring techniques across multiple node classification datasets.
Citations: 0
Improving multi-view ensemble learning with Round-Robin feature set partitioning
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-11-24 DOI: 10.1016/j.datak.2024.102380
Aditya Kumar, Jainath Yadav
Volume 156, Article 102380
Abstract: Multi-view Ensemble Learning (MEL) techniques have shown remarkable success in improving the accuracy and resilience of classification algorithms by combining multiple base classifiers trained over different perspectives of a dataset, known as views. One crucial factor affecting ensemble performance is the selection of diverse and informative feature subsets. Feature Set Partitioning (FSP) methods address this challenge by creating distinct views of features for each base classifier. In this context, we propose the Round-Robin Feature Set Partitioning (RR-FSP) technique, which introduces a novel approach to feature allocation among views. This approach evenly distributes highly correlated features across views, thereby enhancing ensemble diversity. By promoting balanced feature utilization and encouraging a more equitable distribution of correlated features, RR-FSP contributes to the advancement of MEL techniques. Through experiments on various datasets, we demonstrate that RR-FSP offers improved classification accuracy and robustness, making it a valuable addition to the arsenal of FSP techniques for MEL.
Citations: 0
White box specification of intervention policies for prescriptive process monitoring
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-11-23 DOI: 10.1016/j.datak.2024.102379
Mahmoud Shoush, Marlon Dumas
Volume 155, Article 102379
Abstract: Prescriptive process monitoring methods seek to enhance business process performance by triggering real-time interventions, such as offering discounts to increase the likelihood of a positive outcome (e.g., a purchase). At the core of a prescriptive process monitoring method lies an intervention policy, which determines under which conditions and when to trigger an intervention. While state-of-the-art prescriptive process monitoring approaches rely on black-box intervention policies derived through reinforcement learning, algorithmic decision-making requirements sometimes dictate that the business stakeholders must be able to understand, justify, and adjust these intervention policies manually. To address this requirement, this article proposes WB-PrPM (White-Box Prescriptive Process Monitoring), a framework that enables stakeholders to define intervention policies in business processes. WB-PrPM is a rule-based system that helps decision-makers balance the demand for effective interventions with the imperatives of limited resource capacity. The framework incorporates an automated method for tuning the parameters of the intervention policies to optimize a total gain function. An evaluation is presented using real-life datasets to examine the tradeoffs among various parameters. The evaluation reveals that different variants of the proposed framework outperform existing baselines in terms of total gain, even when default parameter values are used. Additionally, the automated parameter optimization approach further enhances the total gain.
Citations: 0
A goal-oriented document-grounded dialogue based on evidence generation
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-11-22 DOI: 10.1016/j.datak.2024.102378
Yong Song, Hongjie Fan, Junfei Liu, Yunxin Liu, Xiaozhou Ye, Ye Ouyang
Volume 155, Article 102378
Abstract: Goal-oriented Document-grounded Dialogue (DGD) is used for retrieving specific domain documents, assisting users in document content retrieval, question answering, and document management. Existing methods typically employ keyword extraction and vector space models to understand the content of documents, identify the intent of questions, and generate answers based on the capabilities of generation models. However, challenges remain in semantic understanding, long text processing, and context understanding. The emergence of Large Language Models (LLMs) has brought new capabilities in context learning and step-by-step reasoning. These models, combined with Retrieval Augmented Generation (RAG) methods, have made significant breakthroughs in text comprehension, intent detection, and language organization, offering exciting prospects for DGD research. However, the “hallucination” issue arising from LLMs requires complementary methods to ensure the credibility of their outputs. In this paper, we propose a goal-oriented document-grounded dialogue approach based on evidence generation using LLMs. It designs and implements methods for document content retrieval & reranking, fine-tuning and inference, and evidence generation. In experiments, methods that combine LLMs with a vector space model or with a key information matching technique serve as the comparison. Against these two baselines, the accuracy of the proposed method is improved by 21.91% and 12.81%, comprehensiveness is increased by 10.89% and 69.83%, coherence is enhanced by 38.98% and 53.27%, and completeness is boosted by 16.13% and 36.97%, respectively, on average. Additionally, the ablation analysis reveals that the evidence generation method also contributes significantly to comprehensiveness and completeness.
Citations: 0
Data-aware process models: From soundness checking to repair
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-11-17 DOI: 10.1016/j.datak.2024.102377
Matteo Zavatteri, Davide Bresolin, Massimiliano de Leoni, Aurelo Makaj
Volume 155, Article 102377
Abstract: Process-aware Information Systems support the enactment of business processes and rely on a model that prescribes which executions are allowed. As a result, the model needs to be sound for the process to be carried out. Traditionally, soundness has been defined and studied by only focusing on the control-flow. Some works proposed techniques to repair the process model to ensure soundness, ignoring data and decision perspectives. This paper puts forward a technique to repair the data perspective of process models, keeping intact the control flow structure. Processes are modeled by Data Petri nets. Our approach repairs the Constraint Graph, a finite symbolic abstraction of the infinite state-space of the underlying Data Petri net. The changes in the Constraint Graph are then projected back onto the Data Petri net.
Citations: 0
Context normalization: A new approach for the stability and improvement of neural network performance
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering Pub Date: 2024-11-15 DOI: 10.1016/j.datak.2024.102371
Bilal Faye, Hanane Azzag, Mustapha Lebbah, Fangchen Feng
Volume 155, Article 102371
Abstract: Deep neural networks face challenges with distribution shifts across layers, affecting model convergence and performance. While Batch Normalization (BN) addresses these issues, its reliance on a single Gaussian distribution assumption limits adaptability. To overcome this, alternatives like Layer Normalization, Group Normalization, and Mixture Normalization (MN) emerged, yet struggle with dynamic activation distributions. We propose "Context Normalization" (CN), introducing contexts constructed from domain knowledge. CN normalizes data within the same context, enabling local representation. During backpropagation, CN learns normalized parameters and model weights for each context, ensuring efficient convergence and superior performance compared to BN and MN. This approach emphasizes context utilization, offering a fresh perspective on activation normalization in neural networks. We release our code at https://github.com/b-faye/Context-Normalization.
Citations: 0