Data & Knowledge Engineering: Latest Articles

Reasoning on responsibilities for optimal process alignment computation
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-09-19, DOI: 10.1016/j.datak.2024.102353
Abstract: Process alignment aims at establishing a matching between a process model run and a log trace. To improve such a matching, process alignment techniques often exploit contextual conditions to enable computations that are more informed than the simple edit distance between model runs and log traces. The paper introduces a novel approach to process alignment which relies on contextual information expressed as responsibilities. The notion of responsibility is fundamental in business and organization models, but it is often overlooked. We show that the computation of optimal alignments can take advantage of responsibilities. We leverage them in two ways. First, responsibilities may sometimes justify deviations. In these cases, we treat the deviations as correct behavior rather than as errors. Second, responsibilities can either be met or neglected in the execution of a trace. Thus, we prefer alignments where neglected responsibilities are minimized.

The paper proposes a formal framework for responsibilities in a process model, including the definition of cost functions for computing optimal alignments. We also propose a branch-and-bound algorithm for optimal alignment computation and exemplify its usage by way of two event logs from real executions.
PDF: https://www.sciencedirect.com/science/article/pii/S0169023X24000776/pdfft?md5=df35ebc627d0abaf942b9666c2d2c159&pid=1-s2.0-S0169023X24000776-main.pdf
Citations: 0
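The abstract above contrasts its responsibility-aware approach with "the simple edit distance between model runs and log traces". As a rough illustration of that baseline only, the sketch below computes a unit-cost alignment between one model run and one log trace. The function name `alignment_cost`, the activity labels, and the cost scheme (synchronous move costs 0, a move on the model only or on the log only costs 1) are assumptions made for this example; they are not the paper's cost functions, and no responsibilities are modelled.

```python
from typing import Sequence


def alignment_cost(model_run: Sequence[str], trace: Sequence[str]) -> int:
    """Unit-cost alignment between one model run and one log trace.

    Assumed cost scheme (illustrative only): a synchronous move, where both
    sides show the same activity, costs 0; a move on the model only or on
    the log only costs 1. This is a plain edit distance without
    substitutions.
    """
    n, m = len(model_run), len(trace)
    # cost[i][j] = cheapest alignment of model_run[:i] with trace[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # only model moves remain
    for j in range(1, m + 1):
        cost[0][j] = j  # only log moves remain
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1
            if model_run[i - 1] == trace[j - 1]:
                best = min(best, cost[i - 1][j - 1])  # synchronous move
            cost[i][j] = best
    return cost[n][m]


# Hypothetical activities: the trace skips "check", so one model move is needed.
run = ["register", "check", "approve", "notify"]
log = ["register", "approve", "notify"]
print(alignment_cost(run, log))  # 1
```

A responsibility-aware variant would, per the abstract, assign different costs to deviations that a responsibility justifies; how exactly is defined in the paper and is not reproduced here.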
SRank: Guiding schema selection in NoSQL document stores
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-09-14, DOI: 10.1016/j.datak.2024.102360
Abstract: The rise of big data has led to a greater need for applications to change their schema frequently. NoSQL databases provide flexibility in organizing data and offer multiple choices for structuring and storing similar information. While schema flexibility speeds up initial development, choosing schemas wisely is crucial, as they significantly impact performance, affecting data redundancy, navigation cost, data access cost, and maintainability. This paper emphasizes the importance of schema design in NoSQL document stores. It proposes a model that analyzes and evaluates different schema alternatives and recommends the best among them. The model is divided into four phases. It takes the Entity-Relationship (ER) model and workload queries as input. In the Transformation phase, the schema alternatives are initially developed for each ER model, and subsequently, a schema graph is generated for each alternative. Concurrently, workload queries are converted into query graphs. In the Schema Evaluation phase, the Schema Rank (SRank) is calculated for each schema alternative using query metrics derived from the query graphs and path coverage generated from the schema graphs. Finally, in the Output phase, the schema with the highest SRank is recommended as the most suitable choice for the application. The paper includes a case study of a Hotel Reservation System (HRS) to demonstrate the application of the proposed model. It comprehensively evaluates various schema alternatives based on query response time, storage efficiency, scalability, throughput, and latency. The paper validates the SRank computation for schema selection in NoSQL databases through an extensive experimental study. The alignment of SRank values with each schema's performance metrics underscores the effectiveness of this ranking system. SRank simplifies the schema selection process, assisting users in making informed decisions by reducing the time, cost, and effort of identifying the optimal schema for NoSQL document stores.
Citations: 0
Relating behaviour of data-aware process models
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-09-12, DOI: 10.1016/j.datak.2024.102363
Abstract: Data Petri nets (DPNs) have gained traction as a model for data-aware processes, thanks to their ability to balance simplicity with expressiveness, and because they can be automatically discovered from event logs. While model checking techniques for DPNs have been studied, more complex analysis tasks that are highly relevant for BPM are beyond methods known in the literature. We focus here on equivalence and inclusion of process behaviour with respect to language and configuration spaces, optionally taking data into account. Such comparisons are important in the context of key process mining tasks, namely process repair and discovery, and related to conformance checking. To solve these tasks, we propose approaches for bounded DPNs based on constraint graphs, which are faithful abstractions of the reachable state space. Though the considered verification tasks are undecidable in general, we show that our method is a decision procedure for DPNs that admit a finite history set. This property guarantees that constraint graphs are finite and computable, and was shown to hold for large classes of DPNs that are mined automatically, as well as for DPNs presented in the literature. The new techniques are implemented in the tool ada, and an evaluation proving feasibility is provided.
PDF: https://www.sciencedirect.com/science/article/pii/S0169023X24000879/pdfft?md5=ee932b18bac18fd1e3c1e769269d7d67&pid=1-s2.0-S0169023X24000879-main.pdf
Citations: 0
A framework for understanding event abstraction problem solving: Current states of event abstraction studies
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-09-06, DOI: 10.1016/j.datak.2024.102352
Abstract: Event abstraction is a crucial step in applying process mining in real-world scenarios. However, practitioners often face challenges in selecting relevant research for their specific needs. To address this, we present a comprehensive framework for understanding event abstraction, comprising four key components: event abstraction sub-problems, consideration of process properties, data types for event abstraction, and various approaches to event abstraction. By systematically examining these components, practitioners can efficiently identify research that aligns with their requirements. Additionally, we analyze existing studies using this framework to provide practitioners with a clearer view of current research and suggest expanded applications of existing methods.
Citations: 0
A conceptual framework for the government big data ecosystem (‘datagov.eco’)
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-09-05, DOI: 10.1016/j.datak.2024.102348
Abstract: The public sector, private firms, and civil society constantly create data of high volume, velocity, and veracity from diverse sources. This kind of data is known as big data. As in other industries, public administrations consider big data as the “new oil” and employ data-centric policies to transform data into knowledge, stimulate good governance, innovative digital services, transparency, and citizens' engagement in public policy. More and more public organizations understand the value created by exploiting internal and external data sources, delivering new capabilities, and fostering collaboration inside and outside of public administrations. Despite the broad interest in this ecosystem, we still lack a detailed and systematic view of it. In this paper, we attempt to describe the emerging Government Big Data Ecosystem as a socio-technical network of people, organizations, processes, technology, infrastructure, standards & policies, procedures, and resources. This ecosystem supports data functions such as data collection, integration, analysis, storage, sharing, use, protection, and archiving. Through these functions, value is created by promoting evidence-based policymaking, modern public services delivery, data-driven administration and open government, and boosting the data economy. Through a Design Science Research methodology, we propose a conceptual framework, which we call ‘datagov.eco’. We believe our ‘datagov.eco’ framework will provide insights and support to different stakeholder profiles, including administrators, consultants, data engineers, and data scientists.
Citations: 0
Data engineering and modeling for artificial intelligence
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-09-01, DOI: 10.1016/j.datak.2024.102346
Abstract: (not provided)
Citations: 0
Capturing and Analysing Employee Behaviour: An Honest Day’s Work Record
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-08-31, DOI: 10.1016/j.datak.2024.102350
Abstract: For a range of reasons, organisations collect data on the work behaviour of their employees. However, each data collection technique displays its own unique mix of intrusiveness, information richness, and risks. To understand the differences between data collection techniques, we conducted a multiple-case study in a multinational professional services organisation, tracking six participants throughout a workday using non-participant observation, screen recording, and timesheet techniques. This yielded 136 hours of data. Our findings show that relying on one data collection technique alone cannot provide a comprehensive and accurate account of activities that are screen-based, offline, or overtime. The collected data also provided an opportunity to investigate the use of process mining for analysing employee behaviour, specifically with respect to the completeness of the collected data. Our study underlines the importance of judiciously selecting data collection techniques, as well as using a sufficiently broad data set to generate reliable insights into employee behaviour.
PDF: https://www.sciencedirect.com/science/article/pii/S0169023X24000740/pdfft?md5=0803a6136e27919fd8c8a868fa63e889&pid=1-s2.0-S0169023X24000740-main.pdf
Citations: 0
Discovering outlying attributes of outliers in data streams
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-08-30, DOI: 10.1016/j.datak.2024.102349
Abstract: Data streams, continuous sequences of timestamped data points, necessitate real-time monitoring due to their time-sensitive nature. In various data stream applications, such as network security and credit card transaction monitoring, real-time detection of outliers is crucial, as these outliers often signify potential threats. Equally important is the real-time explanation of outliers, enabling users to glean insights and thereby shorten their investigation time. The investigation time for outliers is closely tied to their number of attributes, making it essential to provide explanations that detail which attributes are responsible for the abnormality of a data point, referred to as outlying attributes. However, the unbounded volume of data and concept drift of data streams pose challenges for discovering the outlying attributes of outliers in real time. In response, we propose EXOS, an algorithm designed for discovering the outlying attributes of multi-dimensional outliers in data streams. EXOS leverages cross-correlations among data streams, accommodates varying data stream schemas and arrival rates, and effectively addresses challenges related to the unbounded volume of data and concept drift. The algorithm is model-agnostic for point outlier detection and provides real-time explanations based on the local context of the outlier, derived from time-based tumbling windows. The paper provides a complexity analysis of EXOS and an experimental analysis comparing EXOS with existing algorithms. The evaluation includes an assessment of performance on both real-world and synthetic datasets in terms of average precision, recall, F1-score, and explanation time. The evaluation results show that, on average, EXOS achieves a 45.6% better F1-score and a 7.3 times lower explanation time than existing outlying attribute algorithms.
Citations: 0
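The EXOS abstract above mentions that explanations draw on a local context "derived from time-based tumbling windows". The sketch below shows only that generic windowing idea: timestamped points are grouped into fixed-width, non-overlapping windows. The function name `tumbling_windows`, the `(timestamp, value)` input shape, and the window width are assumptions for illustration; nothing here reproduces EXOS itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_windows(
    points: Iterable[Tuple[float, float]], width: float
) -> Dict[int, List[float]]:
    """Group (timestamp, value) pairs into non-overlapping windows.

    Window k covers the half-open interval [k * width, (k + 1) * width).
    Every point falls into exactly one window, which is what distinguishes
    tumbling windows from sliding ones.
    """
    windows: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in points:
        windows[int(timestamp // width)].append(value)
    return dict(windows)


# Hypothetical stream: four readings, 10-second windows.
stream = [(0.5, 1.0), (9.9, 2.0), (10.0, 3.0), (25.0, 4.0)]
print(tumbling_windows(stream, width=10.0))
# {0: [1.0, 2.0], 1: [3.0], 2: [4.0]}
```

In a real stream the windows would be emitted and discarded as time advances rather than collected in one dictionary; the batch form is used here only to keep the example short.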
A self-adaptive density-based clustering algorithm for varying densities datasets with strong disturbance factor
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-08-07, DOI: 10.1016/j.datak.2024.102345
Abstract: Clustering is a fundamental task in data mining, aiming to group similar objects together based on their features or attributes. With the rapid increase in data analysis volume and the growing complexity of high-dimensional data distribution, clustering has become increasingly important in numerous applications, including image analysis, text mining, and anomaly detection. DBSCAN is a powerful tool for clustering analysis and is widely used among density-based clustering algorithms. However, DBSCAN and its variants encounter challenges when confronted with datasets exhibiting clusters of varying densities in intricate high-dimensional spaces affected by significant disturbance factors. A typical example is multi-density clustering connected by a few data points with strong internal correlations, a scenario commonly encountered in the analysis of crowd mobility. To address these challenges, we propose a Self-adaptive Density-Based Clustering Algorithm for Varying Densities Datasets with Strong Disturbance Factor (SADBSCAN). This algorithm comprises a data block splitter, a local clustering module, a global clustering module, and a data block merger to obtain adaptive clustering results. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of SADBSCAN. The experimental results indicate that SADBSCAN significantly outperforms several strong baselines across different metrics, demonstrating the high adaptability and scalability of our algorithm.
Citations: 0
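For readers unfamiliar with the baseline that the SADBSCAN abstract builds on, the following is a minimal sketch of plain DBSCAN, not of SADBSCAN. It assumes Euclidean distance and a brute-force neighbour search; the parameter names `eps` and `min_pts` follow common usage, and the sample points are made up. Because `eps` and `min_pts` are global, a single setting has to fit every cluster, which is the limitation on varying-density data that the abstract refers to.

```python
from math import dist
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN with brute-force neighbour search (O(n^2) distances)."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbours(i: int) -> List[int]:
        # Indices within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
        cluster_id += 1
    return [label if label is not None else NOISE for label in labels]


# Two tight groups and one isolated point (made-up data).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

According to the abstract, SADBSCAN adds a data block splitter, local and global clustering modules, and a data block merger on top of this kind of procedure; those components are not sketched here.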
Developing A Decision Support System for Healthcare Practices: A Design Science Research Approach
IF 2.7, Zone 3, Computer Science
Data & Knowledge Engineering, Pub Date: 2024-07-17, DOI: 10.1016/j.datak.2024.102344
Abstract: We propose a new approach for designing a decision support system (DSS) for the transformation of healthcare practices. Practice transformation helps practices transition from their current state to the patient-centered medical home (PCMH) model of care. Our approach employs activity theory to derive the elements of practice transformation by designing and integrating two ontologies: a domain ontology and a task ontology. By incorporating both goal-oriented and task-oriented aspects of the practice transformation process and specifying how they interact, our integrated design model for the DSS provides prescriptive knowledge on assessing the current status of a practice with respect to PCMH recognition and navigating efficiently through a complex solution space. This knowledge, which is at a moderate level of abstraction and expressed in a language that practitioners understand, contributes to the literature by providing a formulation for a nascent design theory. We implement the integrated design model as a DSS prototype; results of validation tests conducted on the prototype indicate that it is superior to the existing PCMH readiness tracking tool with respect to effectiveness, usability, efficiency, and sustainability.
Citations: 0