{"title":"A framework for understanding event abstraction problem solving: Current states of event abstraction studies","authors":"Jungeun Lim, Minseok Song","doi":"10.1016/j.datak.2024.102352","DOIUrl":"10.1016/j.datak.2024.102352","url":null,"abstract":"<div><p>Event abstraction is a crucial step in applying process mining in real-world scenarios. However, practitioners often face challenges in selecting relevant research for their specific needs. To address this, we present a comprehensive framework for understanding event abstraction, comprising four key components: event abstraction sub-problems, consideration of process properties, data types for event abstraction, and various approaches to event abstraction. By systematically examining these components, practitioners can efficiently identify research that aligns with their requirements. Additionally, we analyze existing studies using this framework to provide practitioners with a clearer view of current research and suggest expanded applications of existing methods.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102352"},"PeriodicalIF":2.7,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142171787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A conceptual framework for the government big data ecosystem (‘datagov.eco’)","authors":"Syed Iftikhar Hussain Shah , Vassilios Peristeras , Ioannis Magnisalis","doi":"10.1016/j.datak.2024.102348","DOIUrl":"10.1016/j.datak.2024.102348","url":null,"abstract":"<div><p>The public sector, private firms, and civil society constantly create data of high volume, velocity, and veracity from diverse sources. This kind of data is known as big data. As in other industries, public administrations consider big data as the “new oil” and employ data-centric policies to transform data into knowledge, stimulate good governance, innovative digital services, transparency, and citizens' engagement in public policy. More and more public organizations understand the value created by exploiting internal and external data sources, delivering new capabilities, and fostering collaboration inside and outside of public administrations. Despite the broad interest in this ecosystem, we still lack a detailed and systematic view of it. In this paper, we attempt to describe the emerging Government Big Data Ecosystem as a <em>socio-technical network</em> of people, organizations, processes, technology, infrastructure, standards & policies, procedures, and resources. This ecosystem supports <em>data functions</em> such as data collection, integration, analysis, storage, sharing, use, protection, and archiving. Through these functions, <em>value is created</em> by promoting evidence-based policymaking, modern public services delivery, data-driven administration and open government, and boosting the data economy. Through a Design Science Research methodology, we propose a conceptual framework, which we call ‘datagov.eco’. 
We believe our ‘datagov.eco’ framework will provide insights and support to different stakeholders’ profiles, including administrators, consultants, data engineers, and data scientists.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102348"},"PeriodicalIF":2.7,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142271814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unraveling the foundations and the evolution of conceptual modeling—Intellectual structure, current themes, and trajectories","authors":"Jacky Akoka , Isabelle Comyn-Wattiau , Nicolas Prat , Veda C. Storey","doi":"10.1016/j.datak.2024.102351","DOIUrl":"10.1016/j.datak.2024.102351","url":null,"abstract":"<div><div>The field of conceptual modeling has now been in existence for over five decades. To understand how this field has evolved and should continue to evolve, it is useful to examine the contributions made over time and the themes that have emerged. In this research, we apply bibliometric analysis to a corpus of over 4700 research papers spanning from 1976 to 2023. We successively apply co-citation, bibliographic coupling, and main path analysis. Co-citation and citation networks are produced that surface the intellectual structure of the field, the main themes, and the relationships among major and influential research papers over time. We identify four areas in the intellectual structure of the field: conceptual modeling and databases; grammars and guidelines for conceptual modeling; requirements engineering and information systems design methodologies; and ontology constructs for conceptual modeling. Between 2017 and 2023, we distinguish nine research themes, including domain-specific conceptual modeling and applications, ontologies and applications, genomics, and datastores and multi-model data. The main path analysis identifies several trajectories among the major and most influential papers. This leads to insights into the lineage of key, influential papers in conceptual modeling research. The primordial nature of the main paths identified encompasses two important aspects. The first revolves around refining and complementing the entity-relationship model. The second identifies the contribution of ontologies for conceptual modeling to make the models more robust. 
Based on the findings from this bibliometric analysis, we propose several directions for future conceptual modeling research.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102351"},"PeriodicalIF":2.7,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing and Analysing Employee Behaviour: An Honest Day’s Work Record","authors":"Iris Beerepoot, Tea Šinik, Hajo A. Reijers","doi":"10.1016/j.datak.2024.102350","DOIUrl":"10.1016/j.datak.2024.102350","url":null,"abstract":"<div><p>For a range of reasons, organisations collect data on the work behaviour of their employees. However, each data collection technique displays its own unique mix of intrusiveness, information richness, and risks. For the sake of understanding the differences between data collection techniques, we conducted a multiple-case study in a multinational professional services organisation, tracking six participants throughout a workday using non-participant observation, screen recording, and timesheet techniques. This led to 136 hours of data. Our findings show that relying on one data collection technique alone cannot provide a comprehensive and accurate account of activities that are screen-based, offline, or overtime. The collected data also provided an opportunity to investigate the use of <em>process mining</em> for analysing employee behaviour, specifically with respect to the completeness of the collected data. 
Our study underlines the importance of judiciously selecting data collection techniques, as well as using a sufficiently broad data set to generate reliable insights into employee behaviour.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102350"},"PeriodicalIF":2.7,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000740/pdfft?md5=0803a6136e27919fd8c8a868fa63e889&pid=1-s2.0-S0169023X24000740-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142149252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering outlying attributes of outliers in data streams","authors":"Egawati Panjei , Le Gruenwald","doi":"10.1016/j.datak.2024.102349","DOIUrl":"10.1016/j.datak.2024.102349","url":null,"abstract":"<div><p>Data streams, continuous sequences of timestamped data points, necessitate real-time monitoring due to their time-sensitive nature. In various data stream applications, such as network security and credit card transaction monitoring, real-time detection of outliers is crucial, as these outliers often signify potential threats. Equally important is the real-time explanation of outliers, enabling users to glean insights and thereby shorten their investigation time. The investigation time for outliers is closely tied to their number of attributes, making it essential to provide explanations that detail which attributes are responsible for the abnormality of a data point, referred to as outlying attributes. However, the unbounded volume of data and concept drift of data streams pose challenges for discovering the outlying attributes of outliers in real time. In response, in this paper we propose EXOS, an algorithm designed for discovering the outlying attributes of multi-dimensional outliers in data streams. EXOS leverages cross-correlations among data streams, accommodates varying data stream schemas and arrival rates, and effectively addresses challenges related to the unbounded volume of data and concept drift. The algorithm is model-agnostic for point outlier detection and provides real-time explanations based on the local context of the outlier, derived from time-based tumbling windows. The paper provides a complexity analysis of EXOS and an experimental analysis comparing EXOS with existing algorithms. The evaluation includes an assessment of performance on both real-world and synthetic datasets in terms of average precision, recall, F1-score, and explanation time. 
The evaluation results show that, on average, EXOS achieves a 45.6% higher F1-score and an explanation time 7.3 times lower than existing outlying attribute algorithms.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102349"},"PeriodicalIF":2.7,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A self-adaptive density-based clustering algorithm for varying densities datasets with strong disturbance factor","authors":"Zihao Cai, Zhaodong Gu, Kejing He","doi":"10.1016/j.datak.2024.102345","DOIUrl":"10.1016/j.datak.2024.102345","url":null,"abstract":"<div><p>Clustering is a fundamental task in data mining, aiming to group similar objects together based on their features or attributes. With the rapid increase in data analysis volume and the growing complexity of high-dimensional data distribution, clustering has become increasingly important in numerous applications, including image analysis, text mining, and anomaly detection. DBSCAN is a powerful tool for clustering analysis and is widely used in density-based clustering algorithms. However, DBSCAN and its variants encounter challenges when confronted with datasets exhibiting clusters of varying densities in intricate high-dimensional spaces affected by significant disturbance factors. A typical example is multi-density clustering connected by a few data points with strong internal correlations, a scenario commonly encountered in the analysis of crowd mobility. To address these challenges, we propose a Self-adaptive Density-Based Clustering Algorithm for Varying Densities Datasets with Strong Disturbance Factor (SADBSCAN). This algorithm comprises a data block splitter, a local clustering module, a global clustering module, and a data block merger to obtain adaptive clustering results. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of SADBSCAN. 
The experimental results indicate that SADBSCAN significantly outperforms several strong baselines across different metrics, demonstrating the high adaptability and scalability of our algorithm.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102345"},"PeriodicalIF":2.7,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141979340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing A Decision Support System for Healthcare Practices: A Design Science Research Approach","authors":"Arun Sen , Atish P. Sinha , Cong Zhang","doi":"10.1016/j.datak.2024.102344","DOIUrl":"10.1016/j.datak.2024.102344","url":null,"abstract":"<div><p>We propose a new approach for designing a decision support system (DSS) for the transformation of healthcare practices. Practice transformation helps practices transition from their current state to the patient-centered medical home (PCMH) model of care. Our approach employs activity theory to derive the elements of practice transformation by designing and integrating two ontologies: a domain ontology and a task ontology. By incorporating both goal-oriented and task-oriented aspects of the practice transformation process and specifying how they interact, our integrated design model for the DSS provides prescriptive knowledge on assessing the current status of a practice with respect to PCMH recognition and navigating efficiently through a complex solution space. This knowledge, which is at a moderate level of abstraction and expressed in a language that practitioners understand, contributes to the literature by providing a formulation for a nascent design theory. 
We implement the integrated design model as a DSS prototype; results of validation tests conducted on the prototype indicate that it is superior to the existing PCMH readiness tracking tool with respect to effectiveness, usability, efficiency, and sustainability.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102344"},"PeriodicalIF":2.7,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing the precision of public transit user activity location detection from smart card data analysis via spatial–temporal DBSCAN","authors":"Fehmi Can Ozer , Hediye Tuydes-Yaman , Gulcin Dalkic-Melek","doi":"10.1016/j.datak.2024.102343","DOIUrl":"10.1016/j.datak.2024.102343","url":null,"abstract":"<div><p>Smart Card (SC) systems have been increasingly adopted by public transit (PT) agencies all over the world, which facilitates not only fare collection but also PT service analyses and evaluations. Spatial clustering is one of the most important methods to investigate this big data in terms of activity locations, travel patterns, user behaviours, etc. In addition, spatio-temporal analysis of the clusters provides further precision in detecting PT traveller activity locations and durations. This study investigates and compares the effectiveness of two density-based clustering algorithms, DBSCAN and ST-DBSCAN. The numerical results are obtained using SC data from the public bus system of the metropolitan city of Konya, Turkey: the clustering algorithms are applied to a sample of this smart card data, and activity clusters are detected for the users. 
The results suggest that ST-DBSCAN produces clusters that are more compact in both time and space, which benefits transportation researchers who want to accurately detect passengers’ individual activity regions using SC data.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102343"},"PeriodicalIF":2.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141716049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating quality of ontology-driven conceptual models abstractions","authors":"Elena Romanenko , Diego Calvanese , Giancarlo Guizzardi","doi":"10.1016/j.datak.2024.102342","DOIUrl":"10.1016/j.datak.2024.102342","url":null,"abstract":"<div><p>The complexity of an (ontology-driven) conceptual model highly correlates with the complexity of the domain and software for which it is designed. With that in mind, an algorithm for producing ontology-driven conceptual model abstractions was previously proposed. In this paper, we empirically evaluate the quality of the abstractions it produces. First, we implemented and tested the latest version of the algorithm over a FAIR catalog of models represented in the ontology-driven conceptual modeling language OntoUML. Second, we performed three user studies to evaluate the usefulness of the resulting abstractions as perceived by modelers. This paper reports on the findings of these experiments and reflects on how they can be exploited to improve the existing algorithm.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102342"},"PeriodicalIF":2.7,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000661/pdfft?md5=3da15f24c92422d6dac0dc27c996166b&pid=1-s2.0-S0169023X24000661-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141705730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}