Information Systems: Latest Articles

Quantifying and relating the completeness and diversity of process representations using species estimation
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-30, DOI: 10.1016/j.is.2024.102512
Martin Kabierski, Markus Richter, Matthias Weidlich
Volume 130, Article 102512
Abstract: The analysis of process representations, such as event logs or process models, has become a staple in the context of business process management. Insights gained from such an analysis serve to monitor and improve the business processes that are captured. Yet, any process representation is merely a sample of the past and possible behaviour of a business process, which raises the question of its representativeness: To which extent does the process representation capture the process characteristics that are relevant for the analysis? In this paper, we propose to answer this question using estimators from biodiversity research. Specifically, we propose to infer a completeness profile based on the estimated number of distinct relevant characteristics of the process representation and a diversity profile that captures the heterogeneity of relevant distinct characteristics using asymptotic Hill numbers. We validate the applicability of the proposed estimators for process analysis in a series of controlled experiments. Applying the estimators to real-world event logs, we highlight potential issues in terms of trustworthiness of analysis that is based on them, and show how the profiles can be leveraged to compare different process representations concerning their similarity and completeness.
Citations: 0
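As a rough illustration of the diversity profile mentioned in the abstract above, the Hill number of order q for relative abundances p_i is (sum_i p_i^q)^(1/(1-q)), with the limit q -> 1 equal to the exponential of the Shannon entropy. The sketch below computes the plain empirical (plug-in) values only; it does not implement the asymptotic estimators the paper refers to. Treating trace variants as "species", the function name `hill_number`, and the toy log are all assumptions made for illustration.

```python
import math
from collections import Counter


def hill_number(abundances, q):
    """Empirical Hill number of order q for a list of non-negative counts."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


# Hypothetical toy log: each string stands for one observed trace variant.
toy_log = ["abc", "abc", "abc", "acb", "acb", "abd"]
counts = list(Counter(toy_log).values())

# Orders 0, 1, 2: richness, exp(Shannon entropy), inverse Simpson index.
profile = {q: hill_number(counts, q) for q in (0, 1, 2)}
```

Order 0 simply counts the distinct variants, while higher orders weight frequent variants more heavily, so the profile is non-increasing in q.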
Adaptive sliding window normalization
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-28, DOI: 10.1016/j.is.2024.102515
George Papageorgiou, Christos Tjortjis
Volume 129, Article 102515
Abstract: Time series data, common in domains such as finance, healthcare, environmental monitoring, and energy management, often exhibit nonstationary behaviors and anomalies that challenge traditional normalization techniques. This research proposes a methodology termed Adaptive Sliding Window Normalization (ASWN) to address these limitations. ASWN dynamically adjusts normalization window sizes based on anomalies detected with multiple methods, applies Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to finalize them, and utilizes the Akaike Information Criterion (AIC) with AutoRegressive Integrated Moving Average (ARIMA) models to determine optimal window sizes in the absence of anomalies. This approach integrates multiple anomaly detection methods to ensure responsiveness to changes in data patterns and effective management of outliers. ASWN is applied to diverse time series datasets, including energy consumption and financial data, demonstrating significant improvements in predictive accuracy. Extensive experiments show that ASWN outperforms traditional normalization methods, providing empirical evidence of its benefits in handling nonstationary and anomalous data. This research enhances the robustness and reliability of time series forecasting and contributes to the broader field by thoroughly documenting the methodology, experimental setup, and results. The findings are intended to foster further advancements in time series normalization and forecasting.
Citations: 0
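For readers unfamiliar with the baseline idea behind the entry above, here is a minimal sketch of sliding window normalization with a fixed trailing window and z-scores. Both choices are assumptions for illustration: the abstract does not state which statistic ASWN normalizes with, and the adaptive window selection (anomaly detection, DBSCAN, AIC with ARIMA) is not reproduced here. The function name `sliding_window_zscore` is hypothetical.

```python
import statistics


def sliding_window_zscore(series, window):
    """Normalize each point by the mean and population std of its trailing window."""
    out = []
    for i, x in enumerate(series):
        # The window ends at the current point and holds at most `window` values.
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        mean = statistics.fmean(chunk)
        std = statistics.pstdev(chunk)
        # A constant window has zero spread; map it to 0.0 instead of dividing by zero.
        out.append(0.0 if std == 0 else (x - mean) / std)
    return out


normalized = sliding_window_zscore([1, 3, 5, 5, 5], window=2)
```

An adaptive scheme would replace the fixed `window` argument with a per-position size chosen from the data.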
Proactive event matching with predictive analysis in content-based publish/subscribe systems
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-23, DOI: 10.1016/j.is.2024.102508
Yongpeng Dong, Shiyou Qian, Tianchen Ding, Jian Cao, Guangtao Xue, Minglu Li
Volume 129, Article 102508
Abstract: The real-time efficacy of content-based publish/subscribe systems is largely dependent on the efficiency of matching algorithms. Current methodologies mainly focus on overall matching performance, often ignoring the dynamic nature and evolving trends of hot events. This paper introduces a novel, learning-driven approach, the proactive adjustment framework (PAF), tailored to dynamically adapt to hot event trends. By strategically prioritizing subscriptions in alignment with the changing dynamics of hot events, PAF enhances the efficiency of matching algorithms and optimizes the system's real-time performance. One challenge of PAF is the trade-off that needs to be made between the gains of improving real-time performance by identifying matching subscriptions earlier and the cost of increasing matching time due to subscription classification and adjustment. We design a concise scheme to classify subscriptions, establish a lightweight adjustment mechanism to handle dynamics, and propose an efficient greedy algorithm to compute adjustment plans. This approach helps to mitigate the impact of PAF on matching performance. The experimental results show that the 95th percentile of the determining time of matching subscriptions is improved by about 50.5% and the throughput is also increased by 13%, compared to the baseline SCSL. Furthermore, we integrate PAF into Apache Kafka to augment it as a content-based publish/subscribe system. We test the effectiveness of PAF using two real-world datasets. Compared with two baselines, SCSL and REIN, PAF achieves an improvement of 22.5% and 51.8% respectively in average event transfer latency.
Citations: 0
Capturing end-to-end provenance for machine learning pipelines
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-22, DOI: 10.1016/j.is.2024.102495
Marius Schlegel, Kai-Uwe Sattler
Volume 132, Article 102495
Abstract: Modern workflows for developing ML pipelines utilize ML artifact management systems (ML AMSs) such as MLflow in addition to traditional version control systems such as Git. ML AMSs collect data, model, metadata and software artifacts used and produced in pipeline development workflows. While these systems ensure repeatability and reproducibility, their provenance capabilities are still rudimentary, mainly due to incomplete traces, coarse granularity, and limited query capabilities. In this paper, we introduce a comprehensive PROV-compliant provenance model that captures end-to-end provenance traces of ML pipelines, their artifacts, and their relationships based on MLflow and Git activities. Moreover, we present the tool MLflow2PROV for continuously extracting provenance graphs according to our model, enabling querying, analyzing, and processing of the collected provenance information.
Citations: 0
On the discovery of seasonal gradual patterns through periodic patterns mining
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-19, DOI: 10.1016/j.is.2024.102511
Jerry Lonlac, Arnaud Doniec, Marin Lujak, Stéphane Lecoeuche
Volume 129, Article 102511
Abstract: Gradual patterns, capturing intricate attribute co-variations expressed as “when X increases/decreases, Y increases/decreases” in numerical data, play a vital role in managing vast volumes of complex numerical data in real-world applications. Recently, the data science community has focused on efficient extraction methods for gradual patterns from temporal data. However, there is a notable gap in approaches addressing the extraction of gradual patterns that capture seasonality from the graduality point of view in temporal data sequences, despite their potential to yield valuable insights in applications such as e-commerce. This paper proposes a new method for extracting co-variations of periodically repeating attributes, termed seasonal gradual patterns. To achieve this, we formulate the task of mining seasonal gradual patterns as the problem of mining periodic patterns in multiple sequences and then leverage periodic pattern mining algorithms to extract seasonal gradual patterns. Additionally, we propose a new antimonotonic support definition associated with these seasonal gradual patterns. Illustrative results from real-world datasets demonstrate the efficiency of the proposed approach and its ability to sift through numerous non-seasonal patterns to identify the seasonal ones.
Citations: 0
Approximate conformance checking: Fast computation of multi-perspective, probabilistic alignments
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-16, DOI: 10.1016/j.is.2024.102510
Alessandro Gianola, Jonghyeon Ko, Fabrizio Maria Maggi, Marco Montali, Sarah Winkler
Volume 129, Article 102510
Abstract: In the context of process mining, alignments are increasingly being adopted for conformance checking, owing to their ability to provide sophisticated diagnostics on the nature and extent of deviations between observed traces and a reference process model. On the downside, deriving alignments is challenging from the computational point of view, even more so when dealing with multiple perspectives in the process, such as, in particular, data. In fact, every observed trace must in principle be compared with infinitely many model traces. In this work, we tackle this computational bottleneck by borrowing the classical idea of encoding from machine learning. Instead of computing alignments directly and exactly, we do so in an approximate way after applying a lossy trace encoding that maps each trace into a corresponding compact, vectorial representation that retains only certain information of the original trace. We study trace encoding-based approximate alignments for processes equipped with event data attributes, from three different angles. First, we show that computing approximate alignments in this way is much more efficient than in the exact setting. Second, we evaluate how accurate such approximate alignments are, considering different encoding strategies that focus on different features of the trace. Our findings suggest that sufficiently rich encodings actually yield good accuracy. Third, we consider the impact of frequency and density of model variants, comparing the effectiveness of using standard approximate multi-perspective alignments as opposed to a variant that incorporates probabilities. As a by-product of this analysis, we also obtain insights on how these two approaches perform in the presence of noise.
Citations: 0
Bayesian label distribution propagation: A semi-supervised probabilistic k nearest neighbor classifier
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-15, DOI: 10.1016/j.is.2024.102507
Jonatan M.N. Gøttcke, Arthur Zimek, Ricardo J.G.B. Campello
Volume 129, Article 102507
Abstract: Semi-supervised classification methods are specialized to use a very limited amount of labeled data for training and ultimately for assigning labels to the vast majority of unlabeled data. Label propagation is one such technique: it assigns labels to those parts of unlabeled data that are in some sense close to labeled examples and then uses these predicted labels in turn to predict labels of more remote data. Here we propose to not propagate an immediate label decision to neighbors but to propagate the label probability distribution. This way we keep more information and take into account the remaining uncertainty of the classifier. We employ a Bayesian schema that is more straightforward than existing methods. As a consequence, we avoid propagating errors by decisions taken too early. A crisp decision can be derived from the propagated label distributions at will. We implement and test this strategy with a probabilistic k-nearest neighbor classifier, providing semi-supervised classification results comparable to several state-of-the-art competitors in quality while being more efficient in terms of computational resources. Furthermore, we establish a theoretical connection between the k-nearest neighbor classifier and density-based label propagation.
Citations: 0
TREATS: Fairness-aware entity resolution over streaming data
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-12, DOI: 10.1016/j.is.2024.102506
Tiago Brasileiro Araújo, Vasilis Efthymiou, Vassilis Christophides, Evaggelia Pitoura, Kostas Stefanidis
Volume 129, Article 102506
Abstract: Currently, the growing proliferation of information systems generates large volumes of data continuously, stemming from a variety of sources such as web platforms, social networks, and multiple devices. These data, often lacking a defined schema, require an initial process of consolidation and cleansing before analysis and knowledge extraction can occur. In this context, Entity Resolution (ER) plays a crucial role, facilitating the integration of knowledge bases and identifying similarities among entities from different sources. However, the traditional ER process is computationally expensive, and becomes more complicated in the streaming context where the data arrive continuously. Moreover, there is a lack of studies involving fairness and ER, which is related to the absence of discrimination or bias. In this sense, fairness criteria aim to mitigate the implications of data bias in ER systems, which requires more than just optimizing accuracy, as traditionally done. Considering this context, this work presents TREATS, a schema-agnostic and fairness-aware ER workflow developed for managing streaming data incrementally. The proposed fairness-aware ER framework tackles constraints across various groups of interest, presenting a resilient and equitable solution to the related challenges. Through experimental evaluation, the proposed techniques and heuristics are compared against state-of-the-art approaches over five real-world data source pairs, and the results demonstrate significant improvements in terms of fairness, without degradation of effectiveness and efficiency measures in the streaming environment. In summary, our contributions aim to propel the ER field forward by providing a workflow that addresses both technical challenges and ethical concerns.
Citations: 0
Advances in databases and information systems — Selected papers from ADBIS 2023
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-12, DOI: 10.1016/j.is.2024.102509
Alberto Abelló, Ladjel Bellatreche, Oscar Romero, Panos Vassiliadis, Robert Wrembel
Volume 131, Article 102509
Citations: 0
Business Process Compliance with impact constraints
IF 3, Zone 2, Computer Science
Information Systems, Pub Date: 2024-12-10, DOI: 10.1016/j.is.2024.102505
Tewabe Chekole Workneh, Pietro Sala, Romeo Rizzi, Matteo Cristani
Volume 129, Article 102505
Abstract: Business Process Compliance is a family of methods to evaluate Business Processes in terms of the existence of one execution (one trace) that does not violate constraints superimposed on the process itself. The dual version is formulated as the superimposition of a set of constraints and consequent evaluation of the process for all the executions. These problems are relevant to a large part of actual applications, especially those in the context of regulatory compliance where we aim at verifying the process against a normative background (including, for instance, soft ones, such as guidelines, product specification, and product standards) or goals fixed by the owner of the process. In this paper we discuss one new type of compliance, that is impact compliance, devised to verify when a process respects a set of constraints, to establish that certain amounts, measuring the undesired effects of the tasks executed to implement the process, are below given limits.
In the current literature on Business Process Management, Business Process Analysis, and Business Process Compliance, this type of compliance checking process has not yet been addressed. As we demonstrate in this paper, this problem is significant and complex to address.
In particular, we show that the checking problems described above are, under certain structural conditions, polynomially solvable on deterministic machines. In general, however, the first problem is NP-complete whilst the second is polynomially solvable on deterministic machines.
Citations: 0