{"title":"Class Representatives Selection in non-metric spaces for nearest prototype classification","authors":"Jaroslav Hlaváč , Martin Kopp , Tomáš Skopal","doi":"10.1016/j.is.2025.102564","DOIUrl":"10.1016/j.is.2025.102564","url":null,"abstract":"<div><div>The nearest prototype classification is a less computationally intensive replacement for the <span><math><mi>k</mi></math></span>-NN method, especially when large datasets are considered. Centroids are often used as prototypes to represent whole classes in metric spaces. Selection of class prototypes in non-metric spaces is more challenging as the idea of computing centroids is not directly applicable. Instead, a set of representative objects can be used as the class prototype.</div><div>This paper presents the Class Representatives Selection (CRS) method, a novel memory and computationally efficient method that finds a small yet representative set of objects from each class to be used as a prototype. CRS leverages the similarity graph representation of each class created by the NN-Descent algorithm to pick a low number of representatives that ensure sufficient class coverage. Thanks to the graph-based approach, CRS can be applied to any space where at least a pairwise similarity can be defined. In the experimental evaluation, we demonstrate that our method outperforms the state-of-the-art techniques on multiple datasets from different domains.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102564"},"PeriodicalIF":3.0,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143948792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Back to the Order: Partial orders in streaming conformance checking","authors":"Kristo Raun , Riccardo Tommasini , Ahmed Awad","doi":"10.1016/j.is.2025.102566","DOIUrl":"10.1016/j.is.2025.102566","url":null,"abstract":"<div><div>Most organizations are built around their business processes. Commonly, these processes follow a predefined path. Deviations from the expected path can lead to lower quality products and services, reduced efficiencies, and compliance liabilities. Rapid identification of deviations helps mitigate such risks. For identifying deviations, the conformance checker would need to know the sequence in which events occurred. In this paper, we tackle two challenges associated with knowing the right sequence of events. First, we look at out-of-order event arrival, a common occurrence in modern information systems. Second, we extend the previous work by incorporating partial order handling. Partially ordered events are a well-studied problem in process mining, but to the best of our knowledge it has not been researched in terms of fast-paced streaming conformance checking. Real-life and semi-synthetic datasets are used for validating the proposed methods.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102566"},"PeriodicalIF":3.0,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143948791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Timeline-based process discovery","authors":"Christoffer Rubensson , Harleen Kaur , Timotheus Kampik , Jan Mendling","doi":"10.1016/j.is.2025.102568","DOIUrl":"10.1016/j.is.2025.102568","url":null,"abstract":"<div><div>A key concern of automatic process discovery is providing insights into business process performance. Process analysts are specifically interested in waiting times and delays for identifying opportunities to speed up processes. Against this backdrop, it is surprising that current techniques for automatic process discovery generate directly-follows graphs and comparable process models without representing the time axis explicitly. This paper presents four layout strategies for automatically constructing process models that explicitly align with a time axis. We exemplify our approaches for directly-follows graphs. We evaluate their effectiveness by applying them to real-world event logs with varying complexities. Our specific focus is on their ability to handle the trade-off between high control-flow abstraction and high consistency of temporal activity order. Our results show that timeline-based layouts provide benefits in terms of an explicit representation of temporal distances. They face challenges for logs with many repeating and concurrent activities.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102568"},"PeriodicalIF":3.0,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic conformance checking based on variable-length Markov chains","authors":"Emilio Incerto , Andrea Vandin , Sima Sarv Ahrabi","doi":"10.1016/j.is.2025.102561","DOIUrl":"10.1016/j.is.2025.102561","url":null,"abstract":"<div><div>Conformance checking is central in process mining (PM). It studies deviations of logs from reference processes. Originally, the proposed approaches did not focus on stochastic aspects of the underlying process, and gave qualitative models as output. Recently, these have been extended in approaches for <em>stochastic conformance checking</em> (SCC), giving quantitative models as output. A different community, namely the <em>software performance engineering</em> (PE) one, interested in the synthesis of stochastic processes since decades, has developed independently techniques to synthesize Markov Chains (MC) that describe the stochastic process underlying program runs. However, these were never applied to SCC problems. We propose a novel approach to SCC based on PE results for the synthesis of stochastic processes. Thanks to a rich experimental evaluation, we show that it outperforms the state-of-the-art. In doing so, we further bridge PE and PM, fostering cross-fertilization. We use techniques for the synthesis of Variable-length MC (VLMC), higher-order MC able to compactly encode complex path dependencies in the control-flow. VLMCs are equipped with a notion of likelihood that a trace belongs to a model. We use it to perform SCC of a log against a model. We establish the degree of conformance by equipping VLMCs with uEMSC, a standard conformance measure in the SCC literature. We compare with 18 SCC techniques from the PM literature, using 11 benchmark datasets from the PM community. We outperform all approaches in 10 out of 11 datasets, i.e., we get uEMSC values closer to 1 for logs conforming to a model. Furthermore, we show that VLMC are efficient, as they handled all considered datasets in a few seconds.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102561"},"PeriodicalIF":3.0,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Multi-Faceted Visual Process Analytics","authors":"Stef van den Elzen , Mieke Jans , Niels Martin , Femke Pieters , Christian Tominski , Maria-Cruz Villa-Uriol , Sebastiaan J. van Zelst","doi":"10.1016/j.is.2025.102560","DOIUrl":"10.1016/j.is.2025.102560","url":null,"abstract":"<div><div>Both the fields of Process Mining (PM) and Visual Analytics (VA) aim to make complex phenomena understandable. In PM, the goal is to gain insights into the execution of complex processes by analyzing the event data that is captured in event logs. This data is inherently multi-faceted, meaning that it covers various data facets, including spatial and temporal dependencies, relations between data entities (such as cases/events), and multivariate data attributes per entity. However, the multi-faceted nature of the data has not received much attention in PM. Conversely, VA research has investigated interactive visual methods for making multi-faceted data understandable for about two decades. In this study, we bring together PM and VA with the goal of advancing towards Visual Process Analytics (VPA) of multi-faceted processes. To this end, we present a systematic view of relevant (VA) data facets in the context of PM and assess to what extent existing PM visualizations address the data facets’ characteristics, making use of VA guidelines. In addition to visualizations, we look at how PM can benefit from analytical abstraction and interaction techniques known in the VA realm. Based on this, we discuss open challenges and opportunities for future research towards multi-faceted VPA.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102560"},"PeriodicalIF":3.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143934651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Support estimation in frequent itemsets mining on Enriched Two Level Tree","authors":"Clémentin Tayou Djamegni , William Kery Branston Ndemaze , Edith Belise Kenmogne , Hervé Maradona Nana Kouassi , Arnauld Nzegha Fountsop , Idriss Tetakouchom , Laurent Cabrel Tabueu Fotso","doi":"10.1016/j.is.2025.102559","DOIUrl":"10.1016/j.is.2025.102559","url":null,"abstract":"<div><div>Efficiently counting the support of candidate itemsets is a crucial aspect of extracting frequent itemsets because it directly impacts the overall performance of the mining process. Researchers have developed various techniques and data structures to overcome this challenge, but the problem is still open. In this paper, we investigate the two-level tree enrichment technique as a potential solution without adding significant computational overhead. In addition, we introduce ETL_Miner, a novel algorithm that provides an estimated bound for the support value of all candidate itemsets within the search space. The method presented in this article is flexible and can be used with various algorithms. To demonstrate this point, we introduce a modified version of Apriori that integrates ETL_Miner as an extra pruning phase. Preliminary empirical experimental results on both real and synthetic datasets confirm the accuracy of the proposed method and reduce the total extraction time.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102559"},"PeriodicalIF":3.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143934650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Substring compression variations and LZ78-Derivates","authors":"Dominik Köppl","doi":"10.1016/j.is.2025.102553","DOIUrl":"10.1016/j.is.2025.102553","url":null,"abstract":"<div><div>We propose algorithms computing the semi-greedy Lempel–Ziv 78 (LZ78), the Lempel–Ziv Double (LZD), and the Lempel–Ziv–Miller–Wegman (LZMW) factorizations in linear time for integer alphabets. For LZD and LZMW, we additionally propose data structures that can be constructed in linear time, which can solve the substring compression problems for these factorizations in time linear in the output size. For substring compression, we give the first results for lexparse and closed factorizations.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102553"},"PeriodicalIF":3.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143906791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to resolve inconsistencies in qualitative constraint networks","authors":"Anastasia Paparrizou, Michael Sioutis","doi":"10.1016/j.is.2025.102557","DOIUrl":"10.1016/j.is.2025.102557","url":null,"abstract":"<div><div>In this paper, we present a reinforcement learning approach for resolving inconsistencies in qualitative constraint networks (<span><math><mi>QCN</mi></math></span>s). <span><math><mi>QCN</mi></math></span>s are typically used in constraint programming to represent and reason about intuitive spatial or temporal relations like <em>x</em> {<em>is inside of</em> <span><math><mo>∨</mo></math></span> <em>overlaps</em>} <em>y</em>. Naturally, <span><math><mi>QCN</mi></math></span>s are not immune to uncertainty, noise, or imperfect data that may be present in information, and thus, more often than not, they are hampered by inconsistencies. We propose a multi-armed bandit approach that defines a well-suited ordering of constraints for finding a maximal satisfiable subset of them. Specifically, our learning approach interacts with a solver, and after each trial a reward is returned to measure the performance of the selected action (constraint addition). The reward function is based on the reduction of the solution space of a consistent reconstruction of the input <span><math><mi>QCN</mi></math></span>. Experimental results with different bandit policies and various rewards that are obtained by our algorithm suggest that we can do better than the state of the art in terms of both effectiveness, viz., lower number of repairs obtained for an inconsistent <span><math><mi>QCN</mi></math></span>, and efficiency, viz., faster runtime.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102557"},"PeriodicalIF":3.0,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental checking of SQL assertions in an RDBMS","authors":"Xavier Oriol, Ernest Teniente","doi":"10.1016/j.is.2025.102550","DOIUrl":"10.1016/j.is.2025.102550","url":null,"abstract":"<div><div>The notion of SQL assertion was introduced, in SQL-92 standard, to define general constraints over a relational database. They can be used, for instance, to specify cross-row constraints or multitable check constraints. However, up to now, none of the current relational database management systems (RDBMSs) support SQL assertions due to the difficulty of providing an efficient solution.</div><div>To implement SQL assertions efficiently, the RDBMs require an incremental checking mechanism. I.e., given an assertion, the RDBMS should revalidate it only when a transaction changes data in a manner that could violate it, and only for the affected data. Some years ago, the deductive database community provided several <em>incremental checking</em> methods, however, their results could not get into practice in RDBMS.</div><div>In this paper, we propose an approach to efficiently implement SQL assertions in an RDBMS through an incremental revalidation technique. Such an approach is compatible with any RDBMS since it is fully based on standard SQL concepts (tables, triggers, and procedures). Our proposal uses and extends <em>the Event Rules</em>, an existing proposal for incremental checking in deductive databases. This extension is required to handle distributive aggregates, which pushes the expressiveness of the handled SQL assertions beyond first-order constraints. Moreover, we exploit this extension to improve the treatment of constraints involving existential variables, which are a very common kind of constraints difficult and expensive to handle. Finally, we show the efficiency of our approach through some experiments, and we formally prove its soundness and completeness.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102550"},"PeriodicalIF":3.0,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143848283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A high-accuracy unsupervised statistical learning method for joint dangling entity detection and entity alignment","authors":"Cong Xu , Mengxin Shi , Xiang Gao , Zhongkang Yin , Xiujuan Yao , Wei Li , Jiasen Yang","doi":"10.1016/j.is.2025.102554","DOIUrl":"10.1016/j.is.2025.102554","url":null,"abstract":"<div><div>Dangling entities are common in knowledge graphs but there is a lack of research on entity alignment involving them. Most existing studies leverage neural network methods through supervised learning. However, these data-driven methods suffer from poor interpretability and high computation overhead. In this paper, we propose a Simple Unsupervised Dangling entity detection and entity Alignment method (SUDA)<span><span><sup>1</sup></span></span> without employing neural networks. Our method consists of three modules: entity embedding, dangling entity detection, and entity alignment. While the state-of-the-art Simple but Effective Unsupervised entity alignment method (SEU)<span><span><sup>2</sup></span></span> is incapable of dealing with dangling entities, SUDA further extends it and addresses the bilateral dangling entities problem. Theoretical proof of our method is given. We also design a new adjacent matrix for incorporating richer entity relations. Then we construct entity similarity outlier intervals to detect dangling entities and align entities through assignment problem after removing them. Extensive experiments demonstrate that our method outperforms those supervised and unsupervised methods. Additionally, in the entity alignment tasks, SUDA consumes less runtime compared to neural network methods, while maintaining high efficiency, interpretability, and stability. Code is available at <span><span>https://github.com/skyccong/SUDA.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102554"},"PeriodicalIF":3.0,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143838186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}