Jonatan M.N. Gøttcke, Arthur Zimek, Ricardo J.G.B. Campello
{"title":"Bayesian label distribution propagation: A semi-supervised probabilistic k nearest neighbor classifier","authors":"Jonatan M.N. Gøttcke, Arthur Zimek, Ricardo J.G.B. Campello","doi":"10.1016/j.is.2024.102507","DOIUrl":"10.1016/j.is.2024.102507","url":null,"abstract":"<div><div>Semi-supervised classification methods are specialized to use a very limited amount of labeled data for training and ultimately for assigning labels to the vast majority of unlabeled data. Label propagation is such a technique, that assigns labels to those parts of unlabeled data that are in some sense close to labeled examples and then uses these predicted labels in turn to predict labels of more remote data. Here we propose to not propagate an immediate label decision to neighbors but to propagate the label probability distribution. This way we keep more information and take into account the remaining uncertainty of the classifier. We employ a Bayesian schema that is more straightforward than existing methods. As a consequence, we avoid propagating errors by decisions taken too early. A crisp decision can be derived from the propagated label distributions at will. We implement and test this strategy with a probabilistic <span><math><mi>k</mi></math></span>-nearest neighbor classifier, providing semi-supervised classification results comparable to several state-of-the-art competitors in quality while being more efficient in terms of computational resources. 
Furthermore, we establish a theoretical connection between the <span><math><mi>k</mi></math></span>-nearest neighbor classifier and density-based label propagation.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"129 ","pages":"Article 102507"},"PeriodicalIF":3.0,"publicationDate":"2024-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143165466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TREATS: Fairness-aware entity resolution over streaming data","authors":"Tiago Brasileiro Araújo, Vasilis Efthymiou, Vassilis Christophides, Evaggelia Pitoura, Kostas Stefanidis","doi":"10.1016/j.is.2024.102506","DOIUrl":"10.1016/j.is.2024.102506","url":null,"abstract":"<div><div>Currently, the growing proliferation of information systems generates large volumes of data continuously, stemming from a variety of sources such as web platforms, social networks, and multiple devices. These data, often lacking a defined schema, require an initial process of consolidation and cleansing before analysis and knowledge extraction can occur. In this context, Entity Resolution (ER) plays a crucial role, facilitating the integration of knowledge bases and identifying similarities among entities from different sources. However, the traditional ER process is computationally expensive, and becomes more complicated in the streaming context where the data arrive continuously. Moreover, there is a lack of studies involving fairness and ER, which is related to the absence of discrimination or bias. In this sense, fairness criteria aim to mitigate the implications of data bias in ER systems, which requires more than just optimizing accuracy, as traditionally done. Considering this context, this work presents TREATS, a schema-agnostic and fairness-aware ER workflow developed for managing streaming data incrementally. The proposed fairness-aware ER framework tackles constraints across various groups of interest, presenting a resilient and equitable solution to the related challenges. Through experimental evaluation, the proposed techniques and heuristics are compared against state-of-the-art approaches over five real-world data source pairs, in which the results demonstrated significant improvements in terms of fairness, without degradation of effectiveness and efficiency measures in the streaming environment. 
In summary, our contributions aim to propel the ER field forward by providing a workflow that addresses both technical challenges and ethical concerns.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"129 ","pages":"Article 102506"},"PeriodicalIF":3.0,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143165460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Abelló, Ladjel Bellatreche, Oscar Romero, Panos Vassiliadis, Robert Wrembel
{"title":"Advances in databases and information systems — Selected papers from ADBIS 2023","authors":"Alberto Abelló, Ladjel Bellatreche, Oscar Romero, Panos Vassiliadis, Robert Wrembel","doi":"10.1016/j.is.2024.102509","DOIUrl":"10.1016/j.is.2024.102509","url":null,"abstract":"","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"131 ","pages":"Article 102509"},"PeriodicalIF":3.0,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143510295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tewabe Chekole Workneh, Pietro Sala, Romeo Rizzi, Matteo Cristani
{"title":"Business Process Compliance with impact constraints","authors":"Tewabe Chekole Workneh, Pietro Sala, Romeo Rizzi, Matteo Cristani","doi":"10.1016/j.is.2024.102505","DOIUrl":"10.1016/j.is.2024.102505","url":null,"abstract":"<div><div>Business Process Compliance is a family of methods to evaluate Business Processes in terms of the existence of <em>one execution</em> (one trace) that does not violate constraints superimposed on the process itself. The dual version is formulated as the superimposition of a set of constraints and consequent evaluation of the process for <em>all the executions</em>. These problems are relevant to a large part of actual applications, especially those in the context of <em>regulatory compliance</em> where we aim at verifying the process against a normative background (including, for instance, soft ones, such as guidelines, product specification, and product standards) or goals fixed by the owner of the process. In this paper we discuss one new type of compliance, that is <em>impact compliance</em>, devised to verify when a process respects a set of constraints, to establish that certain amounts, measuring the undesired effects of the tasks executed to implement the process, are <em>below given limits</em>.</div><div>In the current literature on Business Process Management, Business Process Analysis, and Business Process Compliance, this type of compliance checking process has not yet been addressed. As we demonstrate in this paper, this problem is significant and complex to address.</div><div>In particular, we show that the checking problems described above are, under certain structural conditions, polynomially solvable on deterministic machines. 
In general, however, the first problem is NP-complete whilst the second is polynomially solvable on deterministic machines.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"129 ","pages":"Article 102505"},"PeriodicalIF":3.0,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143165461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computation over APT compressed data","authors":"Avivit Levy, Dana Shapira","doi":"10.1016/j.is.2024.102504","DOIUrl":"10.1016/j.is.2024.102504","url":null,"abstract":"<div><div>The Arithmetic Progressions Tree (<span>APT</span>) is a data structure storing an encoding of a monotonic sequence <span><math><mi>L</mi></math></span> in <span><math><mrow><mo>[</mo><mn>1</mn><mo>.</mo><mo>.</mo><mi>n</mi><mo>]</mo></mrow></math></span>. Previous work on APTs focused on its theoretical and experimental compression guarantees. This paper is the first to consider computations over <span>APT</span> compressed data. In particular:</div><div>1. We show how to perform a search for any sub-sequence/a set of the monotone sequence <span><math><mi>L</mi></math></span> in time proportional to the query sub-sequence length/set size multiplied by the size of the <em><span>APT</span> compressed representation of</em> <span><math><mi>L</mi></math></span>.</div><div>2. We show how, given the <span>APT</span> compressed representation of the monotone sequence <span><math><mi>L</mi></math></span>, we can find a minimum run-length of <span><math><mi>L</mi></math></span> in constant time, a maximum run-length of <span><math><mi>L</mi></math></span> in <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mo>log</mo><mi>n</mi><mo>)</mo></mrow></mrow></math></span> time, and all runs of <span><math><mi>L</mi></math></span> in constant time plus the output size.</div><div>3. 
We show how, given the <span>APT</span> compressed representation of the monotone sequence <span><math><mi>L</mi></math></span>, we can answer whether a consecutive periodic pattern <span><math><mi>P</mi></math></span> is represented by an <span>APT</span>-node in <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mo>log</mo><mi>n</mi><mo>)</mo></mrow></mrow></math></span> time and report occurrences of <span><math><mi>P</mi></math></span> in <span><math><mi>L</mi></math></span> within the processing time of the output size.</div><div>4. In addition, we improve the <span>APT</span> construction algorithm time and space complexity.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"129 ","pages":"Article 102504"},"PeriodicalIF":3.0,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143164403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Open V2X Management Platform: An intelligent charging station management system","authors":"Christos Dalamagkas, V.D. Melissianos, George Papadakis, Angelos Georgakis, Vasileios-Martin Nikiforidis, Kostas Hrissagis-Chrysagis","doi":"10.1016/j.is.2024.102494","DOIUrl":"10.1016/j.is.2024.102494","url":null,"abstract":"<div><div>We present an open-source web-based system, called Open V2X Management Platform (O-V2X-MP), which facilitates the management of charging points for electric vehicles with the goal of realizing Vehicle-to-Everything (V2X) scenarios. First, we describe its backend, which comprises several components connected through a microservices architecture leveraging Docker containers. Then, we elaborate on its frontend, which provides numerous functionalities for common users (i.e., EV drivers) and administrators. Finally, we demonstrate its data analytics capabilities, showing that O-V2X-MP can seamlessly integrate AI pipelines from the Python ecosystem. In particular, we examine two tasks of particular interest for charging point operators: (i) the clustering of EV drivers into profiles of predictable behavior, and (ii) the prediction of the overall daily load for each individual charging station. In our experiments, we use proprietary and public real-world data, verifying the high effectiveness achieved in both tasks.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"129 ","pages":"Article 102494"},"PeriodicalIF":3.0,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143164402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli, Stefano Rizzi
{"title":"VOOL: A modular insight-based framework for vocalizing OLAP sessions","authors":"Matteo Francia, Enrico Gallinucci, Matteo Golfarelli, Stefano Rizzi","doi":"10.1016/j.is.2024.102496","DOIUrl":"10.1016/j.is.2024.102496","url":null,"abstract":"<div><div>OLAP streamlines the exploration of multidimensional data cubes by allowing decision-makers to build sessions of analytical queries via a “point-and-click” interaction. However, new scenarios are appearing in which alternative forms of user-system communication, based for instance on natural language, are necessary. To cope with these scenarios, we present VOOL, an extensible framework for the vocalization of the results of OLAP sessions. To avoid flooding the user with long and tedious descriptions, we choose to vocalize only selected insights automatically extracted from query results. Insights are quantitative and rich-in-semantics characterizations of the results of an OLAP query, and they also take into account the user’s intentions as expressed by OLAP operators. Firstly, they are extracted using statistics and machine learning algorithms; then an optimization algorithm is applied to select the most relevant insights respecting a limit on the overall duration of vocalization. Finally, the selected insights are sorted into a comprehensive description that is vocalized to the user. 
After describing and formalizing our approach, we evaluate it from the points of view of efficiency, effectiveness, and operativity, also by comparing it with LLM-based applications.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"129 ","pages":"Article 102496"},"PeriodicalIF":3.0,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143164401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Humam Kourani, Sebastiaan J. van Zelst, Daniel Schuster, Wil M.P. van der Aalst
{"title":"Discovering partially ordered workflow models","authors":"Humam Kourani, Sebastiaan J. van Zelst, Daniel Schuster, Wil M.P. van der Aalst","doi":"10.1016/j.is.2024.102493","DOIUrl":"10.1016/j.is.2024.102493","url":null,"abstract":"<div><div>In many real-world scenarios, processes naturally define partial orders over their constituent tasks. Partially ordered representations can be exploited in process discovery as they facilitate modeling such processes. The Partially Ordered Workflow Language (POWL) extends partially ordered representations with control-flow operators to support modeling common process constructs such as choice and loop structures. POWL integrates the hierarchical nature of process trees with the flexibility of partially ordered representations, opening up significant opportunities in process discovery. This paper presents and compares various approaches for the automated discovery of POWL models. We investigate the effects of applying varying validity criteria to partial orders, and we propose methods for incorporating frequency information to improve the quality of the discovered models. Additionally, we propose alternative visualizations for POWL models, offering different approaches that may be useful in various contexts. 
The discovery approaches are evaluated using various real-life data sets, demonstrating the ability of POWL models to capture complex process structures.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102493"},"PeriodicalIF":3.0,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142744007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeroen Middelhuis, Riccardo Lo Bianco, Eliran Sherzer, Zaharah Bukhsh, Ivo Adan, Remco Dijkman
{"title":"Learning policies for resource allocation in business processes","authors":"Jeroen Middelhuis, Riccardo Lo Bianco, Eliran Sherzer, Zaharah Bukhsh, Ivo Adan, Remco Dijkman","doi":"10.1016/j.is.2024.102492","DOIUrl":"10.1016/j.is.2024.102492","url":null,"abstract":"<div><div>Efficient allocation of resources to activities is pivotal in executing business processes but remains challenging. While resource allocation methodologies are well-established in domains like manufacturing, their application within business process management remains limited. Existing methods often do not scale well to large processes with numerous activities or optimize across multiple cases. This paper aims to address this gap by proposing two learning-based methods for resource allocation in business processes to minimize the average cycle time of cases. The first method leverages Deep Reinforcement Learning (DRL) to learn policies by allocating resources to activities. The second method is a score-based value function approximation approach, which learns the weights of a set of curated features to prioritize resource assignments. We evaluated the proposed approaches on six distinct business processes with archetypal process flows, referred to as scenarios, and three realistically sized business processes, referred to as composite business processes, which are a combination of the scenarios. We benchmarked our methods against traditional heuristics and existing resource allocation methods. The results show that our methods learn adaptive resource allocation policies that outperform or are competitive with the benchmarks in five out of six scenarios. 
The DRL approach outperforms all benchmarks in all three composite business processes and finds a policy that is, on average, 12.7% better than the best-performing benchmark.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102492"},"PeriodicalIF":3.0,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142744006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Petri Puustinen, Maria Stratigi, Kostas Stefanidis
{"title":"STracker: A framework for identifying sentiment changes in customer feedbacks","authors":"Petri Puustinen, Maria Stratigi, Kostas Stefanidis","doi":"10.1016/j.is.2024.102491","DOIUrl":"10.1016/j.is.2024.102491","url":null,"abstract":"<div><div>Companies and organizations monitor customer satisfaction by collecting feedback through Likert scale questions and free-text responses. Freely expressed opinions, not bound to fixed questions, provide a detailed source of information that organizations can use to improve their daily operations. The organization’s quality assurance review processes require a timely follow-up on these customer opinions. However, solutions often address the analytics of textual information with topic discovery and sentiment analysis for a fixed time period. These frameworks also tend to focus on serving the purpose of a specific domain and terminology. In this study, we focus on a facilitation service to track discovered topics and their sentiments over time. This service is generic and can be applied to different domains. To evaluate the capabilities of the framework, we used two datasets with opposite types of wording. The study shows that the framework is capable of discovering similar topics over time and identifying their sentiment changes.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102491"},"PeriodicalIF":3.0,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}