{"title":"Data-driven prescriptive analytics applications: A comprehensive survey","authors":"Martin Moesmann, Torben Bach Pedersen","doi":"10.1016/j.is.2025.102576","DOIUrl":"10.1016/j.is.2025.102576","url":null,"abstract":"<div><div>Prescriptive Analytics (PSA), an emerging business analytics field suggesting concrete options for solving business problems, has seen an increasing amount of interest after more than a decade of multidisciplinary research. This paper is a comprehensive survey of existing applications within PSA in terms of their use cases, methodologies, and possible future research directions. To ensure a manageable scope, we focus on PSA applications that develop data-driven, automatic workflows, i.e., <em>Data-Driven PSA (DPSA)</em>. Following a systematic methodology, we identify and include 104 papers in our survey. As our key contributions, we derive a number of novel taxonomies of the field and use them to analyse the field’s temporal development. In terms of use cases, we derive <em>10 application domains</em> for DPSA, from Healthcare to Manufacturing, and subsumed problem types within each. In terms of <em>individual</em> method usage, we derive <em>5 method types</em> and map them to a comprehensive taxonomy of method usage within DPSA applications, covering <em>mathematical optimization</em>, <em>data mining and machine learning</em>, <em>probabilistic modelling</em>, <em>domain expertise</em>, as well as <em>simulations</em>. As for <em>combined</em> method usage, we provide a statistical overview of how different method usage combinations are distributed and derive <em>2 generic workflow patterns</em> along with subsumed workflow patterns, combining methods by either sequential or simultaneous relationships. 
Finally, we derive <em>5 possible research directions</em> based on frequently recurring issues among surveyed papers, suggesting new frontiers in terms of methods, tools, and use cases.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"134 ","pages":"Article 102576"},"PeriodicalIF":3.0,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144364662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Root-cause analysis of business processes: How humans utilize multiple sources of information to explain observations","authors":"Arava Tsoury , Pnina Soffer , Iris Reinhartz-Berger","doi":"10.1016/j.is.2025.102578","DOIUrl":"10.1016/j.is.2025.102578","url":null,"abstract":"<div><div>Root-cause analysis of business processes seeks explanations and solutions to observed behaviors and problems in organizational business processes. Such analysis is usually based on event logs, utilizing process mining techniques. However, event logs hold a limited set of data attributes, and the analysis depends on data availability. To overcome this dependency, event log data can be complemented from additional sources that are commonly available in organizations. The aim of this research is to investigate how humans utilize potential combinations of event logs, databases, and transaction logs to explain observations. In particular, we conducted an empirical study, involving 73 participants, in order to: (1) find how these information sources and their combinations are used for answering questions related to violations of business rules; (2) identify composite operations that are performed when combining the information sources; and (3) gain insights into the perceived usefulness and usability of these combinations. Our findings provide evidence of the dominance of databases and event logs as the main sources of information. We further succeeded in classifying typical composite operations into organizational information extension, behavioral information extension/refinement, single-source manipulation, and multi-source manipulation. 
Finally, these findings call for further support in process analysis and mining environments to improve usefulness and usability of multi-source root-cause analysis.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"134 ","pages":"Article 102578"},"PeriodicalIF":3.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144365117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An interactive approach for group-based event log exploration","authors":"Tobias Fehrer , Linda Moder , Maximilian Röglinger","doi":"10.1016/j.is.2025.102575","DOIUrl":"10.1016/j.is.2025.102575","url":null,"abstract":"<div><div>A major goal in process mining is to analyze processes to determine possible improvements. However, event logs often bear substantial complexity, posing challenges for process analysts. Consequently, analysts often split event logs into more serviceable groups. While tool support is a crucial enabler for this task, and many approaches for event log analysis are available, a gap remains regarding tools and methods for organizing and structuring event logs. To address this gap, we propose the Case Group Explorer, an approach that computationally supports event log grouping through interaction and visualization. We instantiate our artifact as a software prototype and evaluate it through a competing artifact analysis, application to several event logs, and a user study involving 13 practitioners. Thus, we contribute design knowledge for event log exploration and process group analysis at the intersection of process analysis and visual analytics.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"134 ","pages":"Article 102575"},"PeriodicalIF":3.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144313514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PARK: Personalized academic retrieval with knowledge-graphs","authors":"Pranav Kasela , Gabriella Pasi , Raffaele Perego","doi":"10.1016/j.is.2025.102574","DOIUrl":"10.1016/j.is.2025.102574","url":null,"abstract":"<div><div>Academic Search is a search task aimed at managing and retrieving scientific documents such as journal articles and conference papers. Personalization in this context meets individual researchers’ needs by leveraging, through user profiles, user-related information (e.g., documents authored by a researcher) to improve search effectiveness and reduce information overload. While citation graphs are a valuable means to support the outcome of recommender systems, their use in personalized academic search (with, e.g., nodes as papers and edges as citations) is still under-explored.</div><div>Existing personalized models for academic search often struggle to fully capture users’ academic interests. To address this, we propose a two-step approach: first, training a neural language model for retrieval, then converting the academic graph into a knowledge graph and embedding it into a shared semantic space with the language model using translational embedding techniques. This allows user models to capture both explicit relationships and hidden structures in citation graphs and paper content. We evaluate our approach in four academic search domains, outperforming traditional graph-based and personalized models in three out of four, with up to a 10% improvement in MAP@100 over the second-best model. 
This highlights the potential of knowledge graph-based user models to enhance retrieval effectiveness.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"134 ","pages":"Article 102574"},"PeriodicalIF":3.0,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144270824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achieving framed autonomy in AI-augmented business process management systems through automated planning","authors":"Giacomo Acitelli , Anti Alman , Fabrizio Maria Maggi , Andrea Marrella","doi":"10.1016/j.is.2025.102573","DOIUrl":"10.1016/j.is.2025.102573","url":null,"abstract":"<div><div>AI-augmented Business Process Management Systems (ABPMSs) are an emerging class of process-aware information systems empowered by Artificial Intelligence (AI) technology for autonomously unfolding and adapting the execution flow of business processes (BPs) within a set of potentially conflicting procedural and declarative constraints, called <em>process framing</em>. In this respect, <em>framed autonomy</em> enables an ABPMS to autonomously decide how to progress the execution of a BP, as long as the boundaries imposed by the frame are respected. Within these constraints, a partial BP execution may need to be completed by activating a different near-optimal framing that enables the BP to progress its execution. In this paper, we present an <em>automata-based technique</em> that pairs <em>constraint-based framing</em> with <em>automated planning</em> in AI to recommend, given a partial BP execution trace, the continuation of that trace that minimizes the violation cost of the conforming space defined by the process frame. 
We report on the results of experiments of increasing complexity to showcase our technique’s performance and scalability.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102573"},"PeriodicalIF":3.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144231848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ADAPT: Fairness & diversity for sequential group recommendations","authors":"Emilia Lenzi , Kostas Stefanidis","doi":"10.1016/j.is.2025.102572","DOIUrl":"10.1016/j.is.2025.102572","url":null,"abstract":"<div><div>In group recommendation systems, achieving a balance between fairness and diversity is a challenging yet crucial task, particularly in sequential settings where preferences evolve over multiple iterations. This paper introduces ADAPT, a novel framework designed to optimize fairness and diversity in sequential group recommendations. ADAPT employs two novel aggregation methods, FaDJO and DiGSFO, to equitably meet group members’ needs while promoting diverse content. In addition to these aggregation methods, ADAPT introduces a novel definition of inter-round diversity based on item-list embeddings. Experimental results on three real datasets and different group formations demonstrate ADAPT’s ability to optimize user satisfaction, fairness, and diversity, outperforming baseline methods on two different metrics (F-score and NDCG) and highlighting the importance of balancing these critical factors in sequential group settings.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102572"},"PeriodicalIF":3.0,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144116986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parameterization-free clustering with sparse data observers","authors":"Félix Iglesias Vázquez , Tanja Zseby , Arthur Zimek","doi":"10.1016/j.is.2025.102562","DOIUrl":"10.1016/j.is.2025.102562","url":null,"abstract":"<div><div>Given a set of data points, clustering serves to discover groups based on pairwise similarities and the shapes drawn by the data in the feature space. In other words, it is a tool to describe data and reveal their intrinsic nature in terms of patterns or groups. In this paper, we review the methodology of clustering when used to explore a priori unknown data, i.e., when we do not know how data spaces are manipulated, how algorithms are tuned, and how results are validated. Under this practical approach, we examine the advantages of SDOclust, a clustering method that stands out for its simplicity, lightness, lack of parameterization, and for not being subject to traditional clustering limitations. We test SDOclust and the main established alternatives — HDBSCAN, <span><math><mi>k</mi></math></span>-means--, Fuzzy C-means, Hierarchical Clustering, CLASSIX, and N2D Deep Clustering — by extensive experimentation with more than 200 datasets, both real and synthetic, that have been collected from the literature on evaluation and represent different data analysis challenges. We submit only SDOclust to unfavorable testing conditions by denying it a parameter tuning phase. Nevertheless, its overall performance is excellent and positions it as one of the best general-purpose alternatives.</div><div>With deep clustering consolidating as a new paradigm, trends in clustering consist mainly of projecting data into spaces that are easier to dissect. 
Therefore, in cases where the original space does not show clustering-friendly structures and when we can assume transformation costs, SDOclust easily adapts and is a most natural choice to perform the partitioning task.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102562"},"PeriodicalIF":3.0,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144130817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Service composition for LTLf task specifications","authors":"Giuseppe De Giacomo , Marco Favorito , Luciana Silo","doi":"10.1016/j.is.2025.102571","DOIUrl":"10.1016/j.is.2025.102571","url":null,"abstract":"<div><div>Service compositions <em>à la</em> Roman model consist of realizing a virtual service by suitably orchestrating a set of already available services, where all services are described procedurally as (possibly nondeterministic) transition systems. In this paper, we study a goal-oriented variant of service composition <em>à la</em> Roman model, in which goals declaratively specify the allowed traces via Linear Temporal Logic on finite traces (<span>ltl</span> <sub><em>f</em></sub>). Specifically, we synthesize a controller that orchestrates the available services to produce a trace satisfying a specification in <span>ltl</span> <sub><em>f</em></sub>. We demonstrate that this framework has several interesting applications, such as Smart Manufacturing and Digital Twins.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102571"},"PeriodicalIF":3.0,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144134672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMUTF: Schema Matching Using Generative Tags and Hybrid Features","authors":"Yu Zhang , Di Mei , Haozheng Luo , Chenwei Xu , Richard Tzong-Han Tsai","doi":"10.1016/j.is.2025.102570","DOIUrl":"10.1016/j.is.2025.102570","url":null,"abstract":"<div><div>We introduce <strong>SMUTF</strong> (<strong>S</strong>chema <strong>M</strong>atching <strong>U</strong>sing Generative <strong>T</strong>ags and Hybrid <strong>F</strong>eatures), a unique approach for large-scale tabular data schema matching (SM), which assumes that supervised learning does not affect performance in open-domain tasks, thereby enabling effective cross-domain matching. This system uniquely combines rule-based feature engineering, pre-trained language models, and generative large language models. In an innovative adaptation inspired by the Humanitarian Exchange Language, we deploy “generative tags” for each data column, enhancing the effectiveness of SM. SMUTF exhibits extensive versatility, working seamlessly with any pre-existing pre-trained embeddings, classification methods, and generative models.</div><div>Recognizing the lack of extensive, publicly available datasets for SM, we have created and open-sourced the HDXSM dataset from the public humanitarian data. We believe this to be the most exhaustive SM dataset currently available. In evaluations across various public datasets and the novel HDXSM dataset, SMUTF demonstrated exceptional performance, surpassing existing state-of-the-art models in terms of accuracy and efficiency, and improving the F1 score by 11.84% and the AUC of ROC by 5.08%. 
Code is available at <span><span>GitHub</span></span>.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102570"},"PeriodicalIF":3.0,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144107794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-preserving record linkage using reference set based encoding: A single parameter method","authors":"Sumayya Ziyad , Peter Christen , Anushka Vidanage , Charini Nanayakkara , Rainer Schnell","doi":"10.1016/j.is.2025.102569","DOIUrl":"10.1016/j.is.2025.102569","url":null,"abstract":"<div><div>Record linkage is the process of matching records that refer to the same entity across two or more databases. In many application areas, ranging from healthcare to government services, the databases to be linked contain sensitive personal information, and hence, cannot be shared across organisations. Privacy-Preserving Record Linkage (PPRL) aims to overcome this challenge by facilitating the comparison of records that have been encoded or encrypted, thereby allowing linkage without the need to share any sensitive data. While various PPRL techniques have been developed, most of them do not properly address privacy concerns, such as the various vulnerabilities of encoded data with regard to cryptanalysis attacks. Existing PPRL methods, furthermore, do not provide conceptual analyses of how a user should set the various parameters required, possibly leading to sub-optimal results with regard to both linkage quality and privacy protection. Here we present a <em>novel encoding method for PPRL that employs reference q-gram sets to generate bit arrays that represent sensitive values. Our method requires a single user parameter that determines a trade-off between linkage quality, scalability, and privacy.</em> All other parameters are either data-driven or have strong bounds based on the user-set parameter. Furthermore, our method addresses the length, frequency, and pattern-based PPRL vulnerabilities that are exploited by existing PPRL attacks. We conceptually analyse our method and experimentally evaluate it using multiple databases. 
Our results show that our method provides robust results for both high linkage quality and strong privacy protection.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102569"},"PeriodicalIF":3.0,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144089512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}