{"title":"Extended parameterized Burrows–Wheeler transform","authors":"Eric M. Osterkamp , Dominik Köppl","doi":"10.1016/j.is.2025.102611","DOIUrl":"10.1016/j.is.2025.102611","url":null,"abstract":"<div><div>The Burrows–Wheeler transform (BWT) lies at the heart of succinct and compressed full-text indexes for pattern matching queries. Notable variants are (a) the extended BWT (eBWT), capable of indexing multiple circular texts for pattern matching, and (b) the parameterized BWT (pBWT) for parameterized pattern matching. A natural extension is to combine the virtues of both variants into a new data structure, which we name the <em>extended parameterized BWT</em> (epBWT). We show that the epBWT supports parameterized pattern matching on multiple circular texts within the same complexities as known solutions presented for the pBWT [Kim and Cho, IPL’21], for patterns not longer than the shortest indexed text. Additionally, we show how to compute the epBWT within the same complexities as [Iseri et al., ICALP’24], i.e., in compact space and quasilinear time. As an application, we extend the matching statistics problem to the parameterized pattern matching setting on circular texts.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"136 ","pages":"Article 102611"},"PeriodicalIF":3.4,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145106044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Validating temporal compliance patterns: A unified approach with MTLf over various data models","authors":"Nesma M. Zaki , Iman M.A. Helal , Ehab E. Hassanein , Ahmed Awad","doi":"10.1016/j.is.2025.102623","DOIUrl":"10.1016/j.is.2025.102623","url":null,"abstract":"<div><div>Process mining extracts valuable insights from event data to help organizations improve their business processes, which is essential for their growth and success. By leveraging process mining techniques, organizations gain a comprehensive understanding of their processes’ execution, enabling the discovery of process models, detection of deviations, i.e., conformance checking, identification of bottlenecks, and assessment of performance. Compliance checking, a specific area within conformance checking, ensures that the organizational activities adhere to prescribed process models and regulations. Linear Temporal Logic over finite traces (<span><math><mrow><mi>L</mi><mi>T</mi><msub><mrow><mi>L</mi></mrow><mrow><mi>f</mi></mrow></msub></mrow></math></span>) is commonly used for conformance checking, but it may not capture all temporal aspects accurately. This paper proposes Metric Temporal Logic over finite traces (<span><math><mrow><mi>M</mi><mi>T</mi><msub><mrow><mi>L</mi></mrow><mrow><mi>f</mi></mrow></msub></mrow></math></span>) to define explicit time-related constraints effectively in addition to the implicit time-ordering covered by <span><math><mrow><mi>L</mi><mi>T</mi><msub><mrow><mi>L</mi></mrow><mrow><mi>f</mi></mrow></msub></mrow></math></span>. Therefore, it provides a universal formal approach to capture compliance rules. Moreover, we define a minimal set of generic <span><math><mrow><mi>M</mi><mi>T</mi><msub><mrow><mi>L</mi></mrow><mrow><mi>f</mi></mrow></msub></mrow></math></span> formulas and show that they are capable of capturing all the common patterns for compliance rules.</div><div>As compliance validation is largely driven by the data model used to represent the event logs, we provide a mapping from <span><math><mrow><mi>M</mi><mi>T</mi><msub><mrow><mi>L</mi></mrow><mrow><mi>f</mi></mrow></msub></mrow></math></span> to the common data models we found in the literature to encode event logs, namely, the relational and the graph models. A comprehensive study comparing various data models and an empirical evaluation across real-life event logs demonstrate the effectiveness of the proposed approach.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"136 ","pages":"Article 102623"},"PeriodicalIF":3.4,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145106043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic group recommender methodology: Leveraging temporal trust and confidence graphs","authors":"Khadijeh Rahimkhani, Kamran Zamanifar","doi":"10.1016/j.is.2025.102612","DOIUrl":"10.1016/j.is.2025.102612","url":null,"abstract":"<div><div>Group recommender systems recommend items to groups with shared interests, aiming to satisfy each member. Managing trust and mutual influence within the group is a key challenge, because these factors shape which items users choose. These systems generate suggestions for the group based on inter-member trust. A less explored but critical aspect of this trust is its evolution over time, which can affect the group's item selections. This paper assesses the impact of time on trust in group recommendations. We begin by constructing a time-based confidence graph derived from the items selected by the group members. This graph allows us to measure the confidence levels between members and plays a crucial role in identifying their risk tolerance towards new items. Recognizing that members' risk-taking behavior can influence the group, we identify members who significantly affect group decisions. The confidence graph is periodically updated to reflect new user choices and the influence of key members. We then introduce a novel method for calculating implicit trust based on similarity and confidence metrics, providing a recommendation list that maximizes group satisfaction based on the computed trust levels. Finally, the proposed method is evaluated using the MovieLens100k, MovieLens10M, Epinions and Yelp datasets. The results demonstrate significant improvements in Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Precision, and group satisfaction measures compared to current state-of-the-art techniques.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"136 ","pages":"Article 102612"},"PeriodicalIF":3.4,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145027195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressed consecutive pattern matching","authors":"Paweł Gawrychowski , Garance Gourdel , Tatiana Starikovskaya , Teresa Anna Steiner","doi":"10.1016/j.is.2025.102607","DOIUrl":"10.1016/j.is.2025.102607","url":null,"abstract":"<div><div>Originating from the work of Navarro and Thankachan [TCS 2016], the problem of consecutive pattern matching is a variant of the fundamental pattern matching problem. In this problem, one is given a text and a pair of patterns <span><math><mrow><msub><mrow><mi>p</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>,</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math></span>, and must compute consecutive occurrences of <span><math><mrow><msub><mrow><mi>p</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>,</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math></span> in the text. Assuming that the text is given as a straight-line program of size <span><math><mi>g</mi></math></span>, we develop an algorithm that computes all consecutive occurrences of <span><math><mrow><msub><mrow><mi>p</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>,</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math></span> in optimal <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mi>g</mi><mo>+</mo><mrow><mo>|</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>|</mo></mrow><mo>+</mo><mrow><mo>|</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>|</mo></mrow><mo>+</mo><mi>output</mi><mo>)</mo></mrow></mrow></math></span> time, where <span><math><mi>output</mi></math></span> is the size of the output. As a corollary, we also derive an algorithm that reports all co-occurrences separated by a distance <span><math><mrow><mi>d</mi><mo>∈</mo><mrow><mo>[</mo><mi>a</mi><mo>,</mo><mi>b</mi><mo>]</mo></mrow></mrow></math></span> in <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mi>g</mi><mo>+</mo><mrow><mo>|</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>|</mo></mrow><mo>+</mo><mrow><mo>|</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>|</mo></mrow><mo>+</mo><mi>output</mi><mo>)</mo></mrow></mrow></math></span> time and an algorithm that reports the top-<span><math><mi>k</mi></math></span> closest co-occurrences in <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mi>g</mi><mo>+</mo><mrow><mo>|</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>|</mo></mrow><mo>+</mo><mrow><mo>|</mo><msub><mrow><mi>p</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>|</mo></mrow><mo>+</mo><mi>k</mi><mo>)</mo></mrow></mrow></math></span> time.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"136 ","pages":"Article 102607"},"PeriodicalIF":3.4,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145049110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DaMoOp: A global approach for optimizing denormalized schemas through a multidimensional cost model","authors":"Jihane Mali , Shohreh Ahvar , Faten Atigui , Ahmed Azough , Nicolas Travers","doi":"10.1016/j.is.2025.102598","DOIUrl":"10.1016/j.is.2025.102598","url":null,"abstract":"<div><div>The complexity of database systems has increased alongside the exponential growth of data, requiring Information Systems (IS) architects to continuously refine data models and meticulously select storage and management options that align with requirements. While existing solutions focus on data model transformation, none offer guidance in selecting the most suitable data model for a given use case. In this context, we propose <span>DaMoOp</span>, an automated approach for guiding the data model selection process. <span>DaMoOp</span> starts from a conceptual model and an associated use case comprising queries, settings and infrastructure constraints, and generates relevant logical data models. A cost model, considering environmental, financial, and temporal factors, facilitates comparison and selection of the most suitable data model. Our cost model incorporates both data model and query costs. Additionally, we suggest a data model selection process that enhances the ability to choose the optimal data model(s) for a specific use case, while also adapting to rapidly evolving use cases. We provide a strategic optimization approach designed to identify the most cost-efficient and stable data model as use case scenarios evolve. Moreover, we offer a simulation tool for the entire process, which enables visualizing the impact of use case variations on data model costs, thus empowering IS architects to make informed decisions.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"136 ","pages":"Article 102598"},"PeriodicalIF":3.4,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144988988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A task taxonomy for conformance checking","authors":"Jana-Rebecca Rehse , Michael Grohs , Finn Klessascheck , Lisa-Marie Klein , Tatiana von Landesberger , Luise Pufahl","doi":"10.1016/j.is.2025.102605","DOIUrl":"10.1016/j.is.2025.102605","url":null,"abstract":"<div><div>Conformance checking is a sub-discipline of process mining, which compares observed process traces with a process model to analyze whether the process execution conforms with or deviates from the process design. Organizations can leverage this analysis, for example to check whether their processes comply with internal or external regulations or to identify potential improvements. Gaining these insights requires suitable visualizations, which make complex results accessible and actionable. So far, however, the development of conformance checking visualizations has largely been left to tool vendors. As a result, current tools offer a wide variety of visual representations for conformance checking, but the analytical purposes they serve often remain unclear. However, without a systematic understanding of these purposes, it is difficult to evaluate the visualizations’ usefulness. Such an evaluation hence requires a deeper understanding of conformance checking as an analysis domain. To this end, we propose a task taxonomy, which categorizes the tasks that can occur when conducting conformance checking analyses. This taxonomy supports researchers in determining the purpose of visualizations, specifying relevant conformance checking tasks in terms of their goal, means, constraint type, data characteristics, data target, and data cardinality. Combining concepts from process mining and visual analytics, we address researchers from both disciplines to enable and support closer collaborations.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"136 ","pages":"Article 102605"},"PeriodicalIF":3.4,"publicationDate":"2025-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144904083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Behavioral similarity in business process models: A perspective that needs more attention","authors":"Francesca Zampino , Laura Genga , Antonella Longo","doi":"10.1016/j.is.2025.102608","DOIUrl":"10.1016/j.is.2025.102608","url":null,"abstract":"<div><div>Although extensive research has explored business process model similarity, the coherence and structure of these studies remain underexplored. This paper systematically reviews the literature on process model similarity, with a particular focus on behavioral similarity and on two primary measurement approaches: trace-based and model-based similarity. Based on 99 reviewed studies, we developed a three-dimensional framework and conducted a quantitative comparison of selected similarity measures for deeper analysis. Our findings provide structured insights to strengthen the assessment of process model similarity, particularly from a behavioral perspective. The review process follows a six-phase systematic methodology, from the identification of relevant keywords to the creation of bibliographic maps that visually represent the findings. These insights offer a foundation for future research and practical applications within the field.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"136 ","pages":"Article 102608"},"PeriodicalIF":3.4,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144908650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Personalized Recommendation Systems: A systematic Review","authors":"Bachir Asri, Sara Qassimi, Said Rakrak","doi":"10.1016/j.is.2025.102594","DOIUrl":"10.1016/j.is.2025.102594","url":null,"abstract":"<div><div>Recommender systems assist users in navigating the vast selection of choices by offering personalized suggestions based on preferences. Originally used in e-commerce and streaming services, these systems are now applied in various sectors such as healthcare, education, and more, making them increasingly important. Despite their growth, recommender systems still face challenges, especially when addressing users whose preferences change over time.</div><div>This paper presents a review of recent research on recommender systems that deliver personalized and adaptive recommendations for users with evolving preferences. Analyzing 97 studies published between 2020 and 2024, the review categorizes them across multiple dimensions to address key research questions.</div><div>The findings reveal a diverse landscape of evaluation metrics, datasets, adaptation mechanisms, and application domains within adaptive personalized recommender systems (AdPRSs), with MovieLens as the most widely used dataset and the attention mechanism as the predominant adaptation approach. Furthermore, the review introduces a novel categorization of AdPRSs based on adaptation mechanism. By synthesizing current research, this review highlights key challenges faced in the field and identifies future directions for enhancing the efficiency and effectiveness of AdPRSs. These insights are of significant value to both practitioners and academic researchers, providing a foundation for advancing the development and optimization of AdPRSs.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"135 ","pages":"Article 102594"},"PeriodicalIF":3.4,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144890541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GlintLM: Graph-Layered Integration with Nodal Topology with Language Models — A Bipartite Approach to Question Answering","authors":"ZhuoFan Chen , Yao Hui Hoon , Renne Ye Kai Ong , Justin Juin Hng Wong","doi":"10.1016/j.is.2025.102610","DOIUrl":"10.1016/j.is.2025.102610","url":null,"abstract":"<div><div>In modern Question Answering (QA) systems, Language Models (LMs) are often combined with Knowledge Graphs (KGs) to better handle challenges like word ambiguity and complex sentence structures. This combination helps LMs gain a deeper understanding by grounding them in structured knowledge. However, existing approaches often fall short in two areas: (1) they do not fully use the features of Knowledge Graphs and Graph Neural Networks (GNNs) during reasoning, and (2) they miss opportunities to better rank and filter information using the outputs of LMs and GNNs. To address this, we propose GlintLM, a system with two key innovations. First, the Enhanced Topological Node Representation (ETNR) module uses graph structure and a custom node feature method to improve reasoning. Second, the Multiplex Contextual Scorer (MCS) module combines pre-trained LM outputs with GNN attention to better score and filter relevant nodes. Together, these components create a more effective and adaptable system for QA. GlintLM demonstrates improved performance on common-sense (CommonsenseQA, OpenBookQA) and biomedical (MedQA-USMLE) QA benchmarks.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"135 ","pages":"Article 102610"},"PeriodicalIF":3.4,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144890473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CrossER: A robust and adaptable generalized entity resolution framework for diverse and heterogeneous datasets","authors":"Yunong Tian , Ning Wang , Anshun Zhou","doi":"10.1016/j.is.2025.102609","DOIUrl":"10.1016/j.is.2025.102609","url":null,"abstract":"<div><div>Entity Resolution (ER) is a critical task in data cleaning and integration, traditionally focusing on structured relational tables with aligned schemas. However, real-world applications often involve diverse data formats, leading to the emergence of Generalized Entity Resolution, which addresses structured, semi-structured, and unstructured data. While prompt-based methods have shown promise in improving entity resolution, they suffer from significant limitations such as sensitivity to prompt design and instability across heterogeneous data formats. To address these challenges, we propose CrossER, a novel framework that integrates cross-attention mechanisms, contrastive learning, and data augmentation. CrossER employs a cross-attention module to dynamically align attributes across heterogeneous data sources, enabling accurate entity resolution. To enhance robustness, contrastive learning constructs discriminative feature representations, and data augmentation introduces variability to improve adaptability to noisy and complex datasets. Experimental results on multiple real-world datasets demonstrate that CrossER significantly outperforms state-of-the-art Generalized Entity Resolution methods in F1 scores while maintaining computational efficiency. Furthermore, CrossER exhibits minimal dependency on specific pre-trained language models and delivers superior recall rates compared to baseline methods, especially in challenging heterogeneous datasets.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"135 ","pages":"Article 102609"},"PeriodicalIF":3.4,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144858374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}