Humam Kourani , Sebastiaan J. van Zelst , Daniel Schuster , Wil M.P. van der Aalst
{"title":"Discovering partially ordered workflow models","authors":"Humam Kourani , Sebastiaan J. van Zelst , Daniel Schuster , Wil M.P. van der Aalst","doi":"10.1016/j.is.2024.102493","DOIUrl":"10.1016/j.is.2024.102493","url":null,"abstract":"<div><div>In many real-world scenarios, processes naturally define partial orders over their constituent tasks. Partially ordered representations can be exploited in process discovery as they facilitate modeling such processes. The Partially Ordered Workflow Language (POWL) extends partially ordered representations with control-flow operators to support modeling common process constructs such as choice and loop structures. POWL integrates the hierarchical nature of process trees with the flexibility of partially ordered representations, opening up significant opportunities in process discovery. This paper presents and compares various approaches for the automated discovery of POWL models. We investigate the effects of applying varying validity criteria to partial orders, and we propose methods for incorporating frequency information to improve the quality of the discovered models. Additionally, we propose alternative visualizations for POWL models, offering different approaches that may be useful in various contexts. The discovery approaches are evaluated using various real-life data sets, demonstrating the ability of POWL models to capture complex process structures.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102493"},"PeriodicalIF":3.0,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142744007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
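A partial order over tasks allows every execution order that respects the order. The following is a minimal Python sketch of that idea, assuming a plain set of tasks with (before, after) precedence pairs. It checks that the relation is a strict partial order and enumerates the traces it permits. It has no choice or loop operators and none of the paper's validity criteria, so it is not POWL itself. The task names are hypothetical.

```python
from itertools import permutations

def is_strict_partial_order(tasks, order):
    """True if `order` (a set of (before, after) pairs) is irreflexive and transitive."""
    if any((t, t) in order for t in tasks):
        return False
    # Transitivity: a < b and b < d must imply a < d.
    return all((a, d) in order
               for (a, b) in order
               for (c, d) in order
               if b == c)

def allowed_traces(tasks, order):
    """Enumerate every total order of `tasks` that respects all pairs in `order`.

    Brute force over permutations, so only suitable for small examples.
    """
    traces = []
    for perm in permutations(sorted(tasks)):
        position = {task: i for i, task in enumerate(perm)}
        if all(position[a] < position[b] for (a, b) in order):
            traces.append(perm)
    return traces

# Hypothetical order-handling process: both checks follow registration,
# in either order, and shipping waits for both checks.
tasks = {"register", "check_credit", "check_stock", "ship"}
order = {("register", "check_credit"), ("register", "check_stock"),
         ("check_credit", "ship"), ("check_stock", "ship"),
         ("register", "ship")}

print(is_strict_partial_order(tasks, order))  # True
for trace in allowed_traces(tasks, order):
    print(trace)
```

The two concurrent checks yield two allowed traces. A sequential model would need to list both interleavings explicitly, which is the kind of concurrency a partially ordered representation captures compactly.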
Jeroen Middelhuis , Riccardo Lo Bianco , Eliran Sherzer , Zaharah Bukhsh , Ivo Adan , Remco Dijkman
{"title":"Learning policies for resource allocation in business processes","authors":"Jeroen Middelhuis , Riccardo Lo Bianco , Eliran Sherzer , Zaharah Bukhsh , Ivo Adan , Remco Dijkman","doi":"10.1016/j.is.2024.102492","DOIUrl":"10.1016/j.is.2024.102492","url":null,"abstract":"<div><div>Efficient allocation of resources to activities is pivotal in executing business processes but remains challenging. While resource allocation methodologies are well-established in domains like manufacturing, their application within business process management remains limited. Existing methods often do not scale well to large processes with numerous activities or optimize across multiple cases. This paper aims to address this gap by proposing two learning-based methods for resource allocation in business processes to minimize the average cycle time of cases. The first method leverages Deep Reinforcement Learning (DRL) to learn policies by allocating resources to activities. The second method is a score-based value function approximation approach, which learns the weights of a set of curated features to prioritize resource assignments. We evaluated the proposed approaches on six distinct business processes with archetypal process flows, referred to as scenarios, and three realistically sized business processes, referred to as composite business processes, which are a combination of the scenarios. We benchmarked our methods against traditional heuristics and existing resource allocation methods. The results show that our methods learn adaptive resource allocation policies that outperform or are competitive with the benchmarks in five out of six scenarios. The DRL approach outperforms all benchmarks in all three composite business processes and finds a policy that is, on average, 12.7% better than the best-performing benchmark.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102492"},"PeriodicalIF":3.0,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142744006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
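The second method prioritizes resource assignments by a weighted sum of features. The following minimal Python sketch shows how such a linear score could drive a single greedy allocation step. It assumes fixed, already-learned weights, and the feature names, weights, and `Candidate` type are all hypothetical. Learning the weights, which is the actual contribution described in the abstract, is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    resource: str
    activity: str
    features: dict = field(default_factory=dict)  # hypothetical feature name -> value

def score(candidate, weights):
    """Linear score: weighted sum of the candidate's features (unknown features weigh 0)."""
    return sum(weights.get(name, 0.0) * value
               for name, value in candidate.features.items())

def greedy_allocate(candidates, weights):
    """Assign in order of decreasing score, using each resource and activity at most once."""
    chosen, busy_resources, taken_activities = [], set(), set()
    for c in sorted(candidates, key=lambda c: score(c, weights), reverse=True):
        if c.resource in busy_resources or c.activity in taken_activities:
            continue
        chosen.append(c)
        busy_resources.add(c.resource)
        taken_activities.add(c.activity)
    return chosen

# Hypothetical weights: penalize long processing times, favor cases that have waited long.
weights = {"processing_time": -1.0, "case_waiting_time": 0.5}
candidates = [
    Candidate("r1", "A", {"processing_time": 3, "case_waiting_time": 2}),
    Candidate("r1", "B", {"processing_time": 1, "case_waiting_time": 0}),
    Candidate("r2", "A", {"processing_time": 2, "case_waiting_time": 2}),
    Candidate("r2", "B", {"processing_time": 4, "case_waiting_time": 0}),
]
for c in greedy_allocate(candidates, weights):
    print(c.resource, "->", c.activity)
```

The greedy step is chosen here for simplicity. It is not guaranteed to find the best joint assignment, which is one reason learned policies can outperform fixed heuristics.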
Petri Puustinen, Maria Stratigi, Kostas Stefanidis
{"title":"STracker: A framework for identifying sentiment changes in customer feedbacks","authors":"Petri Puustinen, Maria Stratigi, Kostas Stefanidis","doi":"10.1016/j.is.2024.102491","DOIUrl":"10.1016/j.is.2024.102491","url":null,"abstract":"<div><div>Companies and organizations monitor customer satisfaction by collecting feedback through Likert scale questions and free-text responses. Freely expressed opinions, not bound to fixed questions, provide a detailed source of information that organizations can use to improve their daily operations. The organization’s quality assurance review processes require a timely follow-up on these customer opinions. However, solutions often address the analytics of textual information with topic discovery and sentiment analysis for a fixed time period. These frameworks also tend to focus on serving the purpose of a specific domain and terminology. In this study, we focus on a facilitation service to track discovered topics and their sentiments over time. This service is generic and can be applied to different domains. To evaluate the capabilities of the framework, we used two datasets with opposite types of wording. The study shows that the framework is capable of discovering similar topics over time and identifying their sentiment changes.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102491"},"PeriodicalIF":3.0,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paolo Ferragina, Mariagiovanna Rotundo, Giorgio Vinciguerra
{"title":"Two-level massive string dictionaries","authors":"Paolo Ferragina, Mariagiovanna Rotundo, Giorgio Vinciguerra","doi":"10.1016/j.is.2024.102490","DOIUrl":"10.1016/j.is.2024.102490","url":null,"abstract":"<div><div>We study the problem of engineering space–time efficient data structures that support membership and rank queries on <em>very</em> large static dictionaries of strings.</div><div>Our solution is based on a very simple approach that decouples string storage and string indexing by means of a block-wise compression of the sorted dictionary strings (to be stored in external memory) and a succinct implementation of a Patricia trie (to be stored in internal memory) built on the first string of each block. On top of this, we design an in-memory cache that, given a sample of the query workload, augments the Patricia trie with additional information to reduce the number of I/Os of future queries.</div><div>Our experimental evaluation on two new datasets, which are at least one order of magnitude larger than the ones used in the literature, shows that (i) the state-of-the-art compressed string dictionaries, compared to Patricia tries, do not provide significant benefits when used in a large-scale indexing setting, and (ii) our two-level approach enables the indexing and storage of 3.5 billion strings taking 273 GB in less than 200 MB of internal memory and 83 GB of compressed disk space, while still guaranteeing query performance comparable to or faster than that of array-based solutions used in modern storage systems, such as RocksDB, thus possibly influencing their future design.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102490"},"PeriodicalIF":3.0,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
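The two-level layout keeps compressed blocks of sorted strings as one level and indexes only the first string of each block in memory. The following minimal in-memory Python sketch answers membership and rank queries over that layout. It makes three assumptions: front coding is used as the block compression, a sorted list searched with `bisect` stands in for the succinct Patricia trie, and "rank" means the number of dictionary strings less than or equal to the query. It illustrates the storage/indexing split only, not the paper's data structure, its cache, or its I/O behaviour.

```python
from bisect import bisect_right

def front_code(block):
    """Encode a sorted block as (shared-prefix length, remaining suffix) pairs."""
    encoded, prev = [], ""
    for s in block:
        lcp = 0
        while lcp < min(len(prev), len(s)) and prev[lcp] == s[lcp]:
            lcp += 1
        encoded.append((lcp, s[lcp:]))
        prev = s
    return encoded

def front_decode(encoded):
    """Rebuild the strings of a front-coded block."""
    out, prev = [], ""
    for lcp, suffix in encoded:
        prev = prev[:lcp] + suffix
        out.append(prev)
    return out

class TwoLevelDictionary:
    def __init__(self, strings, block_size=4):
        strings = sorted(set(strings))
        blocks = [strings[i:i + block_size] for i in range(0, len(strings), block_size)]
        self.block_size = block_size
        # Storage level (external memory in the paper): compressed blocks.
        self.blocks = [front_code(b) for b in blocks]
        # Index level (internal memory): the first string of each block.
        self.heads = [b[0] for b in blocks]

    def _block_of(self, query):
        """Index of the only block that could contain `query`, or -1."""
        return bisect_right(self.heads, query) - 1

    def contains(self, query):
        i = self._block_of(query)
        return i >= 0 and query in front_decode(self.blocks[i])

    def rank(self, query):
        """Number of dictionary strings <= query (all blocks before i are full)."""
        i = self._block_of(query)
        if i < 0:
            return 0
        return i * self.block_size + bisect_right(front_decode(self.blocks[i]), query)

words = ["apple", "apply", "banana", "band", "bandana", "can", "candle", "candy", "cane"]
d = TwoLevelDictionary(words, block_size=4)
print(d.heads)              # ['apple', 'bandana', 'cane']
print(d.contains("band"))   # True
print(d.rank("candle"))     # 7
```

Each query touches the small in-memory index once and decodes a single block. In an external-memory setting, that block decode would correspond to one block read.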
{"title":"A generative and discriminative model for diversity-promoting recommendation","authors":"Yuli Liu","doi":"10.1016/j.is.2024.102488","DOIUrl":"10.1016/j.is.2024.102488","url":null,"abstract":"<div><div>Diversity-promoting recommender systems, which aim to recommend diverse and relevant results to users, have received significant attention. However, current studies often face a trade-off: they either recommend highly accurate but homogeneous items or boost diversity at the cost of relevance, making it challenging for users to find truly satisfying recommendations that meet both their obvious and potential needs. To overcome this competitive trade-off, we introduce a unified framework that simultaneously leverages a discriminative model and a generative model. This approach allows us to adjust the focus of learning dynamically. Specifically, our framework uses Variational Graph Auto-Encoders to enhance the diversity of recommendations, while Graph Convolution Networks are employed to ensure high accuracy in predicting user preferences. This dual focus enables our system to deliver recommendations that are both diverse and closely aligned with user interests. Inspired by the quality <em>vs.</em> diversity decomposition of the Determinantal Point Process (DPP) kernel, we design a DPP likelihood-based loss function as the joint modeling loss. Extensive experiments on three real-world datasets demonstrate that the unified framework goes beyond the quality-diversity trade-off, <em>i.e.</em>, instead of sacrificing accuracy to promote diversity, the joint modeling actually boosts both metrics.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102488"},"PeriodicalIF":3.0,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
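For reference, the standard quality vs. diversity decomposition of a DPP kernel is written below in our own notation, which is an assumption rather than the paper's:

- $q_i > 0$ is the quality score of item $i$.
- $S$ is a positive semidefinite similarity matrix with unit diagonal.
- $L_Y$ is the submatrix of $L$ indexed by a subset $Y$ of items.

The paper's own loss function is not reproduced here.

```latex
L_{ij} = q_i \, S_{ij} \, q_j, \qquad L = \operatorname{diag}(q)\, S \,\operatorname{diag}(q)

\Pr(Y) = \frac{\det(L_Y)}{\det(L + I)}, \qquad
\det(L_Y) = \Big(\prod_{i \in Y} q_i^{2}\Big)\,\det(S_Y)
```

The product term rewards subsets of high-quality items. Because $S$ has a unit diagonal, Hadamard's inequality gives $\det(S_Y) \le 1$, with equality only when $S_{ij} = 0$ for all distinct $i, j \in Y$. Sets of mutually similar items are therefore assigned lower probability, which is the diversity component.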
{"title":"Soundness unknotted: An efficient soundness checking algorithm for arbitrary cyclic process models by loosening loops","authors":"Thomas M. Prinz , Yongsun Choi , N. Long Ha","doi":"10.1016/j.is.2024.102476","DOIUrl":"10.1016/j.is.2024.102476","url":null,"abstract":"<div><div>Although domain experts usually create business process models, these models can still contain errors. For this reason, research and practice establish criteria for process models to provide confidence in the correctness or correct behavior of processes. One widespread criterion is soundness, which guarantees the absence of deadlocks and lack of synchronization. Checking the soundness of process models is not trivial, and cyclic process models further increase the complexity of this check. This paper presents a novel approach for verifying soundness that has an efficient cubic worst-case runtime, even for arbitrary cyclic process models. The approach relies on three key techniques (loop conversion, loop reduction, and loop decomposition) to convert any cyclic process model into a set of acyclic process models. Using this approach, we have developed five straightforward rules to verify soundness, reusing existing approaches for checking the soundness of acyclic models.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102476"},"PeriodicalIF":3.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pavol Jurik , Peter Schmidt , Martin Misut , Ivan Brezina , Marian Reiff
{"title":"The composition diagram of a complex process: Enhancing understanding of hierarchical business processes","authors":"Pavol Jurik , Peter Schmidt , Martin Misut , Ivan Brezina , Marian Reiff","doi":"10.1016/j.is.2024.102489","DOIUrl":"10.1016/j.is.2024.102489","url":null,"abstract":"<div><div>The article presents the Composition Diagram of a Complex Process (CDCP), a new diagramming method for modelling business processes with complex vertical structures. The method addresses the limitations of traditional modelling techniques such as BPMN, Activity Diagrams (AD), and Event-Driven Process Chains (EPC).</div><div>An experiment with 277 students from different study programs and years of study was carried out to determine the effectiveness of the methods. The main objective was to evaluate the usability and effectiveness of CDCP compared to established methods, focusing on two primary tasks: interpretation and diagram creation. Participants' performance was evaluated based on the objective results of the tasks and subjective questionnaire feedback. The results indicate that CDCP was an effective method for both the reading and drawing tasks, outperforming BPMN and EPC in terms of understanding and ease of use. Analysis of variance showed that while the year of study did not significantly affect performance, the study program and the method used had a significant effect. These findings highlight the potential of CDCP as a more accessible and intuitive business process modelling tool, even for users with minimal prior experience.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102489"},"PeriodicalIF":3.0,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Baocheng Yang , Bing Zhang , Kevin Cutsforth , Shanfu Yu , Xiaowen Yu
{"title":"Emerging industry classification based on BERT model","authors":"Baocheng Yang , Bing Zhang , Kevin Cutsforth , Shanfu Yu , Xiaowen Yu","doi":"10.1016/j.is.2024.102484","DOIUrl":"10.1016/j.is.2024.102484","url":null,"abstract":"<div><div>Accurate industry classification is central to economic analysis and policy making. Current classification systems, while foundational, exhibit limitations in the face of the exponential growth of big data. These limitations include subjectivity, leading to inconsistencies and misclassifications. To overcome these shortcomings, this paper focuses on utilizing the BERT model for classifying emerging industries through the identification of salient attributes within business descriptions. The proposed method identifies clusters of firms within distinct industries, thereby transcending the restrictions inherent in existing classification systems. The model categorizes business descriptions with accuracy rates ranging from 84.11% to 99.66% across all 16 industry classifications. This research enriches the industry classification literature through a practical examination of the efficacy of machine learning techniques. The experimental results highlight the effectiveness of the BERT model in classifying and identifying emerging industries, providing insights for industry analysts and policymakers.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102484"},"PeriodicalIF":3.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142529245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ExamGuard: Smart contracts for secure online test","authors":"Mayuri Diwakar Kulkarni, Ashish Awate, Makarand Shahade, Bhushan Nandwalkar","doi":"10.1016/j.is.2024.102485","DOIUrl":"10.1016/j.is.2024.102485","url":null,"abstract":"<div><div>The education sector is currently experiencing profound changes, primarily driven by the widespread adoption of online platforms for conducting examinations. This paper delves into the utilization of smart contracts as a means to revolutionize the monitoring and execution of online examinations, thereby guaranteeing the traceability of evaluation data and examinee activities. In this context, the integration of advanced technologies such as the PoseNet algorithm from TensorFlow emerges as a pivotal component. By leveraging PoseNet, the system identifies both single and multiple faces of examinees, thereby ensuring the authenticity and integrity of examination sessions. Moreover, the incorporation of the COCO dataset facilitates the recognition of objects within examination environments, further bolstering the system's capabilities in monitoring examinee activities. Of paramount importance is the secure storage of evidence collected during examinations, a task accomplished through blockchain technology. This platform not only ensures the immutability of data but also safeguards against potential instances of tampering, thereby upholding the credibility of examination results. Through the utilization of smart contracts, the proposed framework not only streamlines the examination process but also instills transparency and integrity, thereby addressing inherent challenges encountered in traditional examination methods. One of the key advantages of this technological integration lies in its ability to modernize examination procedures while concurrently reinforcing trust and accountability within the educational assessment ecosystem. By harnessing the power of smart contracts, educational institutions can mitigate concerns pertaining to data manipulation and malpractice, thereby fostering a more secure and reliable examination environment. Furthermore, the transparency afforded by blockchain technology ensures that examination outcomes are verifiable and auditable, instilling confidence among stakeholders and enhancing the overall credibility of the assessment process. In conclusion, the adoption of smart contracts represents a paradigm shift in the realm of educational assessment, offering a comprehensive solution to the challenges posed by traditional examination methods. By embracing advanced technologies such as PoseNet and blockchain, educational institutions can not only streamline examination procedures but also uphold the highest standards of integrity and accountability. As such, the integration of smart contracts holds immense potential in shaping the future of online examinations, paving the way for a more efficient, transparent, and trustworthy assessment ecosystem.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102485"},"PeriodicalIF":3.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explaining results of path queries on graphs: Single-path results for context-free path queries","authors":"Jelle Hellings","doi":"10.1016/j.is.2024.102475","DOIUrl":"10.1016/j.is.2024.102475","url":null,"abstract":"<div><div>Many graph query languages use, at their core, <em>path queries</em> that yield node pairs <span><math><mrow><mo>(</mo><mi>m</mi><mo>,</mo><mi>n</mi><mo>)</mo></mrow></math></span> that are connected by a path of interest. For the end-user, such node pairs only give limited insight as to <em>why</em> this result is obtained, as the pair does not directly identify the underlying path of interest.</div><div>In this paper, we propose the <em>single-path semantics</em> to address this limitation of path queries. Under single-path semantics, path queries evaluate to a single path that connects nodes <span><math><mi>m</mi></math></span> and <span><math><mi>n</mi></math></span> and satisfies the conditions of the query. To put our proposal into practice, we provide an efficient algorithm for evaluating <em>context-free path queries</em> using the single-path semantics. Additionally, we perform a short evaluation of our techniques that shows that the single-path semantics is practically feasible, even when query results grow large.</div><div>In addition, we explore the formal relationship between the single-path semantics we propose and the problem of finding the <em>shortest string</em> in the intersection of a regular language (representing a graph) and a context-free language (representing a path query). As our formal results show, there is a distinction between the complexity of the single-path semantics for queries that use a single edge label and queries that use multiple edge labels: for queries that use a single edge label, the length of the shortest path is <em>linearly upper bounded</em> by the number of nodes in the graph, whereas for queries that use multiple edge labels, the length of the shortest path has a worst-case <em>quadratic lower bound</em>.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102475"},"PeriodicalIF":3.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142529774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
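The following minimal Python sketch illustrates the single-path idea for context-free path queries. It uses a worklist-style evaluation that derives facts (A, m, n), meaning "some path from m to n spells a word derivable from A". The first time each fact is derived, the sketch stores one back-pointer for it, so a witness path can be rebuilt for every result pair.

The sketch rests on these assumptions:

- The grammar is in Chomsky normal form without ε-rules.
- The grammar is encoded as hypothetical `terminal_rules` and `binary_rules` dictionaries.
- The returned path is one valid witness, not necessarily the shortest.

This is an illustration, not the paper's algorithm.

```python
from collections import defaultdict, deque

def cfpq_single_path(edges, terminal_rules, binary_rules):
    """Derive all facts (A, m, n) and keep one back-pointer per fact.

    edges: iterable of (m, label, n).
    terminal_rules: label -> set of A with a rule A -> label.
    binary_rules: (B, C) -> set of A with a rule A -> B C.
    """
    witness = {}                  # fact -> ("edge", label) or ("split", left_fact, right_fact)
    out_facts = defaultdict(set)  # m -> {(A, n)}: known facts starting at m
    in_facts = defaultdict(set)   # n -> {(A, m)}: known facts ending at n
    work = deque()

    def add(fact, pointer):
        # Only the first derivation is kept, so back-pointers always refer to older facts.
        if fact not in witness:
            witness[fact] = pointer
            a, m, n = fact
            out_facts[m].add((a, n))
            in_facts[n].add((a, m))
            work.append(fact)

    for m, label, n in edges:
        for a in terminal_rules.get(label, ()):
            add((a, m, n), ("edge", label))

    while work:
        b, m, k = work.popleft()
        # Popped fact as the left part: (B, m, k) + (C, k, n) gives (A, m, n) for A -> B C.
        for c, n in list(out_facts[k]):
            for a in binary_rules.get((b, c), ()):
                add((a, m, n), ("split", (b, m, k), (c, k, n)))
        # Popped fact as the right part: (C, j, m) + (B, m, k) gives (A, j, k) for A -> C B.
        for c, j in list(in_facts[m]):
            for a in binary_rules.get((c, b), ()):
                add((a, j, k), ("split", (c, j, m), (b, m, k)))
    return witness

def reconstruct(witness, fact):
    """Expand back-pointers into the edge list (m, label, n) of one witness path."""
    kind, *rest = witness[fact]
    if kind == "edge":
        _, m, n = fact
        return [(m, rest[0], n)]
    left, right = rest
    return reconstruct(witness, left) + reconstruct(witness, right)

# Hypothetical query for paths spelling a^k b^k (k >= 1), in CNF:
#   S -> A B | A X,   X -> S B,   A -> a,   B -> b
terminal_rules = {"a": {"A"}, "b": {"B"}}
binary_rules = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]

witness = cfpq_single_path(edges, terminal_rules, binary_rules)
print(reconstruct(witness, ("S", 0, 4)))
```

Keeping only the first derivation of each fact bounds the extra memory to one pointer per fact. However, this design does not control path length, which is where the shortest-string question discussed in the abstract comes in.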