{"title":"Enhancing business process simulation models with extraneous activity delays","authors":"David Chapela-Campa, Marlon Dumas","doi":"10.1016/j.is.2024.102346","DOIUrl":"10.1016/j.is.2024.102346","url":null,"abstract":"<div><p><span>Business Process Simulation (BPS) is a common approach to estimate the impact of changes to a business process on its performance measures. For example, it allows us to estimate what would be the cycle time of a process if we automated one of its activities, or if some resources become unavailable. The starting point of BPS is a business process model annotated with simulation parameters (a BPS model). In traditional approaches, BPS models are manually designed by modeling specialists. This approach is time-consuming and error-prone. To address this shortcoming, several studies have proposed methods to automatically discover BPS models from event logs via process mining techniques. However, current techniques in this space discover BPS models that only capture waiting times caused by </span>resource contention or resource unavailability. Oftentimes, a considerable portion of the waiting time in a business process corresponds to extraneous delays, e.g., a resource waits for the customer to return a phone call. This article proposes a method that discovers extraneous delays from event logs of business process executions. The proposed approach computes, for each pair of causally consecutive activity instances in the event log, the time when the target activity instance should theoretically have started, given the availability of the relevant resource. Based on the difference between the theoretical and the actual start times, the approach estimates the distribution of extraneous delays, and it enhances the BPS model with timer events to capture these delays. An empirical evaluation involving synthetic and real-life logs shows that the approach produces BPS models that better reflect the temporal dynamics of the process, relative to BPS models that do not capture extraneous delays.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"122 ","pages":"Article 102346"},"PeriodicalIF":3.7,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139516507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Repairing raw metadata for metadata management","authors":"Hiba Khalid, Esteban Zimányi","doi":"10.1016/j.is.2024.102344","DOIUrl":"10.1016/j.is.2024.102344","url":null,"abstract":"<div><p>With the exponential growth of data production, the generation of metadata has become an integral part of the process. Metadata plays a crucial role in facilitating enhanced data analytics, data integration, and resource management by offering valuable insights. However, inconsistencies arise due to deviations from standards in metadata recording, including missing attribute information, publishing URLs, and provenance. Furthermore, the recorded metadata may exhibit inconsistencies, such as varied value formats, special characters, and inaccurately entered values. Addressing these inconsistencies through metadata preparation can greatly enhance the user experience during data management tasks.</p><p>This paper introduces MDPrep, a system that explores the usability and applicability of data preparation techniques in improving metadata quality. Our approach involves three steps: (1) detecting and identifying problematic metadata elements and structural issues, (2) employing a keyword-based approach to enhance metadata elements and a syntax-based approach to rectify structural metadata issues, and (3) comparing the outcomes to ensure improved readability and reusability of prepared metadata files.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"122 ","pages":"Article 102344"},"PeriodicalIF":3.7,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139510480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Filtering with relational similarity","authors":"Vladimir Mic , Pavel Zezula","doi":"10.1016/j.is.2024.102345","DOIUrl":"10.1016/j.is.2024.102345","url":null,"abstract":"<div><p>For decades, the success of the similarity search has been based on detailed quantifications of pairwise similarities of objects. Currently, the search features have become much more precise but also bulkier, and the similarity computations are more time-consuming. We show that nearly no precise similarity quantifications are needed to evaluate the <span><math><mi>k</mi></math></span> nearest neighbours (<span><math><mi>k</mi></math></span>NN) queries that dominate real-life applications. Based on the well-known fact that a selection of the most similar alternative out of several options is a much easier task than deciding the absolute similarity scores, we propose the search based on an epistemologically simpler concept of relational similarity. Having arbitrary objects <span><math><mrow><mi>q</mi><mo>,</mo><msub><mrow><mi>o</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>,</mo><msub><mrow><mi>o</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math></span> from the search domain, the <span><math><mi>k</mi></math></span>NN search is solvable just by the ability to choose the more similar object to <span><math><mi>q</mi></math></span> out of <span><math><mrow><msub><mrow><mi>o</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>,</mo><msub><mrow><mi>o</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math></span>. To support the filtering efficiency, we also consider a neutral option, i.e., equal similarities of <span><math><mrow><mi>q</mi><mo>,</mo><msub><mrow><mi>o</mi></mrow><mrow><mn>1</mn></mrow></msub></mrow></math></span> and <span><math><mrow><mi>q</mi><mo>,</mo><msub><mrow><mi>o</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math></span>. We formalise such concept and discuss its advantages with respect to similarity quantifications, namely the efficiency, robustness and scalability with respect to the dataset size. Our pioneering implementation of the relational similarity search for the Euclidean and Cosine spaces demonstrates robust filtering power and efficiency compared to several contemporary techniques.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"122 ","pages":"Article 102345"},"PeriodicalIF":3.7,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000036/pdfft?md5=02857cd176b247b381941578e10c094d&pid=1-s2.0-S0306437924000036-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139510378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Process Query Language: Design, Implementation, and Evaluation","authors":"Artem Polyvyanyy , Arthur H.M. ter Hofstede , Marcello La Rosa , Chun Ouyang , Anastasiia Pika","doi":"10.1016/j.is.2023.102337","DOIUrl":"10.1016/j.is.2023.102337","url":null,"abstract":"<div><p>Organizations can benefit from the use of practices, techniques, and tools from the area of business process management. Through the focus on processes, they create process models that require management, including support for versioning, refactoring and querying. Querying thus far has primarily focused on structural properties of models rather than on exploiting behavioral properties capturing aspects of model execution. While the latter is more challenging, it is also more effective, especially when models are used for auditing or process automation. The focus of this paper is to overcome the challenges associated with behavioral querying of process models in order to unlock its benefits. The first challenge concerns determining decidability of the building blocks of the query language, which are the possible behavioral relations between process tasks. The second challenge concerns achieving acceptable performance of query evaluation. The evaluation of a query may require expensive checks in all process models, of which there may be thousands. In light of these challenges, this paper proposes a special-purpose programming language, namely Process Query Language (PQL) for behavioral querying of process model collections. The language relies on a set of behavioral predicates between process tasks, whose usefulness has been empirically evaluated with a pool of process model stakeholders. This study resulted in a selection of the predicates to be implemented in PQL, whose decidability has also been formally proven. The computational performance of the language has been extensively evaluated through a set of experiments against two large process model collections.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"122 ","pages":"Article 102337"},"PeriodicalIF":3.7,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437923001734/pdfft?md5=bf63d7c9889b99ccdd113784876bd7b9&pid=1-s2.0-S0306437923001734-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139101888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers","authors":"Marco Siino, Ilenia Tinnirello, Marco La Cascia","doi":"10.1016/j.is.2023.102342","DOIUrl":"10.1016/j.is.2023.102342","url":null,"abstract":"<div><p>With the advent of the modern pre-trained Transformers, the text preprocessing has started to be neglected and not specifically addressed in recent NLP literature. However, both from a linguistic and from a computer science point of view, we believe that even when using modern Transformers, text preprocessing can significantly impact on the performance of a classification model. We want to investigate and compare, through this study, how preprocessing impacts on the Text Classification (TC) performance of modern and traditional classification models. We report and discuss the preprocessing techniques found in the literature and their most recent variants or applications to address TC tasks in different domains. In order to assess how much the preprocessing affects classification performance, we apply the three top referenced preprocessing techniques (alone or in combination) to four publicly available datasets from different domains. Then, nine machine learning models – including modern Transformers – get the preprocessed text as input. The results presented show that an educated choice on the text preprocessing strategy to employ should be based on the task as well as on the model considered. Outcomes in this survey show that choosing the best preprocessing technique – in place of the worst – can significantly improve accuracy on the classification (up to 25%, as in the case of an XLNet on the IMDB dataset). In some cases, by means of a suitable preprocessing strategy, even a simple Naïve Bayes classifier proved to outperform (i.e., by 2% in accuracy) the best performing Transformer. We found that Transformers and traditional models exhibit a higher impact of the preprocessing on the TC performance. Our main findings are: (1) also on modern pre-trained language models, preprocessing can affect performance, depending on the datasets and on the preprocessing technique or combination of techniques used, (2) in some cases, using a proper preprocessing strategy, simple models can outperform Transformers on TC tasks, (3) similar classes of models exhibit similar level of sensitivity to text preprocessing.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"121 ","pages":"Article 102342"},"PeriodicalIF":3.7,"publicationDate":"2023-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437923001783/pdfft?md5=f6a37c2a5b264959fc055b2613fb321e&pid=1-s2.0-S0306437923001783-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139024022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HubHSP graph: Capturing local geometrical and statistical data properties via spanning graphs","authors":"Stephane Marchand-Maillet , Edgar Chávez","doi":"10.1016/j.is.2023.102341","DOIUrl":"10.1016/j.is.2023.102341","url":null,"abstract":"<div><p>The computation of a continuous generative model to describe a finite sample of an infinite metric space can prove challenging and lead to erroneous hypothesis, particularly in high-dimensional spaces. In this paper, we follow a different route and define the Hubness Half Space Partitioning graph (HubHSP graph). By constructing this spanning graph over the dataset, we can capture both the geometrical and statistical properties of the data without resorting to any continuity assumption. Leveraging the classical graph-theoretic apparatus, the HubHSP graph facilitates critical operations, including the creation of a representative sample of the original dataset, without relying on density estimation. This representative subsample is essential for a range of operations, including indexing, visualization, and machine learning tasks such as clustering or inductive learning. With the HubHSP graph, we can bypass the limitations of traditional methods and obtain a holistic understanding of our dataset’s properties, enabling us to unlock its full potential.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"121 ","pages":"Article 102341"},"PeriodicalIF":3.7,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437923001771/pdfft?md5=fc0eac6dd447ca16f10189821d083444&pid=1-s2.0-S0306437923001771-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139017802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HubHSP graph: Capturing local geometrical and statistical data properties via spanning graphs","authors":"Stephane Marchand-Maillet, Edgar Chávez","doi":"10.1016/j.is.2023.102341","DOIUrl":"https://doi.org/10.1016/j.is.2023.102341","url":null,"abstract":"<p>The computation of a continuous generative model to describe a finite sample of an infinite metric space can prove challenging and lead to erroneous hypothesis, particularly in high-dimensional spaces. In this paper, we follow a different route and define the Hubness Half Space Partitioning graph (HubHSP graph). By constructing this spanning graph over the dataset, we can capture both the geometrical and statistical properties of the data without resorting to any continuity assumption. Leveraging the classical graph-theoretic apparatus, the HubHSP graph facilitates critical operations, including the creation of a representative sample of the original dataset, without relying on density estimation. This representative subsample is essential for a range of operations, including indexing, visualization, and machine learning tasks such as clustering or inductive learning. With the HubHSP graph, we can bypass the limitations of traditional methods and obtain a holistic understanding of our dataset’s properties, enabling us to unlock its full potential.</p>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"30 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139023963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A screenshot-based task mining framework for disclosing the drivers behind variable human actions","authors":"A. Martínez-Rojas , A. Jiménez-Ramírez , J.G. Enríquez , H.A. Reijers","doi":"10.1016/j.is.2023.102340","DOIUrl":"10.1016/j.is.2023.102340","url":null,"abstract":"<div><p>Robotic Process Automation (RPA) enables subject matter experts to use the graphical user interface as a means to automate and integrate systems. This is a fast method to automate repetitive, mundane tasks. To avoid constructing a software robot from scratch, Task Mining approaches can be used to monitor human behavior through a series of timestamped events, such as mouse clicks and keystrokes. From a so-called User Interface log (UI Log), it is possible to automatically discover the process model behind this behavior. However, when the discovered process model shows different process variants, it is hard to determine what drives a human’s decision to execute one variant over the other. Existing approaches do analyze the UI Log in search for the underlying rules, but neglect what can be seen on the screen. As a result, a major part of the human decision-making remains hidden. To address this gap, this paper describes a Task Mining framework that uses the screenshot of each event in the UI Log as an additional source of information. From such an enriched UI Log, by using image-processing techniques and Machine Learning algorithms, a decision tree is created, which offers a more complete explanation of the human decision-making process. The presented framework can express the decision tree graphically, explicitly identifying which elements in the screenshots are relevant to make the decision. The framework has been evaluated through a case study that involves a process with real-life screenshots. The results indicate a satisfactorily high accuracy of the overall approach, even if a small UI Log is used. The evaluation also identifies challenges for applying the framework in a real-life setting when a high density of interface elements is present.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"121 ","pages":"Article 102340"},"PeriodicalIF":3.7,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S030643792300176X/pdfft?md5=595e70f04b75d2dca939507ee4f713af&pid=1-s2.0-S030643792300176X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139018882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A screenshot-based task mining framework for disclosing the drivers behind variable human actions","authors":"A. Martínez-Rojas, A. Jiménez-Ramírez, J.G. Enríquez, H.A. Reijers","doi":"10.1016/j.is.2023.102340","DOIUrl":"https://doi.org/10.1016/j.is.2023.102340","url":null,"abstract":"<p>Robotic Process Automation (RPA) enables subject matter experts to use the graphical user interface as a means to automate and integrate systems. This is a fast method to automate repetitive, mundane tasks. To avoid constructing a software robot from scratch, Task Mining approaches can be used to monitor human behavior through a series of timestamped events, such as mouse clicks and keystrokes. From a so-called User Interface log (UI Log), it is possible to automatically discover the process model behind this behavior. However, when the discovered process model shows different process variants, it is hard to determine what drives a human’s decision to execute one variant over the other. Existing approaches do analyze the UI Log in search for the underlying rules, but neglect what can be seen on the screen. As a result, a major part of the human decision-making remains hidden. To address this gap, this paper describes a Task Mining framework that uses the screenshot of each event in the UI Log as an additional source of information. From such an enriched UI Log, by using image-processing techniques and Machine Learning algorithms, a decision tree is created, which offers a more complete explanation of the human decision-making process. The presented framework can express the decision tree graphically, explicitly identifying which elements in the screenshots are relevant to make the decision. The framework has been evaluated through a case study that involves a process with real-life screenshots. The results indicate a satisfactorily high accuracy of the overall approach, even if a small UI Log is used. The evaluation also identifies challenges for applying the framework in a real-life setting when a high density of interface elements is present.</p>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"74 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139023993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foundations and practice of binary process discovery","authors":"Tijs Slaats , Søren Debois , Christoffer Olling Back , Axel Kjeld Fjelrad Christfort","doi":"10.1016/j.is.2023.102339","DOIUrl":"10.1016/j.is.2023.102339","url":null,"abstract":"<div><p>Most contemporary process discovery methods take as inputs only <em>positive</em> examples of process executions, and so they are <em>one-class classification</em> algorithms. However, we have found <em>negative</em> examples to also be available in industry, hence we build on earlier work that treats process discovery as a <em>binary classification</em> problem. This approach opens the door to many well-established methods and metrics from machine learning, in particular to improve the distinction between what should and should not be allowed by the output model. Concretely, we (1) present a verified formalisation of process discovery as a binary classification problem; (2) provide cases with negative examples from industry, including real-life logs; (3) propose the Rejection Miner binary classification procedure, applicable to any process notation that has a suitable syntactic composition operator; (4) implement two concrete binary miners, one outputting Declare patterns, the other Dynamic Condition Response (DCR) graphs; and (5) apply these miners to real world and synthetic logs obtained from our industry partners and the process discovery contest, showing increased output model quality in terms of accuracy and model size.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"121 ","pages":"Article 102339"},"PeriodicalIF":3.7,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437923001758/pdfft?md5=f2bf1fcd001426b54f1d43f5ac2ad3d9&pid=1-s2.0-S0306437923001758-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139024055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}