{"title":"GAMA: A multi-graph-based anomaly detection framework for business processes via graph neural networks","authors":"Wei Guan, Jian Cao, Yang Gu, Shiyou Qian","doi":"10.1016/j.is.2024.102405","DOIUrl":"https://doi.org/10.1016/j.is.2024.102405","url":null,"abstract":"<div><p>Anomalies in business processes are inevitable for various reasons such as system failures and operator errors. Detecting anomalies is important for the management and optimization of business processes. However, prevailing anomaly detection approaches often fail to capture crucial structural information about the underlying process. To address this, we propose a multi-Graph based Anomaly detection fraMework for business processes via grAph neural networks, named GAMA. GAMA makes use of structural process information and attribute information in a more integrated way. In GAMA, multiple graphs are applied to model a trace in which each attribute is modeled as a separate graph. In particular, the graph constructed for the special attribute <em>activity</em> reflects the control flow. Then GAMA employs a multi-graph encoder and a multi-sequence decoder on multiple graphs to detect anomalies in terms of the reconstruction errors. Moreover, three teacher forcing styles are designed to enhance GAMA’s ability to reconstruct normal behaviors and thus improve detection performance. We conduct extensive experiments on both synthetic logs and real-life logs. 
The experiment results demonstrate that GAMA outperforms state-of-the-art methods for both trace-level and attribute-level anomaly detection.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102405"},"PeriodicalIF":3.7,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141083465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRGST: An enhanced generalized suffix tree for topological relations between paths","authors":"Carlos Quijada-Fuentes , M. Andrea Rodríguez , Diego Seco","doi":"10.1016/j.is.2024.102406","DOIUrl":"10.1016/j.is.2024.102406","url":null,"abstract":"<div><p>This paper introduces the <em>TRGST</em> data structure, which is designed to handle queries related to topological relations between paths represented as sequences of stops in a network. As an example, these paths could correspond to stops on a public transport network, and a query of interest is to retrieve paths that share at least <span><math><mi>k</mi></math></span> consecutive stops. While topological relations among spatial objects have received extensive attention, the efficient processing of these relations in the context of trajectory paths, considering both time and space efficiency, remains a relatively less explored domain. Taking inspiration from pattern matching implementations, the <em>TRGST</em> data structure is constructed on the foundation of the Generalized Suffix Tree. Its purpose is to provide a compact representation of a set of paths and to efficiently handle topological relation queries by leveraging the pattern search capabilities inherent in this structure. The paper provides a detailed account of the structure and algorithms of <em>TRGST</em>, followed by a performance analysis utilizing both real and synthetic data. 
The results underscore the remarkable scalability of the <em>TRGST</em> in terms of both query time and space utilization.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102406"},"PeriodicalIF":3.7,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141144791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MBDL: Exploring dynamic dependency among various types of behaviors for recommendation","authors":"Hang Zhang, Mingxin Gan","doi":"10.1016/j.is.2024.102407","DOIUrl":"10.1016/j.is.2024.102407","url":null,"abstract":"<div><p>Users have various behaviors on items, including <em>page view</em>, <em>tag-as-favorite</em>, <em>add-to-cart</em>, and <em>purchase</em> in online shopping platforms. These various types of behaviors reflect users’ different intentions, which also help learn their preferences on items in a recommender system. Although some multi-behavior recommendation methods have been proposed, two significant challenges have not been widely noticed: (i) capturing heterogeneous and dynamic preferences of users simultaneously from different types of behaviors; (ii) modeling the dynamic dependency among various types of behaviors. To overcome the above challenges, we propose a novel multi-behavior dynamic dependency learning method (MBDL) to explore the heterogeneity and dependency among various types of behavior sequences for recommendation. In brief, MBDL first uses a dual-channel interest encoder to learn the long-term interest representations and the evolution of short-term interests from the behavior-aware item sequences. Then, MBDL adopts a contrastive learning method to preserve the consistency of user’s long-term behavioral patterns, and a multi-head attention network to capture the dynamic dependency among short-term interactive behaviors. Finally, MBDL adaptively integrates the influence of long- and short-term interests to predict future user–item interactions. Experiments on two real-world datasets show that the proposed MBDL method outperforms state-of-the-art methods significantly on recommendation accuracy. 
Further ablation studies demonstrate the effectiveness of our model and the benefits of learning dynamic dependency among types of behaviors.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102407"},"PeriodicalIF":3.7,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141143297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Storage Management with Multi-Version Partitioned BTrees","authors":"Christian Riegger, Ilia Petrov","doi":"10.1016/j.is.2024.102403","DOIUrl":"https://doi.org/10.1016/j.is.2024.102403","url":null,"abstract":"<div><p>Modern persistent Key/Value-Stores operate on updatable datasets — massively exceeding the size of available main memory. Tree-based key/value storage management structures became particularly popular in storage engines. B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees allow constant search performance, however write-heavy workloads yield inefficient write patterns to secondary storage devices and poor performance characteristics. LSM-Trees overcome this issue by horizontal partitioning fractions of data — small enough to fully reside in main memory, but require frequent maintenance to sustain search performance.</p><p>To this end, firstly, we propose Multi-Version Partitioned BTrees (MV-PBT) as sole storage and index management structure in key-sorted storage engines like Key/Value-Stores. Secondly, we compare MV-PBT against LSM-Trees. The logical horizontal partitioning in MV-PBT allows leveraging recent advances in modern B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Tree techniques in a small transparent and memory resident portion of the structure. Structural properties sustain steady read performance, even on historical data, and yield efficient write patterns as well as reduced write-amplification.</p><p>We integrate MV-PBT in the WiredTiger key/value storage engine. MV-PBT offers an up to 2x increased steady throughput in comparison to LSM-Trees and several orders of magnitude in comparison to B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees in a YCSB workload. 
Moreover, MV-PBT exhibits robust time-travel query performance and outperforms LSM-Trees by 20% and B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees by an order of magnitude.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102403"},"PeriodicalIF":3.7,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000619/pdfft?md5=cd0642883c73bb282d5d3104ee04d813&pid=1-s2.0-S0306437924000619-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141294465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognizing task-level events from user interaction data","authors":"Adrian Rebmann , Han van der Aa","doi":"10.1016/j.is.2024.102404","DOIUrl":"10.1016/j.is.2024.102404","url":null,"abstract":"<div><p>User interaction data comprises events that capture individual actions that a user performs on their computer. Such events provide detailed records about how users carry out their tasks in a process, even when this involves different applications. Although the comprehensiveness of such data provides a promising basis for process mining, user interaction events cannot be used directly for this purpose, because they do not meet two essential requirements. In particular, they neither indicate their relation to a process-level activity nor their relation to a specific process execution. Therefore, user interaction data needs to be transformed so that it meets these requirements before process mining techniques can be applied. This transformation problem comprises identifying tasks and their types and determining the relation between tasks and process executions. While some existing approaches tackle parts of this problem, none address it comprehensively. Therefore, we propose an unsupervised approach for recognizing task-level events from user interaction data that addresses it in full. It segments user interaction data to identify tasks, categorizes these according to their type, and relates tasks to each other via object instances it extracts from the user interaction events. In this manner, our approach creates task-level events that meet the requirements of process mining settings. 
Our evaluation demonstrates the approach’s efficacy and shows that its combined consideration of control-flow, data, and semantic information allows it to outperform baseline approaches in both online and offline settings.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102404"},"PeriodicalIF":3.7,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000620/pdfft?md5=6b076d025b548fc182dc1f86d4b2885e&pid=1-s2.0-S0306437924000620-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141037429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning complex predicates for cardinality estimation using recursive neural networks","authors":"Zhi Wang , Hancong Duan , Yamin Cheng , Geyong Min","doi":"10.1016/j.is.2024.102402","DOIUrl":"10.1016/j.is.2024.102402","url":null,"abstract":"<div><p>Cardinality estimation is one of the most vital components in the query optimizer, which has been extensively studied recently. On one hand, traditional cardinality estimators, such as histograms and sampling methods, struggle to capture the correlations between multiple tables. On the other hand, current learning-based methods still suffer from the feature extraction of complex predicates and join relations, which will lead to inaccurate cost estimation, eventually a sub-optimal execution plan. To address these challenges, we present a novel end-to-end architecture leveraging deep learning to provide high-quality cardinality estimation. We exploit an effective feature extraction technique, which can fully make use of the structure of tables, join conditions and predicates. Besides, we use sampling-based technique to construct sample bitmaps for the tables and join conditions respectively. We also utilize the characteristics of predicate tree combined with recursive neural network to extract deep-level features of complex predicates. Finally, we embed these feature vectors into the model, which consists of three components: a recursive neural network, a graph convolutional neural network (GCN) and a multi-set convolutional neural network, to obtain the estimated cardinality. 
Extensive results conducted on real-world workloads demonstrate that our approach can achieve significant improvement in accuracy and be extended to queries with complex semantics.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102402"},"PeriodicalIF":3.7,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141048672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BF-BigGraph: An efficient subgraph isomorphism approach using machine learning for big graph databases","authors":"Adnan Yazici , Ezgi Taşkomaz","doi":"10.1016/j.is.2024.102401","DOIUrl":"10.1016/j.is.2024.102401","url":null,"abstract":"<div><p>Graph databases are flexible NoSQL databases used to efficiently store and query complex and big data. One of the most difficult problems in graph databases is the problem of subgraph isomorphism, which involves finding a matching pattern in a given graph. Subgraph isomorphism algorithms generally encounter problems in the efficient processing of complex queries based on a lack of pruning methods and the use of a matching order. In this study, we present a new subgraph isomorphism approach based on the best-first search design strategy and name it BF-BigGraph. Our approach includes a machine learning technique to efficiently find the best matching order for various complex queries. The parameters we used in our approach as heuristics to improve the performance of complex queries on graph-based NoSQL databases are database volatility, database size, type of query, and the size of the query. We utilized the Random Forest machine learning method to narrow candidate nodes to a higher level of search and effectively reduce the search space for efficient querying and retrieval. We compared BF-BigGraph with state-of-the-art approaches, namely BB-Graph, Neo4j’s Cypher, DualIso, GraphQL, TurboIso, and VF3 using publicly available databases including undirected graphs; WorldCup, Pokec, Youtube, and a big graph database of a real demographic application (a population database) with approximately 70 million nodes of a big directed graph. 
The performance results of our approach for different types of complex queries on all these databases are significantly better in terms of computation time and required memory than other competing approaches in the literature.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102401"},"PeriodicalIF":3.7,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141050700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Responsible composition and optimization of integration processes under correctness preserving guarantees","authors":"Daniel Ritter , Fredrik Nordvall Forsberg , Stefanie Rinderle-Ma","doi":"10.1016/j.is.2024.102400","DOIUrl":"https://doi.org/10.1016/j.is.2024.102400","url":null,"abstract":"<div><p>Enterprise Application Integration deals with the problem of connecting heterogeneous applications, and is the centerpiece of current on-premise, cloud and device integration scenarios. For integration scenarios, structurally correct composition of patterns into processes and improvements of integration processes are crucial. In order to achieve this, we formalize compositions of integration patterns based on their characteristics, and describe optimization strategies that help to reduce the model complexity, and improve the process execution efficiency using design time techniques. Using the formalism of timed DB-nets – a refinement of Petri nets – we model integration logic features such as control- and data flow, transactional data storage, compensation and exception handling, and time aspects that are present in reoccurring solutions as separate integration patterns. We then propose a realization of optimization strategies using graph rewriting, and prove that the optimizations we consider preserve both structural and functional correctness. 
We evaluate the improvements on a real-world catalog of pattern compositions, containing over 900 integration processes, and illustrate the correctness properties in case studies based on two of these processes.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102400"},"PeriodicalIF":3.7,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140824326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special Issue with Best Papers from ICPM 2022","authors":"Andrea Burattin, Artem Polyvyanyy, Barbara Weber","doi":"10.1016/j.is.2024.102389","DOIUrl":"10.1016/j.is.2024.102389","url":null,"abstract":"","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102389"},"PeriodicalIF":3.7,"publicationDate":"2024-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140778395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TransLSTD: Augmenting hierarchical disease risk prediction model with time and context awareness via disease clustering","authors":"Tao You , Qiaodong Dang , Qing Li , Peng Zhang , Guanzhong Wu , Wei Huang","doi":"10.1016/j.is.2024.102390","DOIUrl":"https://doi.org/10.1016/j.is.2024.102390","url":null,"abstract":"<div><p>The use of electronic health records has become widespread, providing a valuable source of information for predicting disease risk. While deep neural network models have been proposed and shown to be effective in this task, supplemented with medical domain knowledge for interpretability, several limitations still exist. Firstly, there is often a lack of differentiation between chronic and acute diseases leading to biased modeling of diseases. Secondly, the extraction of patient single-layer temporal patterns is limited, which hinders comprehensive representation and predictive power. Thirdly, weak interpretability based on deep neural networks prevents the extraction of valuable medical knowledge, limiting practical applications. To overcome these challenges, we propose TransLSTD, a hierarchical model that incorporates time awareness and context awareness while distinguishing between long-term and short-term diseases. TransLSTD uses clustering algorithms to classify disease types based on the occurrence feature matrix of diseases from EHR dataset and updates disease representation at the code level while creating patient visit embeddings. The model utilizes query vectors to incorporate visit context information and combines time data to capture the patient’s overall health status. Finally, the prediction module generates outcomes and provides effective interpretations. We demonstrate the effectiveness of TransLSTD using two real-world datasets, outperforming state-of-the-art models in terms of both AUC and F1 values. 
The data and code are released at <span>https://github.com/DangQD/TransLSTD-master</span>.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102390"},"PeriodicalIF":3.7,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140644348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}