Yaojun Hao , Haotian Wang , Qingshan Zhao , Liping Feng , Jian Wang
{"title":"Detecting the adversarially-learned injection attacks via knowledge graphs","authors":"Yaojun Hao , Haotian Wang , Qingshan Zhao , Liping Feng , Jian Wang","doi":"10.1016/j.is.2024.102419","DOIUrl":"https://doi.org/10.1016/j.is.2024.102419","url":null,"abstract":"<div><p>Over the past two decades, many studies have devoted a good deal of attention to detect injection attacks in recommender systems. However, most of the studies mainly focus on detecting the heuristically-generated injection attacks, which are heuristically fabricated by hand-engineering. In practice, the adversarially-learned injection attacks have been proposed based on optimization methods and enhanced the ability in the camouflage and threat. Under the adversarially-learned injection attacks, the traditional detection models are likely to be fooled. In this paper, a detection method is proposed for the adversarially-learned injection attacks via knowledge graphs. Firstly, with the advantages of wealth information from knowledge graphs, item-pairs on the extension hops of knowledge graphs are regarded as the implicit preferences for users. Also, the item-pair popularity series and user item-pair matrix are constructed to express the user's preferences. Secondly, the word embedding model and principal component analysis are utilized to extract the user's initial vector representations from the item-pair popularity series and item-pair matrix, respectively. Moreover, the Variational Autoencoders with the improved R-drop regularization are used to reconstruct the embedding vectors and further identify the shilling profiles. Finally, the experiments on three real-world datasets indicate that the proposed detector has superior performance than benchmark methods when detecting the adversarially-learned injection attacks. In addition, the detector is evaluated under the heuristically-generated injection attacks and demonstrates the outstanding performance.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102419"},"PeriodicalIF":3.7,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141325033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaolin Han , Tobias Grubenmann , Chenhao Ma , Xiaodong Li , Wenya Sun , Sze Chun Wong , Xuequn Shang , Reynold Cheng
{"title":"FDM: Effective and efficient incident detection on sparse trajectory data","authors":"Xiaolin Han , Tobias Grubenmann , Chenhao Ma , Xiaodong Li , Wenya Sun , Sze Chun Wong , Xuequn Shang , Reynold Cheng","doi":"10.1016/j.is.2024.102418","DOIUrl":"10.1016/j.is.2024.102418","url":null,"abstract":"<div><p>Incident detection (ID), or the automatic discovery of anomalies from road traffic data (e.g., road sensor and GPS data), enables emergency actions (e.g., rescuing injured people) to be carried out in a timely fashion. Existing ID solutions based on data mining or machine learning often rely on <em>dense</em> traffic data; for instance, sensors installed in highways provide frequent updates of road information. In this paper, we ask the question: can ID be performed on <em>sparse</em> traffic data (e.g., location data obtained from GPS devices equipped on vehicles)? As these data may not be enough to describe the state of the roads involved, they can undermine the effectiveness of existing ID solutions. To tackle this challenge, we borrow an important insight from the transportation area, which uses trajectories (i.e., moving histories of vehicles) to derive <em>incident patterns</em>. We study how to obtain incident patterns from trajectories and devise a new solution (called <u>F</u>ilter-<u>D</u>iscovery-<u>M</u>atch (<strong>FDM</strong>)) to detect anomalies in sparse traffic data. We have also developed a fast algorithm to support FDM. Experiments on a taxi dataset in Hong Kong and a simulated dataset show that FDM is more effective than state-of-the-art ID solutions on sparse traffic data, and is also efficient.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102418"},"PeriodicalIF":3.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141278964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Entity Resolution with a hybrid Active Machine Learning framework: Strategies for optimal learning in sparse datasets","authors":"Mourad Jabrane , Hiba Tabbaa , Aissam Hadri , Imad Hafidi","doi":"10.1016/j.is.2024.102410","DOIUrl":"10.1016/j.is.2024.102410","url":null,"abstract":"<div><p>When solving the problem of identifying similar records in different datasets (known as Entity Resolution or ER), one big challenge is the lack of enough labeled data. Which is crucial for building strong machine learning models, but getting this data can be expensive and time-consuming. Active Machine Learning (ActiveML) is a helpful approach because it cleverly picks the most useful pieces of data to learn from. It uses two main ideas: informativeness and representativeness. Typical ActiveML methods used in ER usually depend too much on just one of these ideas, which can make them less effective, especially when starting with very little data. Our research introduces a new combined method that uses both ideas together. We created two versions of this method, called DPQ and STQ, and tested them on eleven different real-world datasets. The results showed that our new method improves ER by producing better scores, more stable models, and faster learning with less training data compared to existing methods.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102410"},"PeriodicalIF":3.7,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141188334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giovanni Di Gennaro , Claudia Greco , Amedeo Buonanno , Marialucia Cuciniello , Terry Amorese , Maria Santina Ler , Gennaro Cordasco , Francesco A.N. Palmieri , Anna Esposito
{"title":"HUM-CARD: A human crowded annotated real dataset","authors":"Giovanni Di Gennaro , Claudia Greco , Amedeo Buonanno , Marialucia Cuciniello , Terry Amorese , Maria Santina Ler , Gennaro Cordasco , Francesco A.N. Palmieri , Anna Esposito","doi":"10.1016/j.is.2024.102409","DOIUrl":"10.1016/j.is.2024.102409","url":null,"abstract":"<div><p>The growth of data-driven approaches typical of Machine Learning leads to an ever-increasing need for large quantities of labeled data. Unfortunately, these attributions are often made automatically and/or crudely, thus destroying the very concept of “ground truth” they are supposed to represent. To address this problem, we introduce HUM-CARD, a dataset of human trajectories in crowded contexts manually annotated by nine experts in engineering and psychology, totaling approximately <span><math><mrow><mn>5000</mn></mrow></math></span> hours. Our multidisciplinary labeling process has enabled the creation of a well-structured ontology, accounting for both individual and contextual factors influencing human movement dynamics in shared environments. Preliminary and descriptive analyzes are presented, highlighting the potential benefits of this dataset and its methodology in various research challenges.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102409"},"PeriodicalIF":3.7,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S030643792400067X/pdfft?md5=e81bccaabf431209b490556bb4e67c4b&pid=1-s2.0-S030643792400067X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141138482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huiting Ma , Dengao Li , Jian Fu , Guiji Zhao , Jumin Zhao
{"title":"Heart failure prognosis prediction: Let’s start with the MDL-HFP model","authors":"Huiting Ma , Dengao Li , Jian Fu , Guiji Zhao , Jumin Zhao","doi":"10.1016/j.is.2024.102408","DOIUrl":"10.1016/j.is.2024.102408","url":null,"abstract":"<div><p>Heart failure, as a critical symptom or terminal stage of assorted heart diseases, is a world-class public health problem. Establishing a prognostic model can help identify high dangerous patients, save their lives promptly, and reduce medical burden. Although integrating structured indicators and unstructured text for complementary information has been proven effective in disease prediction tasks, there are still certain limitations. Firstly, the processing of single branch modes is easily overlooked, which can affect the final fusion result. Secondly, simple fusion will lose complementary information between modalities, limiting the network’s learning ability. Thirdly, incomplete interpretability can affect the practical application and development of the model. To overcome these challenges, this paper proposes the MDL-HFP multimodal model for predicting patient prognosis using the MIMIC-III public database. Firstly, the ADASYN algorithm is used to handle the imbalance of data categories. Then, the proposed improved Deep&Cross Network is used for automatic feature selection to encode structured sparse features, and implicit graph structure information is introduced to encode unstructured clinical notes based on the HR-BGCN model. Finally, the information of the two modalities is fused through a cross-modal dynamic interaction layer. By comparing multiple advanced multimodal deep learning models, the model’s effectiveness is verified, with an average F1 score of 90.42% and an average accuracy of 90.70%. The model proposed in this paper can accurately classify the readmission status of patients, thereby assisting doctors in making judgments and improving the patient’s prognosis. Further visual analysis demonstrates the usability of the model, providing a comprehensive explanation for clinical decision-making.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102408"},"PeriodicalIF":3.7,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141137614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAMA: A multi-graph-based anomaly detection framework for business processes via graph neural networks","authors":"Wei Guan, Jian Cao, Yang Gu, Shiyou Qian","doi":"10.1016/j.is.2024.102405","DOIUrl":"https://doi.org/10.1016/j.is.2024.102405","url":null,"abstract":"<div><p>Anomalies in business processes are inevitable for various reasons such as system failures and operator errors. Detecting anomalies is important for the management and optimization of business processes. However, prevailing anomaly detection approaches often fail to capture crucial structural information about the underlying process. To address this, we propose a multi-Graph based Anomaly detection fraMework for business processes via grAph neural networks, named GAMA. GAMA makes use of structural process information and attribute information in a more integrated way. In GAMA, multiple graphs are applied to model a trace in which each attribute is modeled as a separate graph. In particular, the graph constructed for the special attribute <em>activity</em> reflects the control flow. Then GAMA employs a multi-graph encoder and a multi-sequence decoder on multiple graphs to detect anomalies in terms of the reconstruction errors. Moreover, three teacher forcing styles are designed to enhance GAMA’s ability to reconstruct normal behaviors and thus improve detection performance. We conduct extensive experiments on both synthetic logs and real-life logs. The experiment results demonstrate that GAMA outperforms state-of-the-art methods for both trace-level and attribute-level anomaly detection.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102405"},"PeriodicalIF":3.7,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141083465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carlos Quijada-Fuentes , M. Andrea Rodríguez , Diego Seco
{"title":"TRGST: An enhanced generalized suffix tree for topological relations between paths","authors":"Carlos Quijada-Fuentes , M. Andrea Rodríguez , Diego Seco","doi":"10.1016/j.is.2024.102406","DOIUrl":"10.1016/j.is.2024.102406","url":null,"abstract":"<div><p>This paper introduces the <em>TRGST</em> data structure, which is designed to handle queries related to topological relations between paths represented as sequences of stops in a network. As an example, these paths could correspond to stops on a public transport network, and a query of interest is to retrieve paths that share at least <span><math><mi>k</mi></math></span> consecutive stops. While topological relations among spatial objects have received extensive attention, the efficient processing of these relations in the context of trajectory paths, considering both time and space efficiency, remains a relatively less explored domain. Taking inspiration from pattern matching implementations, the <em>TRGST</em> data structure is constructed on the foundation of the Generalized Suffix Tree. Its purpose is to provide a compact representation of a set of paths and to efficiently handle topological relation queries by leveraging the pattern search capabilities inherent in this structure. The paper provides a detailed account of the structure and algorithms of <em>TRGST</em>, followed by a performance analysis utilizing both real and synthetic data. The results underscore the remarkable scalability of the <em>TRGST</em> in terms of both query time and space utilization.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102406"},"PeriodicalIF":3.7,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141144791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MBDL: Exploring dynamic dependency among various types of behaviors for recommendation","authors":"Hang Zhang, Mingxin Gan","doi":"10.1016/j.is.2024.102407","DOIUrl":"10.1016/j.is.2024.102407","url":null,"abstract":"<div><p>Users have various behaviors on items, including <em>page view</em>, <em>tag-as-favorite</em>, <em>add-to-cart</em>, and <em>purchase</em> in online shopping platforms. These various types of behaviors reflect users’ different intentions, which also help learn their preferences on items in a recommender system. Although some multi-behavior recommendation methods have been proposed, two significant challenges have not been widely noticed: (i) capturing heterogeneous and dynamic preferences of users simultaneously from different types of behaviors; (ii) modeling the dynamic dependency among various types of behaviors. To overcome the above challenges, we propose a novel multi-behavior dynamic dependency learning method (MBDL) to explore the heterogeneity and dependency among various types of behavior sequences for recommendation. In brief, MBDL first uses a dual-channel interest encoder to learn the long-term interest representations and the evolution of short-term interests from the behavior-aware item sequences. Then, MBDL adopts a contrastive learning method to preserve the consistency of user’s long-term behavioral patterns, and a multi-head attention network to capture the dynamic dependency among short-term interactive behaviors. Finally, MBDL adaptively integrates the influence of long- and short-term interests to predict future user–item interactions. Experiments on two real-world datasets show that the proposed MBDL method outperforms state-of-the-art methods significantly on recommendation accuracy. Further ablation studies demonstrate the effectiveness of our model and the benefits of learning dynamic dependency among types of behaviors.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102407"},"PeriodicalIF":3.7,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141143297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Storage Management with Multi-Version Partitioned BTrees","authors":"Christian Riegger, Ilia Petrov","doi":"10.1016/j.is.2024.102403","DOIUrl":"https://doi.org/10.1016/j.is.2024.102403","url":null,"abstract":"<div><p>Modern persistent Key/Value-Stores operate on updatable datasets — massively exceeding the size of available main memory. Tree-based key/value storage management structures became particularly popular in storage engines. B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees allow constant search performance, however write-heavy workloads yield inefficient write patterns to secondary storage devices and poor performance characteristics. LSM-Trees overcome this issue by horizontal partitioning fractions of data — small enough to fully reside in main memory, but require frequent maintenance to sustain search performance.</p><p>To this end, firstly, we propose Multi-Version Partitioned BTrees (MV-PBT) as sole storage and index management structure in key-sorted storage engines like Key/Value-Stores. Secondly, we compare MV-PBT against LSM-Trees. The logical horizontal partitioning in MV-PBT allows leveraging recent advances in modern B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Tree techniques in a small transparent and memory resident portion of the structure. Structural properties sustain steady read performance, even on historical data, and yield efficient write patterns as well as reduced write-amplification.</p><p>We integrate MV-PBT in the WiredTiger key/value storage engine. MV-PBT offers an up to 2x increased steady throughput in comparison to LSM-Trees and several orders of magnitude in comparison to B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees in a YCSB workload. Moreover, MV-PBT exhibits robust time-travel query performance and outperforms LSM-Trees by 20% and B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees by an order of magnitude.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102403"},"PeriodicalIF":3.7,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000619/pdfft?md5=cd0642883c73bb282d5d3104ee04d813&pid=1-s2.0-S0306437924000619-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141294465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognizing task-level events from user interaction data","authors":"Adrian Rebmann , Han van der Aa","doi":"10.1016/j.is.2024.102404","DOIUrl":"10.1016/j.is.2024.102404","url":null,"abstract":"<div><p>User interaction data comprises events that capture individual actions that a user performs on their computer. Such events provide detailed records about how users carry out their tasks in a process, even when this involves different applications. Although the comprehensiveness of such data provides a promising basis for process mining, user interaction events cannot be used directly for this purpose, because they do not meet two essential requirements. In particular, they neither indicate their relation to a process-level activity nor their relation to a specific process execution. Therefore, user interaction data needs to be transformed so that it meets these requirements before process mining techniques can be applied. This transformation problem comprises identifying tasks and their types and determining the relation between tasks and process executions. While some existing approaches tackle parts of this problem, none address it comprehensively. Therefore, we propose an unsupervised approach for recognizing task-level events from user interaction data that addresses it in full. It segments user interaction data to identify tasks, categorizes these according to their type, and relates tasks to each other via object instances it extracts from the user interaction events. In this manner, our approach creates task-level events that meet the requirements of process mining settings. Our evaluation demonstrates the approach’s efficacy and shows that its combined consideration of control-flow, data, and semantic information allows it to outperform baseline approaches in both online and offline settings.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102404"},"PeriodicalIF":3.7,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000620/pdfft?md5=6b076d025b548fc182dc1f86d4b2885e&pid=1-s2.0-S0306437924000620-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141037429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}