Wei Jia , Ruizhe Ma , Weinan Niu , Li Yan , Zongmin Ma
{"title":"SFTe: Temporal knowledge graphs embedding for future interaction prediction","authors":"Wei Jia , Ruizhe Ma , Weinan Niu , Li Yan , Zongmin Ma","doi":"10.1016/j.is.2024.102423","DOIUrl":"10.1016/j.is.2024.102423","url":null,"abstract":"<div><p>Interaction prediction is a crucial task in the Social Internet of Things (SIoT), serving diverse applications including social network analysis and recommendation systems. However, the dynamic nature of items, users, and their interactions over time poses challenges in effectively capturing and analyzing these changes. Existing interaction prediction models often overlook the temporal aspect and lack the ability to model multi-relational user-item interactions over time. To address these limitations, in this paper, we propose a <strong>S</strong>tructure, <strong>F</strong>acticity, and <strong>T</strong>emporal information preservation <strong>e</strong>mbedding model (SFTe) to predict future interaction. Our model leverages the advantages of Temporal Knowledge Graphs (TKGs) that can capture both the multi-relations and evolution. We begin by modeling user-item interactions over time by constructing a Temporal Interaction Knowledge Graph (TIKG). We then employ Structure Embedding (SE), Facticity Embedding (FE), and Temporal Embedding (TE) to capture topological structure, facticity consistency, and temporal dependence, respectively. In SE, we focus on preserving the first-order relationships to capture the topological structure of TIKG. In the FE component, given the distinct nature of SIoT, we introduce an attention mechanism to capture the effect of entities with the same additional information for generating subgraph embeddings. Lastly, TE utilizes recurrent neural networks to model the temporal dependencies among subgraphs and capture the evolving dynamics of the interactions over time. 
Experimental results on standard future interaction prediction demonstrate the superiority of the SFTe model compared with the state-of-the-art methods. Our model effectively addresses the challenges of time-aware interaction prediction, showcasing the potential of TKGs to enhance prediction performance.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102423"},"PeriodicalIF":3.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dehua Liu , Selasi Kwashie , Yidi Zhang , Guangtong Zhou , Michael Bewong , Xiaoying Wu , Xi Guo , Keqing He , Zaiwen Feng
{"title":"An efficient approach for discovering Graph Entity Dependencies (GEDs)","authors":"Dehua Liu , Selasi Kwashie , Yidi Zhang , Guangtong Zhou , Michael Bewong , Xiaoying Wu , Xi Guo , Keqing He , Zaiwen Feng","doi":"10.1016/j.is.2024.102421","DOIUrl":"https://doi.org/10.1016/j.is.2024.102421","url":null,"abstract":"<div><p>Graph entity dependencies (GEDs) are novel graph constraints for property graphs that unify keys and functional dependencies. They have been found useful in many real-world data quality and data management tasks, including fact checking on social media networks and entity resolution. In this paper, we study the discovery problem of GEDs: finding a minimal cover of valid GEDs in given graph data. We formalise the problem and propose an effective and efficient approach to overcome major bottlenecks in GED discovery. In particular, we leverage existing graph partitioning algorithms to enable fast GED-scope discovery, and employ effective pruning strategies over the prohibitively large space of candidate dependencies. Furthermore, we define an interestingness measure for GEDs based on the minimum description length principle, to score and rank the mined cover set of GEDs. 
Finally, we demonstrate the scalability and effectiveness of our GED discovery approach through extensive experiments on real-world benchmark graph data sets, and show the usefulness of the discovered rules in different downstream data quality management applications.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102421"},"PeriodicalIF":3.0,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000796/pdfft?md5=8af2f9051185a5f57df5320cb4c1b7bd&pid=1-s2.0-S0306437924000796-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141583109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing workload trends for boosting triple stores performance","authors":"Ahmed Al-Ghezi, Lena Wiese","doi":"10.1016/j.is.2024.102420","DOIUrl":"10.1016/j.is.2024.102420","url":null,"abstract":"<div><p>The Resource Description Framework (RDF) is widely used to model web data. The scale and complexity of the modeled data pose performance challenges for RDF triple stores. Workload adaption is one important strategy to deal with those challenges at the storage level. Current workload-adaption approaches lack the necessary generalization of the problem and only optimize part of the storage layer with the workload (mostly the replication). This leaves a large performance gap in other data structures (e.g., indexes and cache) that could heavily benefit from the same workload adaption strategy. Moreover, the workload statistics are built collectively in most of the current approaches. Thus, the analysis process is unaware of whether workload items are old or recent. It therefore fails to reflect the temporal trends that exist naturally in user queries, which causes the analysis process to lag behind rapid workload changes. We present a novel universal adaption approach to the storage management of a distributed RDF store. The system aims to find optimal data assignments to the different indexes, replications, and join cache within the limited storage space. We present a cost model based on the workload that often contains frequent patterns. The workload is dynamically and continuously analyzed to evaluate predefined rules considering the benefits and costs of all options of assigning data to the storage structures. The objective is to reduce query execution time by letting different data containers compete for the limited storage space. By modeling the workload statistics as time series, we can apply well-known smoothing techniques allowing the importance of the workload to decay over time. 
This allows the universal adaption approach to stay in tune with potential changes in workload trends.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102420"},"PeriodicalIF":3.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000784/pdfft?md5=4a9d8f0acac2d10b05565ee129773c94&pid=1-s2.0-S0306437924000784-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141393476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaolin Han , Tobias Grubenmann , Chenhao Ma , Xiaodong Li , Wenya Sun , Sze Chun Wong , Xuequn Shang , Reynold Cheng
{"title":"FDM: Effective and efficient incident detection on sparse trajectory data","authors":"Xiaolin Han , Tobias Grubenmann , Chenhao Ma , Xiaodong Li , Wenya Sun , Sze Chun Wong , Xuequn Shang , Reynold Cheng","doi":"10.1016/j.is.2024.102418","DOIUrl":"10.1016/j.is.2024.102418","url":null,"abstract":"<div><p>Incident detection (ID), or the automatic discovery of anomalies from road traffic data (e.g., road sensor and GPS data), enables emergency actions (e.g., rescuing injured people) to be carried out in a timely fashion. Existing ID solutions based on data mining or machine learning often rely on <em>dense</em> traffic data; for instance, sensors installed in highways provide frequent updates of road information. In this paper, we ask the question: can ID be performed on <em>sparse</em> traffic data (e.g., location data obtained from GPS devices equipped on vehicles)? As these data may not be enough to describe the state of the roads involved, they can undermine the effectiveness of existing ID solutions. To tackle this challenge, we borrow an important insight from the transportation area, which uses trajectories (i.e., moving histories of vehicles) to derive <em>incident patterns</em>. We study how to obtain incident patterns from trajectories and devise a new solution (called <u>F</u>ilter-<u>D</u>iscovery-<u>M</u>atch (<strong>FDM</strong>)) to detect anomalies in sparse traffic data. We have also developed a fast algorithm to support FDM. 
Experiments on a taxi dataset in Hong Kong and a simulated dataset show that FDM is more effective than state-of-the-art ID solutions on sparse traffic data, and is also efficient.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102418"},"PeriodicalIF":3.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141278964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Entity Resolution with a hybrid Active Machine Learning framework: Strategies for optimal learning in sparse datasets","authors":"Mourad Jabrane , Hiba Tabbaa , Aissam Hadri , Imad Hafidi","doi":"10.1016/j.is.2024.102410","DOIUrl":"10.1016/j.is.2024.102410","url":null,"abstract":"<div><p>When solving the problem of identifying similar records in different datasets (known as Entity Resolution or ER), one big challenge is the lack of enough labeled data. Such data is crucial for building strong machine learning models, but obtaining it can be expensive and time-consuming. Active Machine Learning (ActiveML) is a helpful approach because it cleverly picks the most useful pieces of data to learn from. It uses two main ideas: informativeness and representativeness. Typical ActiveML methods used in ER usually depend too much on just one of these ideas, which can make them less effective, especially when starting with very little data. Our research introduces a new combined method that uses both ideas together. We created two versions of this method, called DPQ and STQ, and tested them on eleven different real-world datasets. The results showed that our new method improves ER by producing better scores, more stable models, and faster learning with less training data compared to existing methods.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102410"},"PeriodicalIF":3.7,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141188334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giovanni Di Gennaro , Claudia Greco , Amedeo Buonanno , Marialucia Cuciniello , Terry Amorese , Maria Santina Ler , Gennaro Cordasco , Francesco A.N. Palmieri , Anna Esposito
{"title":"HUM-CARD: A human crowded annotated real dataset","authors":"Giovanni Di Gennaro , Claudia Greco , Amedeo Buonanno , Marialucia Cuciniello , Terry Amorese , Maria Santina Ler , Gennaro Cordasco , Francesco A.N. Palmieri , Anna Esposito","doi":"10.1016/j.is.2024.102409","DOIUrl":"10.1016/j.is.2024.102409","url":null,"abstract":"<div><p>The growth of data-driven approaches typical of Machine Learning leads to an ever-increasing need for large quantities of labeled data. Unfortunately, these attributions are often made automatically and/or crudely, thus destroying the very concept of “ground truth” they are supposed to represent. To address this problem, we introduce HUM-CARD, a dataset of human trajectories in crowded contexts manually annotated by nine experts in engineering and psychology, totaling approximately <span><math><mrow><mn>5000</mn></mrow></math></span> hours. Our multidisciplinary labeling process has enabled the creation of a well-structured ontology, accounting for both individual and contextual factors influencing human movement dynamics in shared environments. 
Preliminary and descriptive analyses are presented, highlighting the potential benefits of this dataset and its methodology in various research challenges.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102409"},"PeriodicalIF":3.7,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S030643792400067X/pdfft?md5=e81bccaabf431209b490556bb4e67c4b&pid=1-s2.0-S030643792400067X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141138482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAMA: A multi-graph-based anomaly detection framework for business processes via graph neural networks","authors":"Wei Guan, Jian Cao, Yang Gu, Shiyou Qian","doi":"10.1016/j.is.2024.102405","DOIUrl":"https://doi.org/10.1016/j.is.2024.102405","url":null,"abstract":"<div><p>Anomalies in business processes are inevitable for various reasons such as system failures and operator errors. Detecting anomalies is important for the management and optimization of business processes. However, prevailing anomaly detection approaches often fail to capture crucial structural information about the underlying process. To address this, we propose a multi-Graph based Anomaly detection fraMework for business processes via grAph neural networks, named GAMA. GAMA makes use of structural process information and attribute information in a more integrated way. In GAMA, multiple graphs are applied to model a trace in which each attribute is modeled as a separate graph. In particular, the graph constructed for the special attribute <em>activity</em> reflects the control flow. Then GAMA employs a multi-graph encoder and a multi-sequence decoder on multiple graphs to detect anomalies in terms of the reconstruction errors. Moreover, three teacher forcing styles are designed to enhance GAMA’s ability to reconstruct normal behaviors and thus improve detection performance. We conduct extensive experiments on both synthetic logs and real-life logs. 
The experimental results demonstrate that GAMA outperforms state-of-the-art methods for both trace-level and attribute-level anomaly detection.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102405"},"PeriodicalIF":3.7,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141083465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carlos Quijada-Fuentes , M. Andrea Rodríguez , Diego Seco
{"title":"TRGST: An enhanced generalized suffix tree for topological relations between paths","authors":"Carlos Quijada-Fuentes , M. Andrea Rodríguez , Diego Seco","doi":"10.1016/j.is.2024.102406","DOIUrl":"10.1016/j.is.2024.102406","url":null,"abstract":"<div><p>This paper introduces the <em>TRGST</em> data structure, which is designed to handle queries related to topological relations between paths represented as sequences of stops in a network. As an example, these paths could correspond to stops on a public transport network, and a query of interest is to retrieve paths that share at least <span><math><mi>k</mi></math></span> consecutive stops. While topological relations among spatial objects have received extensive attention, the efficient processing of these relations in the context of trajectory paths, considering both time and space efficiency, remains a relatively less explored domain. Taking inspiration from pattern matching implementations, the <em>TRGST</em> data structure is constructed on the foundation of the Generalized Suffix Tree. Its purpose is to provide a compact representation of a set of paths and to efficiently handle topological relation queries by leveraging the pattern search capabilities inherent in this structure. The paper provides a detailed account of the structure and algorithms of <em>TRGST</em>, followed by a performance analysis utilizing both real and synthetic data. 
The results underscore the remarkable scalability of the <em>TRGST</em> in terms of both query time and space utilization.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102406"},"PeriodicalIF":3.7,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141144791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MBDL: Exploring dynamic dependency among various types of behaviors for recommendation","authors":"Hang Zhang, Mingxin Gan","doi":"10.1016/j.is.2024.102407","DOIUrl":"10.1016/j.is.2024.102407","url":null,"abstract":"<div><p>Users have various behaviors on items, including <em>page view</em>, <em>tag-as-favorite</em>, <em>add-to-cart</em>, and <em>purchase</em> in online shopping platforms. These various types of behaviors reflect users’ different intentions, which also help learn their preferences on items in a recommender system. Although some multi-behavior recommendation methods have been proposed, two significant challenges have not been widely noticed: (i) capturing heterogeneous and dynamic preferences of users simultaneously from different types of behaviors; (ii) modeling the dynamic dependency among various types of behaviors. To overcome the above challenges, we propose a novel multi-behavior dynamic dependency learning method (MBDL) to explore the heterogeneity and dependency among various types of behavior sequences for recommendation. In brief, MBDL first uses a dual-channel interest encoder to learn the long-term interest representations and the evolution of short-term interests from the behavior-aware item sequences. Then, MBDL adopts a contrastive learning method to preserve the consistency of user’s long-term behavioral patterns, and a multi-head attention network to capture the dynamic dependency among short-term interactive behaviors. Finally, MBDL adaptively integrates the influence of long- and short-term interests to predict future user–item interactions. Experiments on two real-world datasets show that the proposed MBDL method outperforms state-of-the-art methods significantly on recommendation accuracy. 
Further ablation studies demonstrate the effectiveness of our model and the benefits of learning dynamic dependency among types of behaviors.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"124 ","pages":"Article 102407"},"PeriodicalIF":3.7,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141143297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Storage Management with Multi-Version Partitioned BTrees","authors":"Christian Riegger, Ilia Petrov","doi":"10.1016/j.is.2024.102403","DOIUrl":"https://doi.org/10.1016/j.is.2024.102403","url":null,"abstract":"<div><p>Modern persistent Key/Value-Stores operate on updatable datasets that massively exceed the size of available main memory. Tree-based key/value storage management structures have become particularly popular in storage engines. B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees allow constant search performance; however, write-heavy workloads yield inefficient write patterns to secondary storage devices and poor performance characteristics. LSM-Trees overcome this issue by horizontally partitioning the data into fractions small enough to fully reside in main memory, but require frequent maintenance to sustain search performance.</p><p>To this end, firstly, we propose Multi-Version Partitioned BTrees (MV-PBT) as the sole storage and index management structure in key-sorted storage engines like Key/Value-Stores. Secondly, we compare MV-PBT against LSM-Trees. The logical horizontal partitioning in MV-PBT allows leveraging recent advances in modern B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Tree techniques in a small, transparent, and memory-resident portion of the structure. Structural properties sustain steady read performance, even on historical data, and yield efficient write patterns as well as reduced write-amplification.</p><p>We integrate MV-PBT in the WiredTiger key/value storage engine. MV-PBT offers up to 2x higher steady throughput than LSM-Trees and several orders of magnitude higher throughput than B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees in a YCSB workload. 
Moreover, MV-PBT exhibits robust time-travel query performance and outperforms LSM-Trees by 20% and B<span><math><msup><mrow></mrow><mrow><mo>+</mo></mrow></msup></math></span>-Trees by an order of magnitude.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102403"},"PeriodicalIF":3.7,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000619/pdfft?md5=cd0642883c73bb282d5d3104ee04d813&pid=1-s2.0-S0306437924000619-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141294465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}