Information Systems: Latest Articles

Process-related user interaction logs: State of the art, reference model, and object-centric implementation
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-04-13 DOI: 10.1016/j.is.2024.102386
Luka Abb, Jana-Rebecca Rehse
Abstract: User interaction (UI) logs are high-resolution event logs that record low-level activities performed by a user during the execution of a task in an information system. Each event in such a log represents an interaction between the user and the interface, such as clicking a button, ticking a checkbox, or typing into a text field. UI logs are used in many different application contexts for purposes such as usability analysis, task mining, or robotic process automation (RPA). However, UI logs suffer from a lack of standardization. Each research study and processing tool relies on a different conceptualization and implementation of the elements and attributes of user interactions. This hinders or even prevents the integration of UI logs from different sources or the combination of UI data collection tools with downstream analytics or automation solutions. In this paper, our objective is to address this issue and facilitate the exchange and analysis of UI logs in research and practice. Therefore, we first review process-related UI logs in scientific publications and industry tools to determine commonalities and differences between them. Based on our findings, we propose a universally applicable reference data model for process-related UI logs, which includes all core attributes but remains flexible regarding the scope, level of abstraction, and case notion. Finally, we provide exemplary implementations of the reference model in XES and OCED.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924000449/pdfft?md5=99fcafdb33deb5f863a548bbb4740fc9&pid=1-s2.0-S0306437924000449-main.pdf
Citations: 0
Read-safe snapshots: An abort/wait-free serializable read method for read-only transactions on mixed OLTP/OLAP workloads
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-04-09 DOI: 10.1016/j.is.2024.102385
Takamitsu Shioi, Takashi Kambayashi, Suguru Arakawa, Ryoji Kurosawa, Satoshi Hikida, Haruo Yokota
Abstract: This paper proposes Read-Safe Snapshots (RSS), a concurrency control method that ensures reading the latest serializable version on multiversion concurrency control (MVCC) for read-only transactions without creating any serializability anomaly, thereby enhancing the transaction processing throughput under mixed workloads of online transactional processing (OLTP) and online analytical processing (OLAP). Ensuring serializability for data consistency between OLTP and OLAP is vital to prevent OLAP from obtaining nonserializable results. Existing serializability methods achieve this consistency by forcing OLTP or OLAP transactions to abort or wait, but this can degrade throughput when applied to the large read sets of read-only OLAP transactions under the mixed workloads of recent real-time analysis applications. To deal with this problem, we present an RSS construction algorithm that does not affect the conventional OLTP performance and simultaneously avoids producing additional aborts and waits. Moreover, the RSS construction method can be easily applied to the read-only replica of a multinode system as well as a single-node system because no validation for serializability is required. Our experimental findings showed that RSS could prevent read-only OLAP transactions from creating anomaly cycles under a multinode environment of master-copy replication, achieving serializability with a low overhead of about 15% compared to baseline OLTP/OLAP throughputs under snapshot isolation (SI). The OLTP throughput under our proposed method in a mixed OLTP/OLAP workload was about 45% better than SafeSnapshots, a serializable snapshot isolation (SSI) equipped with a read-only optimization method, and did not degrade the OLAP throughput.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924000437/pdfft?md5=44919a1e7ab150e46eaabe4c385782e7&pid=1-s2.0-S0306437924000437-main.pdf
Citations: 0
Blockchain technology for requirement traceability in systems engineering
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-04-05 DOI: 10.1016/j.is.2024.102384
Mohan S.R. Elapolu, Rahul Rai, David J. Gorsich, Denise Rizzo, Stephen Rapp, Matthew P. Castanier
Abstract: Requirement engineering (RE), a systematic process of eliciting, defining, analyzing, and managing requirements, is a vital phase in systems engineering. In RE, requirement traceability establishes the relationship between the artifacts and supports requirement validation, change management, and impact analysis. Establishing requirement traceability is challenging, especially in the early stages of a complex system design, as requirements constantly evolve and change. Moreover, the involvement of distributed stakeholders in system development introduces collaboration and trust issues. This paper outlines a novel blockchain-based requirement traceability framework that includes a data acquisition template and graph-based visualization. The template enables dual-level traceability (artifact and object) in the RE processes. The traceability information acquired through the templates is stored in the blockchain, where traces are embedded in blocks’ metadata and data. Furthermore, the blockchain is represented as a Neo4J property graph where traces can be retrieved using Cypher queries, thus enabling a mechanism to query and examine the history of requirements. The framework’s efficacy is showcased by documenting the RE process of an autonomous automotive system. Our results indicated that the framework can record the history of artifacts with constantly changing requirements and can yield secure decentralized ledgers of requirement artifacts. The proposed distributed traceability framework has shown promise to enhance stakeholder collaboration and trust. However, additional user studies should be conducted to bolster our results.
Citations: 0
Enjoy the silence: Analysis of stochastic Petri nets with silent transitions
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-04-04 DOI: 10.1016/j.is.2024.102383
Sander J.J. Leemans, Fabrizio Maria Maggi, Marco Montali
Abstract: Capturing stochastic behaviour in business and work processes is essential to quantitatively understand how nondeterminism is resolved when taking decisions within the process. This is of special interest in process mining, where event data tracking the actual execution of the process are related to process models, and can then provide insights on frequencies and probabilities. Variants of stochastic Petri nets provide a natural formal basis to represent stochastic behaviour and support different data-driven and model-driven analysis tasks in this spectrum. However, when capturing business processes, such nets inherently need a labelling that maps between transitions and activities. In many state-of-the-art process mining techniques, this labelling is not 1-on-1, leading to unlabelled transitions and activities represented by multiple transitions. At the same time, they have to be analysed in a finite-trace semantics, matching the fact that each process execution consists of finitely many steps. These two aspects impede the direct application of existing techniques for stochastic Petri nets, calling for a novel characterisation that incorporates labels and silent transitions in a finite-trace semantics. In this article, we provide such a characterisation starting from generalised stochastic Petri nets and obtaining the framework of labelled stochastic processes (LSPs). On top of this framework, we introduce different key analysis tasks on the traces of LSPs and their probabilities. We show that all such analysis tasks can be solved analytically, in particular reducing them to a single method that combines automata-based techniques to single out the behaviour of interest within an LSP, with techniques based on absorbing Markov chains to reason on their probabilities. Finally, we demonstrate the significance of our approach in the context of stochastic conformance checking, illustrating practical feasibility through a proof-of-concept implementation and its application to different datasets.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924000413/pdfft?md5=2011a29e04496e91e304834ecac1b098&pid=1-s2.0-S0306437924000413-main.pdf
Citations: 0
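The abstract above mentions absorbing Markov chains as the tool for reasoning about trace probabilities. As a rough illustration of that textbook computation only (not the authors' method or implementation), the sketch below solves (I - Q) B = R for the absorption probabilities B. The function name `absorption_probabilities`, the two-state toy chain, and its probabilities are all assumptions made up for this example.

```python
from fractions import Fraction


def absorption_probabilities(Q, R):
    """Return B with B[i][k] = probability of ending in absorbing state k
    when starting from transient state i, by solving (I - Q) B = R.

    Q[i][j]: one-step probability from transient state i to transient state j.
    R[i][k]: one-step probability from transient state i to absorbing state k.
    Assumes the chain is absorbing, so (I - Q) is invertible.
    """
    n, m = len(Q), len(R[0])
    # Augmented matrix [I - Q | R], kept exact with Fractions.
    A = [
        [Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
        + [Fraction(R[i][k]) for k in range(m)]
        for i in range(n)
    ]
    # Gauss-Jordan elimination on the left block.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        inv = 1 / A[col][col]
        A[col] = [x * inv for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


# Hypothetical toy chain: transient states s0, s1; absorbing states "done", "dead".
half, third = Fraction(1, 2), Fraction(1, 3)
Q = [[0, half], [third, 0]]        # s0 -> s1 with 1/2, s1 -> s0 with 1/3
R = [[half, 0], [third, third]]    # s0 -> done 1/2; s1 -> done 1/3, dead 1/3
B = absorption_probabilities(Q, R)
for state, row in zip(("s0", "s1"), B):
    print(state, [str(p) for p in row])
# s0 ['4/5', '1/5']
# s1 ['3/5', '2/5']
```

Exact fractions are used here so that each row of B visibly sums to 1; a floating-point linear solver would serve equally well for larger chains.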
A chance for models to show their quality: Stochastic process model-log dimensions
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-04-02 DOI: 10.1016/j.is.2024.102382
Adam T. Burke, Sander J.J. Leemans, Moe T. Wynn, Wil M.P. van der Aalst, Arthur H.M. ter Hofstede
Abstract: Process models describe the desired or observed behaviour of organisations. In stochastic process mining, computational analysis of trace data yields process models which describe process paths and their probability of execution. To understand the quality of these models, and to compare them, quantitative quality measures are used.
This research investigates model comparison empirically, using stochastic process models built from real-life logs. The experimental design collects a large number of models generated randomly and using process discovery techniques. Twenty-five different metrics are taken on these models, using both existing process model metrics and new, exploratory ones. The results are analysed quantitatively, making particular use of principal component analysis.
Based on this analysis, we suggest three stochastic process model dimensions: adhesion, relevance and simplicity. We also suggest possible metrics for these dimensions, and demonstrate their use on example models.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924000401/pdfft?md5=6831ca8dc2e3712e67135ed5946d6e27&pid=1-s2.0-S0306437924000401-main.pdf
Citations: 0
The rise of nonnegative matrix factorization: Algorithms and applications
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-03-21 DOI: 10.1016/j.is.2024.102379
Yi-Ting Guo, Qin-Qin Li, Chun-Sheng Liang
Abstract: Although nonnegative matrix factorization (NMF) is widely used, some matrix factorization methods produce misleading results and waste computing resources due to a lack of timely optimization and case-by-case consideration. Therefore, an up-to-date and comprehensive review of its algorithms and applications is needed to promote improvement and applications for NMF. Here, we start by introducing the background and gathering the principles and formulae of NMF algorithms. There have been dozens of new algorithms since its birth in the 1990s. Generally, several or even more algorithms are adopted in a single software package written in R, Python, C/C++, etc. Besides, the applications of NMF are analyzed. NMF is not only most widely used in modern subjects or techniques such as computer science, telecommunications, imaging science, and remote sensing but also increasingly used in traditional subjects such as physics, chemistry, biology, medicine, and psychology, being accepted by around 130 fields (disciplines) in about 20 years. Finally, the features and performance of different categories of NMF are summarized and evaluated. The summarized advantages and disadvantages and proposed suggestions for improvements are expected to enlighten future efforts to polish the mathematical principles and procedures of NMF to realize higher accuracy and productivity in practical use.
Citations: 0
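For readers unfamiliar with NMF, the sketch below shows one classic algorithm, the multiplicative-update rule for the Frobenius objective commonly attributed to Lee and Seung. The abstract above does not say which algorithms the review covers, so this is only an illustrative assumption; the helper names, the toy matrix `V`, and the default parameters are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate nonnegative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so the multiplicative updates stay positive.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Hypothetical toy input: a small nonnegative matrix of rank 2.
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]
W0, H0 = nmf(V, rank=2, iterations=0)   # random initial factors only
W, H = nmf(V, rank=2)                   # same start, after 200 updates
print("initial error:", frobenius_error(V, W0, H0))
print("final error:  ", frobenius_error(V, W, H))
```

The small `eps` in the denominators is a common guard against division by zero; in practice a library implementation (for example in R or Python, as the abstract notes) would be the usual choice over hand-written loops.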
Cube query interestingness: Novelty, relevance, peculiarity and surprise
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-03-21 DOI: 10.1016/j.is.2024.102381
Dimos Gkitsakis, Spyridon Kaloudis, Eirini Mouselli, Veronika Peralta, Patrick Marcel, Panos Vassiliadis
Abstract: In this paper, we discuss methods to assess the interestingness of a query in an environment of data cubes. We assume a hierarchical multidimensional database, storing data cubes and level hierarchies. We start with a comprehensive review of related work in the fields of human behavior studies and computer science. We define the interestingness of a query as a vector of scores along different aspects, like novelty, relevance, surprise and peculiarity and complement this definition with a taxonomy of the information that can be used to assess each of these aspects of interestingness. We provide both syntactic (result-independent) and extensional (result-dependent) checks, measures and algorithms for assessing the different aspects of interestingness in a quantitative fashion. We also report our findings from a user study that we conducted, analyzing the significance of each aspect, its evolution over time and the behavior of the study’s participants.
Citations: 0
A graph neural network with topic relation heterogeneous multi-level cross-item information for session-based recommendation
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-03-20 DOI: 10.1016/j.is.2024.102380
Fan Yang, Dunlu Peng
Abstract: Session-based recommendation (SBR) analyzes an anonymous user’s historical behavior records to predict the next item the user is likely to interact with and recommends the result to the user. However, due to the anonymity of users and the sparsity of behavior records, recommendation results are often inaccurate. The existing SBR models mainly consider the order of items within a session and rarely analyze the complex transition relationship between items, and additionally, they are inadequate at mining higher-order hidden relationships between different sessions. To address these issues, we propose a topic relation heterogeneous multi-level cross-item information graph neural network (TRHMCI-GNN) to improve the performance of recommendation. The model attempts to capture hidden relationships between items through topic classification and build a topic relation heterogeneous cross-item global graph. The graph contains inter-session cross-item information as well as hidden topic relations among sessions. In addition, a self-loop star graph is established to learn the intra-session cross-item information, and the self-connection attributes are added to fuse the information of each item itself. By using a channel-hybrid attention mechanism, the item information of different levels is pooled by two channels, max-pooling and mean-pooling, which effectively fuse the item information of the cross-item global graph and the self-loop star graph. In this way, the model captures the global information of the target item and its individual features, and the label smoothing operation is added for recommendation. Extensive experimental results demonstrate that the recommendation performance of the TRHMCI-GNN model is superior to the comparable baseline models on the three real datasets Diginetica, Yoochoose1/64 and Tmall. The code is available now.
Citations: 0
An inter-modal attention-based deep learning framework using unified modality for multimodal fake news, hate speech and offensive language detection
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-03-16 DOI: 10.1016/j.is.2024.102378
Eniafe Festus Ayetiran, Özlem Özgöbek
Abstract: Fake news, hate speech and offensive language are related evil triplets currently affecting modern societies. Text modality for the computational detection of these phenomena has been widely used. In recent times, multimodal studies in this direction are attracting considerable interest because of the potential offered by other modalities in contributing to the detection of these menaces. However, a major problem in multimodal content understanding is how to effectively model the complementarity of the different modalities due to their diverse characteristics and features. From a multimodal point of view, the three tasks have been studied mainly using image and text modalities. Improving the effectiveness of the diverse multimodal approaches is still an open research topic. In addition to the traditional text and image modalities, we consider image–texts which are rarely used in previous studies but which contain useful information for enhancing the effectiveness of a prediction model. In order to ease multimodal content understanding and enhance prediction, we leverage recent advances in computer vision and deep learning for these tasks. First, we unify the modalities by creating a text representation of the images and image–texts, in addition to the main text. Secondly, we propose a multi-layer deep neural network with an inter-modal attention mechanism to model the complementarity among these modalities. We conduct extensive experiments involving three standard datasets covering the three tasks. Experimental results show that detection of fake news, hate speech and offensive language can benefit from this approach. Furthermore, we conduct robust ablation experiments to show the effectiveness of our approach. Our model predominantly outperforms prior works across the datasets.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S030643792400036X/pdfft?md5=a31db78e16613aefde39a1acfcbb50af&pid=1-s2.0-S030643792400036X-main.pdf
Citations: 0
SuperGuardian: Superspreader removal for cardinality estimation in data streaming
IF 3.7, Zone 2, Computer Science
Information Systems Pub Date : 2024-02-17 DOI: 10.1016/j.is.2024.102351
Jie Lu, Hongchang Chen, Penghao Sun, Tao Hu, Zhen Zhang, Quan Ren
Abstract: Measuring flow cardinality is one of the fundamental problems in data stream mining, where a data stream is modeled as a sequence of items from different flows and the cardinality of a flow is the number of distinct items in the flow. Many existing sketches based on estimator sharing have been proposed to deal with huge flows in data streams. However, these sketches suffer from inefficient memory usage due to allocating the same memory size for each estimator without considering the skewed cardinality distribution. To address this issue, we propose SuperGuardian to improve the memory efficiency of existing sketches. SuperGuardian intelligently separates high-cardinality flows from the data stream and keeps the information of these flows in a large estimator, while using existing sketches with small estimators to record low-cardinality flows. We carry out a mathematical analysis of the cardinality estimation error of SuperGuardian. To validate our proposal, we have implemented SuperGuardian and conducted experimental evaluations using real traffic traces. The experimental results show that existing sketches using SuperGuardian reduce error by 79%–96% and increase the throughput by 0.3–2.3 times.
Citations: 0