Information Systems: Latest Articles

MDU-Net: Multi-resolution learning and differential clustering fusion for multivariate electricity time series forecasting
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102693. Pub Date: 2026-06-01. Epub Date: 2026-01-19. DOI: 10.1016/j.is.2026.102693
Yongming Guan, Chengdong Zheng, Yuliang Shi, Gang Wang, Linfeng Wu, Zhiyong Chen, Hui Li
Abstract: Artificial intelligence (AI) has demonstrated transformative potential in diverse fields such as healthcare, drug discovery, and natural language processing by enabling advanced pattern recognition and predictive modeling of complex data. In power systems, applying AI to multivariate electricity time series forecasting tasks involving power load, electricity prices, and renewable energy is crucial for grid security and economic dispatch. Contemporary forecasting approaches primarily focus on two aspects: modeling multi-scale periodic characteristics within sequences and capturing complex collaborative dependencies among variables. However, existing techniques often fail to simultaneously disentangle multi-scale features and model the dynamically heterogeneous dependencies between variables. To overcome these limitations, this paper proposes MDU-Net, a novel forecasting framework. The framework comprises two core modules: the Multi-resolution hierarchical Union learning (MRU) module and the Differential Channel Clustering Fusion (DCCF) module. The MRU module constructs multi-granularity temporal representations through downsampling and achieves effective cross-scale feature fusion by integrating channel-independent operations with seasonal-trend decomposition. The DCCF module adopts first- and second-order derivative approximations to generate soft clustering mask matrices, adaptively capturing asymmetric collaborative dependencies among different variables over time. Experimental results on multiple public datasets (ETT, Electricity) demonstrate that MDU-Net significantly outperforms state-of-the-art baselines in multivariate electricity time series prediction. It achieves 2.7% and 17.1% relative MSE reductions compared to TimeMixer and PatchTST, respectively, with 1.4% and 14.4% lower MAE. Notably, MDU-Net maintains strong generalization capabilities and computational efficiency. The framework also shows promising performance in cross-domain applications such as traffic forecasting.
Citations: 0
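The MDU-Net abstract mentions two generic building blocks: downsampling a series to coarser resolutions, and first- and second-order derivative approximations. The following is a minimal sketch of those two operations only, assuming average pooling for the downsampling step and plain finite differences for the derivatives. The function names and the toy `load` series are hypothetical, and nothing here reproduces the paper's MRU or DCCF modules.

```python
def downsample(series, factor):
    """Average-pool a 1-D series over non-overlapping windows of `factor` points.

    Assumption: average pooling; the abstract does not say which
    downsampling operator MDU-Net uses.
    """
    usable = len(series) - len(series) % factor  # drop an incomplete tail window
    return [sum(series[i:i + factor]) / factor for i in range(0, usable, factor)]


def first_difference(series):
    """First-order finite difference: x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def second_difference(series):
    """Second-order finite difference: the difference of the first difference."""
    return first_difference(first_difference(series))


# Hypothetical toy load curve, purely for illustration.
load = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0]

coarse = downsample(load, 2)      # half-resolution view of the same series
slope = first_difference(load)    # approximates the first derivative
curvature = second_difference(load)  # approximates the second derivative
```

Series whose `slope` and `curvature` evolve alike would be natural candidates for grouping; how MDU-Net turns such signals into soft clustering masks is not specified in the abstract.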
Reflection on compliance monitoring in business processes: Functionalities, application, and tool-support
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102650. Pub Date: 2026-06-01. Epub Date: 2025-12-04. DOI: 10.1016/j.is.2025.102650
Linh Thao Ly, Fabrizio Maria Maggi, Marco Montali, Stefanie Rinderle-Ma, Wil M.P. van der Aalst
Abstract: Together with Information Systems, we celebrate the journal’s 50th anniversary and the 10th anniversary of our joint work on a systematic framework for compliance monitoring functionalities.
Citations: 0
Example-driven semantic-similarity-aware query intent discovery: Empowering users to cross the SQL barrier through query by example
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102687. Pub Date: 2026-06-01. Epub Date: 2026-01-12. DOI: 10.1016/j.is.2026.102687
Anna Fariha, Lucy Cousins, Narges Mahyar, Alexandra Meliou
Abstract: Traditional relational data interfaces require precise structured queries over potentially complex schemas. These rigid data retrieval mechanisms pose hurdles for nonexpert users, who typically lack programming language expertise and are unfamiliar with the details of the schema. Existing tools assist in formulating queries through keyword search, query recommendation, and query auto-completion, but still require some technical expertise. An alternative method for accessing data is query by example (QBE), where users express their data exploration intent simply by providing examples of their intended data and the system infers the intended query. However, existing QBE approaches focus on the structural similarity of the examples and ignore the richer context present in the data. As a result, they typically produce queries that are too general and fail to capture the user’s intent effectively. In this article, we present SQuID, a system that performs semantic-similarity-aware query intent discovery from user-provided example tuples.
Our work makes the following contributions: (1) We design SQuID: an end-to-end system that automatically formulates select-project-join queries with optional group-by aggregation and intersection operators (a much larger class than what prior QBE techniques support) from user-provided examples, in an open-world setting. (2) We express the problem of query intent discovery using a probabilistic abduction model that infers a query as the most likely explanation of the provided examples. (3) We introduce the notion of an abduction-ready database, which precomputes semantic properties and related statistics, allowing SQuID to achieve real-time performance. (4) We present an extensive empirical evaluation on three real-world datasets, including user intent case studies, demonstrating that SQuID is efficient and effective, and outperforms machine learning methods, as well as the state of the art in the related query reverse engineering problem. (5) We contrast SQuID with traditional SQL querying through a comparative user study, which demonstrates that users with varying expertise are significantly more effective and efficient with SQuID than SQL. We find that SQuID eliminates the barriers in studying the database schema, formalizing task semantics, and writing syntactically correct SQL queries, and, thus, substantially alleviates the need for technical expertise in data exploration.
Citations: 0
Automated decision-making for dynamic task assignment at scale
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102694. Pub Date: 2026-06-01. Epub Date: 2026-01-22. DOI: 10.1016/j.is.2026.102694
Riccardo Lo Bianco, Willem van Jaarsveld, Jeroen Middelhuis, Luca Begnardi, Remco Dijkman
Abstract: The Dynamic Task Assignment Problem (DTAP) concerns matching resources to tasks in real time while minimizing an objective such as resource cost or task cycle time. In this work, we consider a DTAP variant where every task is a case composed of a stochastic sequence of activities. The DTAP, in this case, involves the decision of which employee to assign to which activity to process requests as quickly as possible. In recent years, Deep Reinforcement Learning (DRL) has emerged as a promising tool for tackling this DTAP variant, but most research is limited to solving small-scale, synthetic problems, neglecting the challenges posed by real-world use cases. To bridge this gap, this work proposes a DRL-based Decision Support System (DSS) for real-world scale DTAPs. To this end, we introduce a DRL agent with two novel elements: a graph structure for observations and actions that can effectively represent any DTAP and a reward function that is provably equivalent to the objective of minimizing the average cycle time of tasks. The combination of these two novelties allows the agent to learn effective and generalizable assignment policies for real-world scale DTAPs. The proposed DSS is evaluated on five DTAP instances whose parameters are extracted from real-world logs through process mining. The experimental evaluation shows that the proposed DRL agent matches or outperforms the best baseline in all DTAP instances and generalizes on different time horizons and across instances.
Citations: 0
VCR: Interpretable and interactive debugging of object detection models with visual concepts
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102652. Pub Date: 2026-06-01. Epub Date: 2025-12-12. DOI: 10.1016/j.is.2025.102652
Jie Jeff Xu, Saahir Dhanani, Jorge Piazentin Ono, Wenbin He, Liu Ren, Kexin Rong
Abstract: Computer vision models can make systematic errors, performing well on average but substantially worse on particular subsets (or slices) of data. In this work, we introduce Visual Concept Reviewer (VCR), a human-in-the-loop slice discovery framework that enables practitioners to interactively discover and understand systematic errors in object-detection models via novel use of visual concepts: semantically meaningful and frequently recurring image segments representing objects, parts, or abstract properties.
Leveraging recent advances in vision foundation models, VCR automatically generates segment-level visual concepts that serve as interpretable primitives for diagnosing issues in object-detection models, while also supporting lightweight human supervision when needed. VCR combines visual concepts with metadata in a tabular format and adapts frequent itemset mining techniques to identify common absences and presences of concepts associated with poor model performance at interactive speeds. VCR also keeps humans in the loop for interpretation and refinement at each step of the slice discovery process. We demonstrate VCR’s effectiveness and scalability through a new evaluation benchmark with 1713 slice discovery settings across three datasets. A user study with six expert industry machine learning scientists and engineers provides qualitative evidence of VCR’s utility in real-world workflows.
Citations: 0
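The VCR abstract refers to frequent itemset mining over the presence and absence of visual concepts. As a rough illustration of that general idea, the sketch below counts how often small sets of concept flags co-occur among images where a detector failed. It is a brute-force enumeration and not VCR's adapted algorithm; the function name, the `no_` prefix for absences, and the toy `failures` data are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(rows, min_support, max_size=2):
    """Return every itemset of up to `max_size` items seen in at least
    `min_support` rows. Brute force: suitable only for tiny inputs."""
    counts = Counter()
    for items in rows:
        ordered = sorted(items)  # canonical order so each set has one key
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical concept flags for four images the detector got wrong.
# A "no_" prefix marks the absence of a concept.
failures = [
    {"night", "no_headlight"},
    {"night", "no_headlight", "rain"},
    {"night", "rain"},
    {"day", "rain"},
]

slices = frequent_itemsets(failures, min_support=2)
```

Here `slices` keeps combinations such as `("night", "no_headlight")` that recur among the failures. A real slice-discovery tool would also compare each itemset's frequency on correctly handled images before flagging it.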
Exploring cultural commonsense in multilingual large language models: A survey
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102649. Pub Date: 2026-06-01. Epub Date: 2025-12-01. DOI: 10.1016/j.is.2025.102649
Geleta Negasa Binegde, Huaping Zhang
Abstract: Large language models (LLMs) have demonstrated impressive proficiency in multilingual natural language processing (NLP), yet they frequently struggle with cultural commonsense, the implicit knowledge shaped by societal norms, traditions, and shared experiences. As these models are deployed in diverse linguistic and cultural settings, their ability to understand and apply cultural commonsense becomes crucial for ensuring fairness, inclusivity, and contextual accuracy. This paper presents a systematic review and a large-scale empirical benchmark for evaluating cultural commonsense in multilingual LLMs. Through a comprehensive evaluation of 15 models on the BLEnD dataset, our analysis reveals a critical performance gap of 64.2% between high-resource and low-resource cultures. The results demonstrate significant disparities across model architectures: encoder-only models show more consistent but lower overall performance compared to decoder-based models. We identify key limitations, including data scarcity, representational bias, and inadequate cross-lingual knowledge transfer. Finally, we propose future research directions, such as culturally diverse dataset curation, hybrid knowledge graph architectures, and fairness-aware fine-tuning. The primary contributions of this work are (1) a systematic review of challenges and mitigation strategies for cultural commonsense; (2) a large-scale empirical benchmark that evaluates 15 multilingual LLMs across 13 languages and 16 countries, revealing significant performance disparities; and (3) concrete findings on the effects of model architecture and the limitations of scale in cultural understanding. This research underscores the urgent need to advance cultural commonsense in multilingual LLMs to ensure the development of fair, inclusive, and contextually accurate AI systems globally.
Citations: 0
Reflection on the convergence and interplay of edge, fog, and cloud in the AI-driven Internet of Things (IoT)
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102662. Pub Date: 2026-06-01. Epub Date: 2025-12-03. DOI: 10.1016/j.is.2025.102662
Farshad Firouzi, Bahar Farahani, Alexander Marinšek
Abstract: As the Information Systems Journal celebrates its 50th Anniversary, we are honored to reflect on the journey and legacy of our 2022 article, “The convergence and interplay of edge, fog, and cloud in the AI-driven Internet of Things (IoT)”. The paper introduced a unified architectural framework that advanced the integration of computing, intelligence, and connectivity across the edge–fog–cloud continuum, establishing a foundational model for scalable, adaptive, context-aware, and trustworthy AI-enabled systems. This reflection highlights how the work has shaped our research trajectories, influenced developments within the broader scientific community, and guided innovation, education, and industrial practice.
Citations: 0
Applying organizational mining to discover agent systems from event data
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102669. Pub Date: 2026-06-01. Epub Date: 2025-12-31. DOI: 10.1016/j.is.2025.102669
Qingtan Shen, Artem Polyvyanyy, Nir Lipovetzky, Timotheus Kampik
Abstract: Agent system mining is a recently introduced type of process mining that takes a bottom-up approach to the data-driven analysis of socio-technical systems that execute business processes in organizations. Instead of the top-down approach used in conventional process mining that studies a system in terms of its global state evolution, agent system mining analyzes the system as if it were composed of autonomous agents, each with its local state and behavior, interacting with other agents and the environment to contribute to the emerging global behavior of the business process. Recently, Agent Miner, the first algorithm for discovering agent systems from event data generated by process-aware information systems, has been proposed. The quality of the agent systems discovered by this algorithm depends on the quality of the agent types (or agents), which are identified from the available information about agent instances in the data. In this paper, we study the suitability and benefits of using methods from the organizational mining subarea of process mining for identifying agent types. The experiments we conduct over real-world datasets confirm the usefulness of such methods for discovering simple, modular, and accurate agent systems. These conclusions are grounded in quality metrics such as the size of discovered models (simplicity), Louvain modularity and the Gini coefficient (modularity), and precision and recall (accuracy). The results confirm the benefits of using organizational mining for identifying agent types when discovering agent systems from event data, leading to the construction of models of superior quality in precision, recall, and simplicity compared to models constructed by state-of-the-art conventional process discovery algorithms.
Citations: 0
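Among the quality metrics named in the agent system mining abstract, the Gini coefficient has a compact closed form. The sketch below computes the textbook Gini coefficient of a list of non-negative values, where 0 means a perfectly even distribution. The abstract does not say which quantities the paper feeds into the metric, so the example inputs (hypothetical per-agent model sizes) are an assumption.

```python
def gini(values):
    """Textbook Gini coefficient of non-negative values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with the
    values sorted in ascending order and i running from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical model sizes per agent type, for illustration only.
even = gini([5, 5, 5, 5])      # every agent model has the same size
skewed = gini([0, 0, 0, 20])   # one agent model holds everything
```

With n values the maximum is (n - 1) / n, so the four-element `skewed` case reaches 0.75 rather than 1.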
Graph-based similarity measures for the structural comparison of process traces
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102671. Pub Date: 2026-06-01. Epub Date: 2025-12-26. DOI: 10.1016/j.is.2025.102671
Clemens Schreiber, Amine Abbad-Andaloussi, Andrea Burattin, Andreas Oberweis, Barbara Weber
Abstract: Similarity measures are commonly applied in a variety of process mining techniques, such as trace clustering, conformance checking, and event abstraction. Yet, these measures generally fail to recognize similarity based on structural process features, such as the order of activities, loops, skips, choices, and parallelism. To make this more explicit, we propose a set of properties for evaluating which kinds of structural features a similarity measure reflects. We further propose a novel approach leveraging existing graph-based algorithms and instance graphs to extract high-level structural features (loops, skips, choices, and parallelism) from traces, such that they can be used to extend and improve existing similarity measures. These algorithms are well-established in graph theory and can be computed efficiently. Finally, we provide an evaluation of the proposed approach based on synthetic and real-world datasets. The evaluation provides evidence that the additional graph-based features can substantially improve the similarity comparison of traces in several cases. This applies in particular to the comparison of user behavior (e.g., based on eye tracking data), where structural features enable the detection of specific behavioral patterns.
Citations: 0
DynaHash: An efficient blocking structure for streaming record linkage
IF 3.4, Zone 2, Computer Science
Information Systems, Volume 138, Article 102692. Pub Date: 2026-06-01. Epub Date: 2026-01-16. DOI: 10.1016/j.is.2026.102692
Dimitrios Karapiperis, Christos Tjortjis, Vassilios S. Verykios
Abstract: Record linkage holds a crucial position in data management and analysis by identifying and merging records from disparate data sets that pertain to the same real-world entity. As data volumes grow, the intricacies of record linkage amplify, presenting challenges such as potential redundancies and computational complexities. This paper introduces DynaHash, a novel randomized record linkage mechanism that utilizes (a) the MinHash technique to generate compact representations of blocking keys and (b) Hamming Locality-Sensitive Hashing (LSH) to construct the blocking structure from these vectors. By employing these methods, DynaHash offers theoretical guarantees of accuracy and achieves sublinear runtime complexities, with appropriate parameter tuning. It comprises two key components: a persistent storage system for permanently storing the blocking structure to ensure complete results, and an in-memory component for generating very fast partial results by summarizing the persisted blocking structure. Additionally, DynaHash leverages Multi-Probe matching to scan multiple neighboring blocks, in terms of their Hamming distances, in order to find matches. Our theoretical work derives a decrease factor in the space requirements, which depends on the Hamming threshold, compared with the baseline LSH. Our experimental evaluation against three state-of-the-art methods on six real-world data sets demonstrates DynaHash’s exceptional recall rates and query times, which are at least 2× faster than its competitors and do not depend on the size of the underlying data sets.
Citations: 0
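The DynaHash abstract combines MinHash with Hamming LSH. The sketch below illustrates that general combination: each blocking key is reduced to a bit vector by keeping one bit per MinHash value, and records are then bucketed by sampling bit positions. The character-bigram shingling, the one-bit reduction, the parameter values, and all function names are assumptions for illustration; DynaHash's actual construction, persistence layer, and Multi-Probe matching are not reproduced.

```python
import hashlib
import random


def shingles(text, k=2):
    """Lower-cased character k-grams of a blocking key (needs len(text) >= k)."""
    text = text.lower()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_bits(text, num_hashes=64):
    """Bit vector with one bit per hash function: the lowest bit of the
    minimum hash value over the key's shingles (an assumed reduction)."""
    grams = shingles(text)
    return tuple(min(_hash(seed, g) for g in grams) & 1 for seed in range(num_hashes))


def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def lsh_keys(bits, num_tables=8, bits_per_key=6, seed=0):
    """Hamming LSH: each table samples a fixed subset of bit positions;
    records sharing a (table, sampled bits) key land in the same block."""
    rng = random.Random(seed)  # same seed gives the same positions for every record
    keys = []
    for table in range(num_tables):
        positions = rng.sample(range(len(bits)), bits_per_key)
        keys.append((table, tuple(bits[p] for p in positions)))
    return keys


a = minhash_bits("John Smith")
b = minhash_bits("john smith")  # same key after lower-casing
blocks_a = lsh_keys(a)
blocks_b = lsh_keys(b)
```

Keys with many shared bigrams tend to yield bit vectors at a small Hamming distance, so they are likely to collide in at least one table; the parameters above trade recall against block size and would need tuning on real data.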