Information Systems: Latest Articles

Training-free sparse representations of dense vectors for scalable information retrieval
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-05-13 DOI: 10.1016/j.is.2025.102567
Fabio Carrara, Lucia Vadicamo, Giuseppe Amato, Claudio Gennaro
Volume 133, Article 102567
Abstract: In this paper, we propose and analyze Vec2Doc, a novel training-free method to transform dense vectors into sparse integer vectors, facilitating the use of inverted indexes for information retrieval (IR). The exponential growth of deep learning and artificial intelligence has revolutionized scientific problem-solving in areas such as computer vision, natural language processing, and automatic content generation. These advances have also significantly impacted IR, with a better understanding of natural language and multimodal content analysis leading to more accurate information retrieval. Despite these developments, modern IR relies primarily on the similarity evaluation of dense vectors from the latent spaces of deep neural networks. This dependence introduces substantial challenges in performing similarity searches on large collections containing billions of vectors. Traditional IR methods, which employ inverted indexes and vector space models, are adept at handling sparse vectors but do not work well with dense ones. Vec2Doc attempts to fill this gap by converting dense vectors into a format compatible with conventional inverted index techniques. Our preliminary experimental evaluations show that Vec2Doc is a promising solution to overcome the scalability problems inherent in vector-based IR, offering an alternative method for efficient and accurate large-scale information retrieval.
Citations: 0
An Alternating Optimization Scheme for Binary Sketches
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-05-10 DOI: 10.1016/j.is.2025.102563
Erik Thordsen, Erich Schubert
Volume 133, Article 102563
Abstract: Searching for similar objects in intrinsically high-dimensional data sets is a challenging task. The use of compact sketches has been proposed for faster similarity search using linear scans. Binary sketches are one such approach to find a good mapping from the original data space to bit strings of a fixed length. These bit strings can be compared efficiently using only few XOR and bit count operations, replacing costly similarity computations with an inexpensive approximation. We propose a new scheme to initialize and improve binary sketches for similarity search in Euclidean spaces. Our optimization iteratively improves the quality of the sketches with a form of orthogonalization. We provide empirical evidence that the quality of the sketches has a peak beyond which it is not correlated to neither bit independence nor bit balance, which contradicts a previous hypothesis in the literature. Regularization in the form of noise added to the training data can turn the peak into a plateau and applying the optimization in a stochastic fashion, i.e., training on smaller subsets of the data, allows for rapid initialization. We provide a loss function that allows to approximate the same objective using neural network frameworks such as PyTorch, elevating the approach to GPU-based training.
Citations: 0
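The comparison step mentioned in the binary-sketches abstract (XOR followed by a bit count) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the `sketch` function below uses plain hyperplane sign bits, which is a generic way to obtain bit strings and not the optimization scheme proposed in the paper; all function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two sketches stored as Python ints.

    XOR leaves a 1 exactly where the sketches differ; counting the 1s
    gives the number of differing bits.
    """
    return bin(a ^ b).count("1")


def sketch(point, hyperplanes):
    """Assumed sketching rule (not from the paper): bit i is 1 when the
    point lies on the positive side of hyperplane i = (normal, offset)."""
    bits = 0
    for i, (normal, offset) in enumerate(hyperplanes):
        dot = sum(x * w for x, w in zip(point, normal))
        if dot - offset > 0:
            bits |= 1 << i
    return bits


def nearest_by_sketch(query_bits, database_bits):
    """Linear scan: index of the database sketch closest in Hamming distance."""
    return min(range(len(database_bits)),
               key=lambda i: hamming(query_bits, database_bits[i]))
```

In a linear scan, the cheap `hamming` call stands in for the original distance computation; candidates found this way would normally be re-checked with the exact distance.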
Class Representatives Selection in non-metric spaces for nearest prototype classification
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-05-10 DOI: 10.1016/j.is.2025.102564
Jaroslav Hlaváč, Martin Kopp, Tomáš Skopal
Volume 133, Article 102564
Abstract: The nearest prototype classification is a less computationally intensive replacement for the k-NN method, especially when large datasets are considered. Centroids are often used as prototypes to represent whole classes in metric spaces. Selection of class prototypes in non-metric spaces is more challenging as the idea of computing centroids is not directly applicable. Instead, a set of representative objects can be used as the class prototype.
This paper presents the Class Representatives Selection (CRS) method, a novel memory and computationally efficient method that finds a small yet representative set of objects from each class to be used as a prototype. CRS leverages the similarity graph representation of each class created by the NN-Descent algorithm to pick a low number of representatives that ensure sufficient class coverage. Thanks to the graph-based approach, CRS can be applied to any space where at least a pairwise similarity can be defined. In the experimental evaluation, we demonstrate that our method outperforms the state-of-the-art techniques on multiple datasets from different domains.
Citations: 0
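To make the idea of nearest prototype classification with a set of representatives concrete, here is a minimal sketch. It assumes the representatives have already been chosen (the CRS selection step itself is not shown and is not reproduced here), and it uses the squared difference as an example dissimilarity because it violates the triangle inequality; names and data are hypothetical.

```python
def classify(obj, representatives, dissimilarity):
    """Return the label whose representative set contains the object
    least dissimilar to `obj`.

    representatives: dict mapping class label -> list of representative objects
    dissimilarity:   any pairwise function; it need not be a metric
    """
    best_label, best_value = None, float("inf")
    for label, reps in representatives.items():
        for rep in reps:
            value = dissimilarity(obj, rep)
            if value < best_value:
                best_label, best_value = label, value
    return best_label


def squared_difference(a, b):
    """Example non-metric dissimilarity: fails the triangle inequality."""
    return (a - b) ** 2


# Hypothetical, hand-picked representatives for two classes.
REPRESENTATIVES = {"low": [1.0, 2.0], "high": [10.0]}
```

The cost per query is one dissimilarity evaluation per representative, which is why a small representative set matters.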
Back to the Order: Partial orders in streaming conformance checking
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-05-10 DOI: 10.1016/j.is.2025.102566
Kristo Raun, Riccardo Tommasini, Ahmed Awad
Volume 133, Article 102566
Abstract: Most organizations are built around their business processes. Commonly, these processes follow a predefined path. Deviations from the expected path can lead to lower quality products and services, reduced efficiencies, and compliance liabilities. Rapid identification of deviations helps mitigate such risks. For identifying deviations, the conformance checker would need to know the sequence in which events occurred. In this paper, we tackle two challenges associated with knowing the right sequence of events. First, we look at out-of-order event arrival, a common occurrence in modern information systems. Second, we extend the previous work by incorporating partial order handling. Partially ordered events are a well-studied problem in process mining, but to the best of our knowledge it has not been researched in terms of fast-paced streaming conformance checking. Real-life and semi-synthetic datasets are used for validating the proposed methods.
Citations: 0
Timeline-based process discovery
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-05-10 DOI: 10.1016/j.is.2025.102568
Christoffer Rubensson, Harleen Kaur, Timotheus Kampik, Jan Mendling
Volume 133, Article 102568
Abstract: A key concern of automatic process discovery is providing insights into business process performance. Process analysts are specifically interested in waiting times and delays for identifying opportunities to speed up processes. Against this backdrop, it is surprising that current techniques for automatic process discovery generate directly-follows graphs and comparable process models without representing the time axis explicitly. This paper presents four layout strategies for automatically constructing process models that explicitly align with a time axis. We exemplify our approaches for directly-follows graphs. We evaluate their effectiveness by applying them to real-world event logs with varying complexities. Our specific focus is on their ability to handle the trade-off between high control-flow abstraction and high consistency of temporal activity order. Our results show that timeline-based layouts provide benefits in terms of an explicit representation of temporal distances. They face challenges for logs with many repeating and concurrent activities.
Citations: 0
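As background for the directly-follows graphs named in the timeline-based process discovery abstract, the following sketch counts directly-follows pairs in an event log. It covers only the standard graph construction, not the paper's four layout strategies; the log format (one list of activity names per case, already ordered by timestamp) is an assumption.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b.

    log: iterable of traces; each trace is a list of activity names
         ordered by timestamp (assumed input format).
    Returns a Counter mapping (a, b) -> frequency, i.e. the weighted
    edges of a directly-follows graph.
    """
    edges = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


# Hypothetical three-case log.
EXAMPLE_LOG = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
```

Note that the edge counts carry no timing information, which is the gap the abstract points out.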
Stochastic conformance checking based on variable-length Markov chains
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-05-09 DOI: 10.1016/j.is.2025.102561
Emilio Incerto, Andrea Vandin, Sima Sarv Ahrabi
Volume 133, Article 102561
Abstract: Conformance checking is central in process mining (PM). It studies deviations of logs from reference processes. Originally, the proposed approaches did not focus on stochastic aspects of the underlying process, and gave qualitative models as output. Recently, these have been extended in approaches for stochastic conformance checking (SCC), giving quantitative models as output. A different community, namely the software performance engineering (PE) one, interested in the synthesis of stochastic processes since decades, has developed independently techniques to synthesize Markov Chains (MC) that describe the stochastic process underlying program runs. However, these were never applied to SCC problems. We propose a novel approach to SCC based on PE results for the synthesis of stochastic processes. Thanks to a rich experimental evaluation, we show that it outperforms the state-of-the-art. In doing so, we further bridge PE and PM, fostering cross-fertilization. We use techniques for the synthesis of Variable-length MC (VLMC), higher-order MC able to compactly encode complex path dependencies in the control-flow. VLMCs are equipped with a notion of likelihood that a trace belongs to a model. We use it to perform SCC of a log against a model. We establish the degree of conformance by equipping VLMCs with uEMSC, a standard conformance measure in the SCC literature. We compare with 18 SCC techniques from the PM literature, using 11 benchmark datasets from the PM community. We outperform all approaches in 10 out of 11 datasets, i.e., we get uEMSC values closer to 1 for logs conforming to a model. Furthermore, we show that VLMC are efficient, as they handled all considered datasets in a few seconds.
Citations: 0
Towards Multi-Faceted Visual Process Analytics
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-05-06 DOI: 10.1016/j.is.2025.102560
Stef van den Elzen, Mieke Jans, Niels Martin, Femke Pieters, Christian Tominski, Maria-Cruz Villa-Uriol, Sebastiaan J. van Zelst
Volume 133, Article 102560
Abstract: Both the fields of Process Mining (PM) and Visual Analytics (VA) aim to make complex phenomena understandable. In PM, the goal is to gain insights into the execution of complex processes by analyzing the event data that is captured in event logs. This data is inherently multi-faceted, meaning that it covers various data facets, including spatial and temporal dependencies, relations between data entities (such as cases/events), and multivariate data attributes per entity. However, the multi-faceted nature of the data has not received much attention in PM. Conversely, VA research has investigated interactive visual methods for making multi-faceted data understandable for about two decades. In this study, we bring together PM and VA with the goal of advancing towards Visual Process Analytics (VPA) of multi-faceted processes. To this end, we present a systematic view of relevant (VA) data facets in the context of PM and assess to what extent existing PM visualizations address the data facets’ characteristics, making use of VA guidelines. In addition to visualizations, we look at how PM can benefit from analytical abstraction and interaction techniques known in the VA realm. Based on this, we discuss open challenges and opportunities for future research towards multi-faceted VPA.
Citations: 0
Support estimation in frequent itemsets mining on Enriched Two Level Tree
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-05-06 DOI: 10.1016/j.is.2025.102559
Clémentin Tayou Djamegni, William Kery Branston Ndemaze, Edith Belise Kenmogne, Hervé Maradona Nana Kouassi, Arnauld Nzegha Fountsop, Idriss Tetakouchom, Laurent Cabrel Tabueu Fotso
Volume 133, Article 102559
Abstract: Efficiently counting the support of candidate itemsets is a crucial aspect of extracting frequent itemsets because it directly impacts the overall performance of the mining process. Researchers have developed various techniques and data structures to overcome this challenge, but the problem is still open. In this paper, we investigate the two-level tree enrichment technique as a potential solution without adding significant computational overhead. In addition, we introduce ETL_Miner, a novel algorithm that provides an estimated bound for the support value of all candidate itemsets within the search space. The method presented in this article is flexible and can be used with various algorithms. To demonstrate this point, we introduce a modified version of Apriori that integrates ETL_Miner as an extra pruning phase. Preliminary empirical experimental results on both real and synthetic datasets confirm the accuracy of the proposed method and reduce the total extraction time.
Citations: 0
Substring compression variations and LZ78-Derivates
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-04-25 DOI: 10.1016/j.is.2025.102553
Dominik Köppl
Volume 133, Article 102553
Abstract: We propose algorithms computing the semi-greedy Lempel–Ziv 78 (LZ78), the Lempel–Ziv Double (LZD), and the Lempel–Ziv–Miller–Wegman (LZMW) factorizations in linear time for integer alphabets. For LZD and LZMW, we additionally propose data structures that can be constructed in linear time, which can solve the substring compression problems for these factorizations in time linear in the output size. For substring compression, we give the first results for lexparse and closed factorizations.
Citations: 0
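For readers unfamiliar with the factorizations named in the LZ78 abstract, the sketch below shows the classic greedy LZ78 factorization only. It is background, not the paper's contribution: the semi-greedy variant, LZD, LZMW, and the substring-compression data structures are not shown, and the dictionary-based trie here gives expected rather than worst-case linear time. Function names are hypothetical.

```python
def lz78_factorize(text):
    """Classic greedy LZ78: each factor is (index of longest previous
    factor that is a prefix of the remaining text, next character).
    Index 0 denotes the empty factor."""
    trie = {}      # (parent factor index, character) -> factor index
    factors = []
    node = 0       # current position in the implicit trie
    for ch in text:
        key = (node, ch)
        if key in trie:
            node = trie[key]            # keep extending the match
        else:
            trie[key] = len(trie) + 1   # register a new factor
            factors.append((node, ch))
            node = 0
    if node != 0:
        # Text ended inside a known factor: emit it without a new character.
        factors.append((node, ""))
    return factors


def lz78_decode(factors):
    """Rebuild the text from (reference, character) pairs."""
    phrases = [""]
    for ref, ch in factors:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])
```

Each new factor extends an earlier one by a single character, which is the property the LZD and LZMW variants relax by combining two earlier factors.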
Learning to resolve inconsistencies in qualitative constraint networks
IF 3, Zone 2, Computer Science
Information Systems Pub Date : 2025-04-18 DOI: 10.1016/j.is.2025.102557
Anastasia Paparrizou, Michael Sioutis
Volume 133, Article 102557
Abstract: In this paper, we present a reinforcement learning approach for resolving inconsistencies in qualitative constraint networks (QCNs). QCNs are typically used in constraint programming to represent and reason about intuitive spatial or temporal relations like x {is inside of ∨ overlaps} y. Naturally, QCNs are not immune to uncertainty, noise, or imperfect data that may be present in information, and thus, more often than not, they are hampered by inconsistencies. We propose a multi-armed bandit approach that defines a well-suited ordering of constraints for finding a maximal satisfiable subset of them. Specifically, our learning approach interacts with a solver, and after each trial a reward is returned to measure the performance of the selected action (constraint addition). The reward function is based on the reduction of the solution space of a consistent reconstruction of the input QCN. Experimental results with different bandit policies and various rewards that are obtained by our algorithm suggest that we can do better than the state of the art in terms of both effectiveness, viz., lower number of repairs obtained for an inconsistent QCN, and efficiency, viz., faster runtime.
Citations: 0