arXiv - CS - Databases: Latest Papers

EHL*: Memory-Budgeted Indexing for Ultrafast Optimal Euclidean Pathfinding
arXiv - CS - Databases Pub Date: 2024-08-21 DOI: arxiv-2408.11341
Jinchun Du, Bojie Shen, Muhammad Aamir Cheema
Abstract: The Euclidean Shortest Path Problem (ESPP), which involves finding the shortest path in a Euclidean plane with polygonal obstacles, is a classic problem with numerous real-world applications. The current state-of-the-art solution, Euclidean Hub Labeling (EHL), offers ultra-fast query performance, outperforming existing techniques by 1-2 orders of magnitude in runtime efficiency. However, this performance comes at the cost of significant memory overhead, requiring up to tens of gigabytes of storage on large maps, which can limit its applicability in memory-constrained environments like mobile phones or smaller devices. Additionally, EHL's memory usage can only be determined after index construction, and while it provides a memory-runtime tradeoff, it does not fully optimize memory utilization. In this work, we introduce an improved version of EHL, called EHL*, which overcomes these limitations. A key contribution of EHL* is its ability to create an index that adheres to a specified memory budget while optimizing query runtime performance. Moreover, EHL* can leverage preknown query distributions, a common scenario in many real-world applications, to further enhance runtime efficiency. Our results show that EHL* can reduce memory usage by up to 10-20 times without much impact on query runtime performance compared to EHL, making it a highly effective solution for optimal pathfinding in memory-constrained environments.
Citations: 0
Privacy-Preserving Data Management using Blockchains
arXiv - CS - Databases Pub Date: 2024-08-21 DOI: arxiv-2408.11263
Michael Mireku Kwakye
Abstract: Privacy-preservation policies are guidelines formulated to protect data providers' private data. Previous privacy-preservation methodologies have addressed privacy in which data are permanently stored in repositories and disconnected from changing data provider privacy preferences. This occurrence becomes evident as data moves to another data repository. Hence, the need for data providers to control and flexibly update their existing privacy preferences due to changing data usage continues to remain a problem. This paper proposes a blockchain-based methodology for preserving data providers' private and sensitive data. The research proposes to tightly couple the data provider's private attribute data element to privacy preferences and the data accessor data element into a privacy tuple. The implementation presents a framework of tightly-coupled relational database and blockchains. This delivers a secure, tamper-resistant, and query-efficient platform for data management and query processing. The evaluation analysis from the implementation validates efficient query processing of privacy-aware queries on the privacy infrastructure.
Citations: 0
The Story Behind the Lines: Line Charts as a Gateway to Dataset Discovery
arXiv - CS - Databases Pub Date: 2024-08-18 DOI: arxiv-2408.09506
Daomin Ji, Hui Luo, Zhifeng Bao, J. Shane Culpepper
Abstract: Line charts are a valuable tool for data analysis and exploration, distilling essential insights from a dataset. However, access to the underlying dataset behind a line chart is rarely readily available. In this paper, we explore a novel dataset discovery problem, dataset discovery via line charts, focusing on the use of line charts as queries to discover datasets within a large data repository that are capable of generating similar line charts. To solve this problem, we propose a novel approach called Fine-grained Cross-modal Relevance Learning Model (FCM), which aims to estimate the relevance between a line chart and a candidate dataset. To achieve this goal, FCM first employs a visual element extractor to extract informative visual elements, i.e., lines and y-ticks, from a line chart. Then, two novel segment-level encoders are adopted to learn representations for a line chart and a dataset, preserving fine-grained information, followed by a cross-modal matcher to match the learned representations in a fine-grained way. Furthermore, we extend FCM to support line chart queries generated based on data aggregation. Last, we propose a benchmark tailored for this problem since no such dataset exists. Extensive evaluation on the new benchmark verifies the effectiveness of our proposed method. Specifically, our proposed approach surpasses the best baseline by 30.1% and 41.0% in terms of prec@50 and ndcg@50, respectively.
Citations: 0
The temporal conceptual data modelling language TREND
arXiv - CS - Databases Pub Date: 2024-08-18 DOI: arxiv-2408.09427
Sonia Berman, C. Maria Keet, Tamindran Shunmugam
Abstract: Temporal conceptual data modelling, as an extension to regular conceptual data modelling languages such as EER and UML class diagrams, has received intermittent attention across the decades. It is receiving renewed interest in the context of, among others, business process modelling that needs robust expressive data models to complement them. None of the proposed temporal conceptual data modelling languages have been tested on understandability and usability by modellers, however, nor is it clear which temporal constraints would be used by modellers or whether the ones included are the relevant temporal constraints. We therefore sought to investigate temporal representations in temporal conceptual data modelling languages, design a, to date, most expressive language, TREND, through small-scale qualitative experiments, and finalise the graphical notation and modelling and understanding in large scale experiments. This involved a series of 11 experiments with over a thousand participants in total, having created 246 temporal conceptual data models. Key outcomes are that choice of label for transition constraints had limited impact, as did extending explanations of the modelling language, but expressing what needs to be modelled in controlled natural language did improve model quality. The experiments also indicate that more training may be needed, in particular guidance for domain experts, to achieve adoption of temporal conceptual data modelling by the community.
Citations: 0
NFDI4DSO: Towards a BFO Compliant Ontology for Data Science
arXiv - CS - Databases Pub Date: 2024-08-16 DOI: arxiv-2408.08698
Genet Asefa Gesese, Jörg Waitelonis, Zongxiong Chen, Sonja Schimmler, Harald Sack
Abstract: The NFDI4DataScience (NFDI4DS) project aims to enhance the accessibility and interoperability of research data within Data Science (DS) and Artificial Intelligence (AI) by connecting digital artifacts and ensuring they adhere to FAIR (Findable, Accessible, Interoperable, and Reusable) principles. To this end, this poster introduces the NFDI4DS Ontology, which describes resources in DS and AI and models the structure of the NFDI4DS consortium. Built upon the NFDICore ontology and mapped to the Basic Formal Ontology (BFO), this ontology serves as the foundation for the NFDI4DS knowledge graph currently under development.
Citations: 0
The (Elementary) Mathematical Data Model Revisited
arXiv - CS - Databases Pub Date: 2024-08-15 DOI: arxiv-2408.08367
Christian Mancas
Abstract: This paper presents the current version of our (Elementary) Mathematical Data Model ((E)MDM), which is based on the naïve theory of sets, relations, and functions, as well as on the first-order predicate calculus with equality. Many real-life examples illustrate its 4 types of sets, 4 types of functions, and 76 types of constraints. This rich panoply of constraints is the main strength of this model, guaranteeing that any data value stored in a database is plausible, which is the highest possible level of syntactical data quality. An (E)MDM example scheme is presented and contrasted with some popular family tree software products. The paper also presents the main (E)MDM related approaches in data modeling and processing.
Citations: 0
DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization
arXiv - CS - Databases Pub Date: 2024-08-14 DOI: arxiv-2408.07401
Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong
Abstract: Data visualization (DV) is the fundamental and premise tool to improve the efficiency in conveying the insights behind the big data, which has been widely accepted in existing data-driven world. Task automation in DV, such as converting natural language queries to visualizations (i.e., text-to-vis), generating explanations from visualizations (i.e., vis-to-text), answering DV-related questions in free form (i.e., FeVisQA), and explicating tabular data (i.e., table-to-text), is vital for advancing the field. Despite their potential, the application of pre-trained language models (PLMs) like T5 and BERT in DV has been limited by high costs and challenges in handling cross-modal information, leading to few studies on PLMs for DV. We introduce DataVisT5, a novel PLM tailored for DV that enhances the T5 architecture through a hybrid objective pre-training and multi-task fine-tuning strategy, integrating text and DV datasets to effectively interpret cross-modal semantics. Extensive evaluations on public datasets show that DataVisT5 consistently outperforms current state-of-the-art models on various DV-related tasks. We anticipate that DataVisT5 will not only inspire further research on vertical PLMs but also expand the range of applications for PLMs.
Citations: 0
QirK: Question Answering via Intermediate Representation on Knowledge Graphs
arXiv - CS - Databases Pub Date: 2024-08-14 DOI: arxiv-2408.07494
Jan Luca Scheerer, Anton Lykov, Moe Kayali, Ilias Fountalis, Dan Olteanu, Nikolaos Vasiloglou, Dan Suciu
Abstract: We demonstrate QirK, a system for answering natural language questions on Knowledge Graphs (KG). QirK can answer structurally complex questions that are still beyond the reach of emerging Large Language Models (LLMs). It does so using a unique combination of database technology, LLMs, and semantic search over vector embeddings. The glue for these components is an intermediate representation (IR). The input question is mapped to IR using LLMs, which is then repaired into a valid relational database query with the aid of a semantic search on vector embeddings. This allows a practical synthesis of LLM capabilities and KG reliability. A short video demonstrating QirK is available at https://youtu.be/6c81BLmOZ0U.
Citations: 0
Re-Thinking Process Mining in the AI-Based Agents Era
arXiv - CS - Databases Pub Date: 2024-08-14 DOI: arxiv-2408.07720
Alessandro Berti, Mayssa Maatallah, Urszula Jessen, Michal Sroka, Sonia Ayachi Ghannouchi
Abstract: Large Language Models (LLMs) have emerged as powerful conversational interfaces, and their application in process mining (PM) tasks has shown promising results. However, state-of-the-art LLMs struggle with complex scenarios that demand advanced reasoning capabilities. In the literature, two primary approaches have been proposed for implementing PM using LLMs: providing textual insights based on a textual abstraction of the process mining artifact, and generating code executable on the original artifact. This paper proposes utilizing the AI-Based Agents Workflow (AgWf) paradigm to enhance the effectiveness of PM on LLMs. This approach allows for: i) the decomposition of complex tasks into simpler workflows, and ii) the integration of deterministic tools with the domain knowledge of LLMs. We examine various implementations of AgWf and the types of AI-based tasks involved. Additionally, we discuss the CrewAI implementation framework and present examples related to process mining.
Citations: 0
ASPEN: ASP-Based System for Collective Entity Resolution
arXiv - CS - Databases Pub Date: 2024-08-13 DOI: arxiv-2408.06961
Zhiliang Xiang, Meghyn Bienvenu, Gianluca Cima, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García
Abstract: In this paper, we present ASPEN, an answer set programming (ASP) implementation of a recently proposed declarative framework for collective entity resolution (ER). While an ASP encoding had been previously suggested, several practical issues had been neglected, most notably, the question of how to efficiently compute the (externally defined) similarity facts that are used in rule bodies. This leads us to propose new variants of the encodings (including Datalog approximations) and show how to employ different functionalities of ASP solvers to compute (maximal) solutions, and (approximations of) the sets of possible and certain merges. A comprehensive experimental evaluation of ASPEN on real-world datasets shows that the approach is promising, achieving high accuracy in real-life ER scenarios. Our experiments also yield useful insights into the relative merits of different types of (approximate) ER solutions, the impact of recursion, and factors influencing performance.
Citations: 0