arXiv - CS - Databases: Latest Papers, Page 6

EHL*: Memory-Budgeted Indexing for Ultrafast Optimal Euclidean Pathfinding

arXiv - CS - Databases Pub Date : 2024-08-21 DOI: arxiv-2408.11341

Jinchun Du, Bojie Shen, Muhammad Aamir Cheema

Abstract: The Euclidean Shortest Path Problem (ESPP), which involves finding the shortest path in a Euclidean plane with polygonal obstacles, is a classic problem with numerous real-world applications. The current state-of-the-art solution, Euclidean Hub Labeling (EHL), offers ultra-fast query performance, outperforming existing techniques by 1-2 orders of magnitude in runtime efficiency. However, this performance comes at the cost of significant memory overhead, requiring up to tens of gigabytes of storage on large maps, which can limit its applicability in memory-constrained environments like mobile phones or smaller devices. Additionally, EHL's memory usage can only be determined after index construction, and while it provides a memory-runtime tradeoff, it does not fully optimize memory utilization. In this work, we introduce an improved version of EHL, called EHL*, which overcomes these limitations. A key contribution of EHL* is its ability to create an index that adheres to a specified memory budget while optimizing query runtime performance. Moreover, EHL* can leverage preknown query distributions, a common scenario in many real-world applications to further enhance runtime efficiency. Our results show that EHL* can reduce memory usage by up to 10-20 times without much impact on query runtime performance compared to EHL, making it a highly effective solution for optimal pathfinding in memory-constrained environments.

Citations: 0

Privacy-Preserving Data Management using Blockchains

arXiv - CS - Databases Pub Date : 2024-08-21 DOI: arxiv-2408.11263

Michael Mireku Kwakye

Citations: 0

The Story Behind the Lines: Line Charts as a Gateway to Dataset Discovery

arXiv - CS - Databases Pub Date : 2024-08-18 DOI: arxiv-2408.09506

Daomin Ji, Hui Luo, Zhifeng Bao, J. Shane Culpepper

Abstract: Line charts are a valuable tool for data analysis and exploration, distilling essential insights from a dataset. However, access to the underlying dataset behind a line chart is rarely readily available. In this paper, we explore a novel dataset discovery problem, dataset discovery via line charts, focusing on the use of line charts as queries to discover datasets within a large data repository that are capable of generating similar line charts. To solve this problem, we propose a novel approach called Fine-grained Cross-modal Relevance Learning Model (FCM), which aims to estimate the relevance between a line chart and a candidate dataset. To achieve this goal, FCM first employs a visual element extractor to extract informative visual elements, i.e., lines and y-ticks, from a line chart. Then, two novel segment-level encoders are adopted to learn representations for a line chart and a dataset, preserving fine-grained information, followed by a cross-modal matcher to match the learned representations in a fine-grained way. Furthermore, we extend FCM to support line chart queries generated based on data aggregation. Last, we propose a benchmark tailored for this problem since no such dataset exists. Extensive evaluation on the new benchmark verifies the effectiveness of our proposed method. Specifically, our proposed approach surpasses the best baseline by 30.1% and 41.0% in terms of prec@50 and ndcg@50, respectively.
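
Note on the metrics: the abstract above reports results as prec@50 and ndcg@50. The following is a minimal sketch of how precision@k and nDCG@k are commonly computed for a ranked result list. The function names, the linear-gain form of DCG, and the assumption that the ranked list contains every relevant item are illustrative choices, not details taken from the paper, which may use a different variant.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (label > 0)."""
    top = relevance[:k]
    return sum(1 for r in top if r > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    # enumerate() is 0-based, so rank 0 is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevance[:k]))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering.

    Assumption: `relevance` lists the labels of all candidates, so sorting it
    yields the ideal ranking.
    """
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance labels for a ranked list of five datasets.
    ranked = [1, 0, 1, 0, 0]
    print(precision_at_k(ranked, 5))
    print(ndcg_at_k(ranked, 5))
```

Precision@k only counts how many relevant items reach the top k, whereas nDCG@k also rewards placing them nearer the top, which is why the two are often reported together.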

Citations: 0

The temporal conceptual data modelling language TREND

arXiv - CS - Databases Pub Date : 2024-08-18 DOI: arxiv-2408.09427

Sonia Berman, C. Maria Keet, Tamindran Shunmugam

Abstract: Temporal conceptual data modelling, as an extension to regular conceptual data modelling languages such as EER and UML class diagrams, has received intermittent attention across the decades. It is receiving renewed interest in the context of, among others, business process modelling that needs robust expressive data models to complement them. None of the proposed temporal conceptual data modelling languages have been tested on understandability and usability by modellers, however, nor is it clear which temporal constraints would be used by modellers or whether the ones included are the relevant temporal constraints. We therefore sought to investigate temporal representations in temporal conceptual data modelling languages, design a, to date, most expressive language, TREND, through small-scale qualitative experiments, and finalise the graphical notation and modelling and understanding in large scale experiments. This involved a series of 11 experiments with over a thousand participants in total, having created 246 temporal conceptual data models. Key outcomes are that choice of label for transition constraints had limited impact, as did extending explanations of the modelling language, but expressing what needs to be modelled in controlled natural language did improve model quality. The experiments also indicate that more training may be needed, in particular guidance for domain experts, to achieve adoption of temporal conceptual data modelling by the community.

Citations: 0

NFDI4DSO: Towards a BFO Compliant Ontology for Data Science

arXiv - CS - Databases Pub Date : 2024-08-16 DOI: arxiv-2408.08698

Genet Asefa Gesese, Jörg Waitelonis, Zongxiong Chen, Sonja Schimmler, Harald Sack

Citations: 0

The (Elementary) Mathematical Data Model Revisited

arXiv - CS - Databases Pub Date : 2024-08-15 DOI: arxiv-2408.08367

Christian Mancas

Citations: 0

DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization

arXiv - CS - Databases Pub Date : 2024-08-14 DOI: arxiv-2408.07401

Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong

Abstract: Data visualization (DV) is the fundamental and premise tool to improve the efficiency in conveying the insights behind the big data, which has been widely accepted in existing data-driven world. Task automation in DV, such as converting natural language queries to visualizations (i.e., text-to-vis), generating explanations from visualizations (i.e., vis-to-text), answering DV-related questions in free form (i.e. FeVisQA), and explicating tabular data (i.e., table-to-text), is vital for advancing the field. Despite their potential, the application of pre-trained language models (PLMs) like T5 and BERT in DV has been limited by high costs and challenges in handling cross-modal information, leading to few studies on PLMs for DV. We introduce DataVisT5, a novel PLM tailored for DV that enhances the T5 architecture through a hybrid objective pre-training and multi-task fine-tuning strategy, integrating text and DV datasets to effectively interpret cross-modal semantics. Extensive evaluations on public datasets show that DataVisT5 consistently outperforms current state-of-the-art models on various DV-related tasks. We anticipate that DataVisT5 will not only inspire further research on vertical PLMs but also expand the range of applications for PLMs.

Citations: 0

QirK: Question Answering via Intermediate Representation on Knowledge Graphs

arXiv - CS - Databases Pub Date : 2024-08-14 DOI: arxiv-2408.07494

Jan Luca Scheerer, Anton Lykov, Moe Kayali, Ilias Fountalis, Dan Olteanu, Nikolaos Vasiloglou, Dan Suciu

Citations: 0

Re-Thinking Process Mining in the AI-Based Agents Era

arXiv - CS - Databases Pub Date : 2024-08-14 DOI: arxiv-2408.07720

Alessandro Berti, Mayssa Maatallah, Urszula Jessen, Michal Sroka, Sonia Ayachi Ghannouchi

Citations: 0

ASPEN: ASP-Based System for Collective Entity Resolution

arXiv - CS - Databases Pub Date : 2024-08-13 DOI: arxiv-2408.06961

Zhiliang Xiang, Meghyn Bienvenu, Gianluca Cima, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García

Citations: 0