arXiv - CS - Databases: Latest Papers

MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset
arXiv - CS - Databases Pub Date : 2024-07-29 DOI: arxiv-2407.20446
Zaid A. El Shair, Samir A. Rawashdeh
Abstract: In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories. Additionally, MEVDT includes manually annotated ground truth labels (object classifications, pixel-precise bounding boxes, and unique object IDs), which are provided at a labeling frequency of 24 Hz. Designed to advance research in event-based vision, MEVDT aims to address the critical need for high-quality, real-world annotated datasets that enable the development and evaluation of object detection and tracking algorithms in automotive environments.
Citations: 0
Shapley Value Computation in Ontology-Mediated Query Answering
arXiv - CS - Databases Pub Date : 2024-07-29 DOI: arxiv-2407.20058
Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade
Abstract: The Shapley value, originally introduced in cooperative game theory for wealth distribution, has found use in KR and databases for the purpose of assigning scores to formulas and database tuples based upon their contribution to obtaining a query result or inconsistency. In the present paper, we explore the use of Shapley values in ontology-mediated query answering (OMQA) and present a detailed complexity analysis of Shapley value computation (SVC) in the OMQA setting. In particular, we establish a PF/#P-hard dichotomy for SVC for ontology-mediated queries (T,q) composed of an ontology T formulated in the description logic ELHI_⊥ and a connected constant-free homomorphism-closed query q. We further show that the #P-hardness side of the dichotomy can be strengthened to cover possibly disconnected queries with constants. Our results exploit recently discovered connections between SVC and probabilistic query evaluation and allow us to generalize existing results on probabilistic OMQA.
Citations: 0
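To make the notion of scoring database tuples by their Shapley value concrete, here is a minimal brute-force sketch. It is not the paper's method: it ignores the ontology entirely, runs in factorial time, and the toy facts and the query `exists_r_and_s` are hypothetical names chosen only for illustration. The game value of a set of facts is assumed to be 1 if the Boolean query holds on it and 0 otherwise.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(facts, query):
    """Brute-force Shapley value of each fact for a Boolean query.

    `query` takes a set of facts and returns True or False. Each fact's
    value is its average marginal contribution over all orderings.
    """
    facts = list(facts)
    totals = {f: Fraction(0) for f in facts}
    n_orders = 0
    for order in permutations(facts):
        n_orders += 1
        coalition = set()
        before = int(query(coalition))
        for f in order:
            coalition.add(f)
            after = int(query(coalition))
            totals[f] += after - before  # marginal contribution of f
            before = after
    return {f: total / n_orders for f, total in totals.items()}


# Hypothetical toy database (illustration only).
FACTS = [("R", "a"), ("S", "a"), ("S", "b")]


def exists_r_and_s(db):
    """Boolean query: is there an x with both R(x) and S(x) in db?"""
    return any(("R", x) in db and ("S", x) in db for x in ("a", "b"))


values = shapley_values(FACTS, exists_r_and_s)
```

In this toy instance the fact `S(b)` never changes the query result, so it receives value 0, while `R(a)` and `S(a)` split the total contribution equally.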
Evaluating LLMs for Text-to-SQL Generation With Complex SQL Workload
arXiv - CS - Databases Pub Date : 2024-07-28 DOI: arxiv-2407.19517
Limin Ma, Ken Pu, Ying Zhu
Abstract: This study presents a comparative analysis of a complex SQL benchmark, TPC-DS, with two existing text-to-SQL benchmarks, BIRD and Spider. Our findings reveal that TPC-DS queries exhibit a significantly higher level of structural complexity compared to the other two benchmarks. This underscores the need for more intricate benchmarks to simulate realistic scenarios effectively. To facilitate this comparison, we devised several measures of structural complexity and applied them across all three benchmarks. The results of this study can guide future research in the development of more sophisticated text-to-SQL benchmarks. We utilized 11 distinct large language models (LLMs) to generate SQL queries based on the query descriptions provided by the TPC-DS benchmark. The prompt engineering process incorporated both the query description as outlined in the TPC-DS specification and the database schema of TPC-DS. Our findings indicate that the current state-of-the-art generative AI models fall short in generating accurate decision-making queries. We conducted a comparison of the generated queries with the TPC-DS gold standard queries using a series of fuzzy structure matching techniques based on query features. The results demonstrated that the accuracy of the generated queries is insufficient for practical real-world application.
Citations: 0
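The abstract does not name its structural complexity measures or its matching techniques, so the following is only a rough sketch of what feature-based comparison could look like. The four keyword-based features and the `feature_match` score are assumptions made for illustration, and the regular expressions do not handle string literals or comments.

```python
import re


def structural_features(sql):
    """Count a few keyword-based structural features of a SQL string.

    The feature set is a hypothetical choice, not the paper's measures.
    """
    text = re.sub(r"\s+", " ", sql.upper())
    selects = len(re.findall(r"\bSELECT\b", text))
    return {
        "joins": len(re.findall(r"\bJOIN\b", text)),
        "nested_selects": max(selects - 1, 0),
        "aggregates": len(re.findall(r"\b(?:SUM|AVG|COUNT|MIN|MAX)\s*\(", text)),
        "group_by": len(re.findall(r"\bGROUP BY\b", text)),
    }


def feature_match(generated, gold):
    """Fraction of features whose counts agree between two queries."""
    a, b = structural_features(generated), structural_features(gold)
    return sum(a[k] == b[k] for k in a) / len(a)


GOLD = (
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.cid = c.id "
    "WHERE o.total > (SELECT AVG(total) FROM orders) "
    "GROUP BY c.name"
)
GENERATED = "SELECT name FROM customers"

gold_features = structural_features(GOLD)
score = feature_match(GENERATED, GOLD)
```

A score of 1.0 here only means the two queries agree on these coarse counts; it says nothing about whether they return the same result.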
Turning Multidimensional Big Data Analytics into Practice: Design and Implementation of ClustCube Big-Data Tools in Real-Life Scenarios
arXiv - CS - Databases Pub Date : 2024-07-26 DOI: arxiv-2407.18604
Alfredo Cuzzocrea, Abderraouf Hafsaoui, Ismail Benlaredj
Abstract: Multidimensional Big Data Analytics is an emerging area that marries the capabilities of OLAP with modern Big Data Analytics. Essentially, the idea is to engraft multidimensional models into Big Data analytics processes to gain expressive power in the overall discovery task. ClustCube is a state-of-the-art model that combines OLAP and Clustering, thus delivering practical and well-understood advantages in the context of real-life applications and systems. In this paper, we show how ClustCube can effectively and efficiently realize tools for supporting Multidimensional Big Data Analytics, and assess these tools in the context of real-life research projects.
Citations: 0
Towards A More Reasonable Semantic Web
arXiv - CS - Databases Pub Date : 2024-07-26 DOI: arxiv-2407.19095
Vleer Doing, Ryan Wisnesky
Abstract: We aim to accelerate the original vision of the semantic web by revisiting design decisions that have defined the semantic web up until now. We propose a shift in direction that more broadly embraces existing data infrastructure by reconsidering the semantic web's logical foundations. We argue to shift attention away from description logic, which has so far underpinned the semantic web, to a different fragment of first-order logic. We argue, using examples from the (geo)spatial domain, that by doing so, the semantic web can be approached as a traditional data migration and integration problem at a massive scale. That way, a huge amount of existing tools and theories can be deployed to the semantic web's benefit, and the original vision of ontology as shared abstraction can be reinvigorated.
Citations: 0
Partial Adaptive Indexing for Approximate Query Answering
arXiv - CS - Databases Pub Date : 2024-07-26 DOI: arxiv-2407.18702
Stavros Maroulis, Nikos Bikakis, Vassilis Stamatopoulos, George Papastefanatos
Abstract: In data exploration, users need to analyze large data files quickly, aiming to minimize data-to-analysis time. While recent adaptive indexing approaches address this need, there are cases where they demonstrate poor performance, particularly during the initial queries, in regions with a high density of objects, and in very large files over commodity hardware. This work introduces an approach for adaptive indexing driven by both query workload and user-defined accuracy constraints to support approximate query answering. The approach is based on partial index adaptation, which reduces the costs associated with reading data files and refining indexes. We leverage a hierarchical tile-based indexing scheme and its stored metadata to provide efficient query evaluation, ensuring accuracy within user-specified bounds. Our preliminary evaluation demonstrates improvement in query evaluation time, especially during initial user exploration.
Citations: 0
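The idea of answering from tile metadata and refining only where the accuracy constraint demands it can be sketched in one dimension as follows. This is an illustrative toy under several assumptions, not the authors' index: the `Tile` class, the per-tile uncertainty budget `max_width`, and the depth cap `MAX_DEPTH` are all hypothetical, and the in-memory `points` list stands in for reads from the data file.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 32  # hypothetical cap so refinement always terminates


@dataclass
class Tile:
    lo: float
    hi: float  # the tile covers the half-open interval [lo, hi)
    points: list  # raw values; reading them stands in for file I/O
    children: list = field(default_factory=list)

    @property
    def count(self):
        return len(self.points)

    def split(self):
        """Refine this tile into two halves (the costly adaptation step)."""
        mid = (self.lo + self.hi) / 2
        self.children = [
            Tile(self.lo, mid, [p for p in self.points if p < mid]),
            Tile(mid, self.hi, [p for p in self.points if p >= mid]),
        ]


def count_bounds(tile, qlo, qhi, max_width, depth=0):
    """Return (lower, upper) bounds on the number of points in [qlo, qhi)."""
    if qhi <= tile.lo or qlo >= tile.hi or tile.count == 0:
        return (0, 0)  # tile is disjoint from the query or empty
    if qlo <= tile.lo and tile.hi <= qhi:
        return (tile.count, tile.count)  # fully covered: metadata is exact
    if tile.count <= max_width or depth >= MAX_DEPTH:
        return (0, tile.count)  # partial overlap, uncertainty is acceptable
    if not tile.children:
        tile.split()  # refine only the tiles that actually need it
    parts = [count_bounds(c, qlo, qhi, max_width, depth + 1) for c in tile.children]
    return (sum(p[0] for p in parts), sum(p[1] for p in parts))


exact = count_bounds(Tile(0.0, 8.0, [i + 0.5 for i in range(8)]), 0.0, 3.0, max_width=0)
loose = count_bounds(Tile(0.0, 8.0, [i + 0.5 for i in range(8)]), 0.0, 3.0, max_width=4)
```

With a budget of 0 the index refines until the answer is exact, while a looser budget stops after a single split and returns an interval that still contains the true count.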
A survey of open-source data quality tools: shedding light on the materialization of data quality dimensions in practice
arXiv - CS - Databases Pub Date : 2024-07-26 DOI: arxiv-2407.18649
Vasileios Papastergios, Anastasios Gounaris
Abstract: Data Quality (DQ) describes the degree to which data characteristics meet requirements and are fit for use by humans and/or systems. There are several aspects in which DQ can be measured, called DQ dimensions (e.g., accuracy, completeness, consistency), also referred to as characteristics in the literature. The ISO/IEC 25012 standard defines a data quality model with fifteen such dimensions, setting the requirements a data product should meet. In this short report, we aim to bridge the gap between lower-level functionalities offered by DQ tools and higher-level dimensions in a systematic manner, revealing the many-to-many relationships between them. To this end, we examine 6 open-source DQ tools and we emphasize providing a mapping between the functionalities they offer and the DQ dimensions, as defined by the ISO standard. Wherever applicable, we also provide insights into the software engineering details that tools leverage in order to address DQ challenges.
Citations: 0
Enhanced Privacy Bound for Shuffle Model with Personalized Privacy
arXiv - CS - Databases Pub Date : 2024-07-25 DOI: arxiv-2407.18157
Yixuan Liu, Yuhan Liu, Li Xiong, Yujie Gu, Hong Chen
Abstract: The shuffle model of Differential Privacy (DP) is an enhanced privacy protocol which introduces an intermediate trusted server between local users and a central data curator. It significantly amplifies the central DP guarantee by anonymizing and shuffling the local randomized data. Yet, deriving a tight privacy bound is challenging due to its complicated randomization protocol. While most existing works focus on unified local privacy settings, this work focuses on deriving the central privacy bound for a more practical setting where personalized local privacy is required by each user. To bound the privacy after shuffling, we first need to capture the probability of each user generating clones of the neighboring data points. Second, we need to quantify the indistinguishability between two distributions of the number of clones on neighboring datasets. Existing works either inaccurately capture the probability, or underestimate the indistinguishability between neighboring datasets. Motivated by this, we develop a more precise analysis, which yields a general and tighter bound for arbitrary DP mechanisms. Firstly, we derive the clone-generating probability by hypothesis testing from a randomizer-specific perspective, which leads to a more accurate characterization of the probability. Secondly, we analyze the indistinguishability in the context of $f$-DP, where the convexity of the distributions is leveraged to achieve a tighter privacy bound. Theoretical and numerical results demonstrate that our bound remarkably outperforms the existing results in the literature.
Citations: 0
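The pipeline the abstract describes (local randomization with a per-user privacy level, followed by shuffling) can be sketched as below. This is a minimal illustration that assumes binary randomized response as the local randomizer; it does not compute the paper's privacy bound, and the function names and the `seed` parameter are hypothetical.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Binary randomized response with local privacy level epsilon.

    The true bit is kept with probability e^eps / (e^eps + 1) and
    flipped otherwise.
    """
    keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < keep else 1 - bit


def shuffle_reports(bits, epsilons, seed=0):
    """Randomize each user's bit with their own epsilon, then shuffle.

    The shuffle stands in for the intermediate trusted server, which
    breaks the link between a report and the user who sent it.
    """
    rng = random.Random(seed)
    reports = [randomized_response(b, e, rng) for b, e in zip(bits, epsilons)]
    rng.shuffle(reports)
    return reports


# Hypothetical users with personalized local privacy levels.
reports = shuffle_reports([0, 1, 1, 0, 1], [0.5, 1.0, 2.0, 0.5, 4.0], seed=1)
```

The curator only sees the shuffled list, so any analysis of the central guarantee has to reason about how many reports could plausibly have come from a given user, which is where the clone-counting argument in the abstract comes in.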
My Ontologist: Evaluating BFO-Based AI for Definition Support
arXiv - CS - Databases Pub Date : 2024-07-24 DOI: arxiv-2407.17657
Carter Benson, Alec Sculley, Austin Liebers, John Beverley
Abstract: Generative artificial intelligence (AI), exemplified by the release of GPT-3.5 in 2022, has significantly advanced the potential applications of large language models (LLMs), including in the realms of ontology development and knowledge graph creation. Ontologies, which are structured frameworks for organizing information, and knowledge graphs, which combine ontologies with actual data, are essential for enabling interoperability and automated reasoning. However, current research has largely overlooked the generation of ontologies extending from established upper-level frameworks like the Basic Formal Ontology (BFO), risking the creation of non-integrable ontology silos. This study explores the extent to which LLMs, particularly GPT-4, can support ontologists trained in BFO. Through iterative development of a specialized GPT model named "My Ontologist," we aimed to generate BFO-conformant ontologies. Initial versions faced challenges in maintaining definition conventions and leveraging foundational texts effectively. My Ontologist 3.0 showed promise by adhering to structured rules and modular ontology suites, yet the release of GPT-4o disrupted this progress by altering the model's behavior. Our findings underscore the importance of aligning LLM-generated ontologies with top-level standards and highlight the complexities of integrating evolving AI capabilities in ontology engineering.
Citations: 0
Dynamic Subgraph Matching via Cost-Model-based Vertex Dominance Embeddings (Technical Report)
arXiv - CS - Databases Pub Date : 2024-07-23 DOI: arxiv-2407.16660
Yutong Ye, Xiang Lian, Nan Zhang, Mingsong Chen
Abstract: In many real-world applications such as social network analysis, knowledge graph discovery, and biological network analytics, graph data management has become increasingly important and has drawn much attention from the database community. Since many graphs (e.g., Twitter, Wikipedia) are usually evolving over time, it is of great importance to study the dynamic subgraph matching (DSM) problem, a fundamental yet challenging graph operator, which continuously monitors subgraph matching results over dynamic graphs with a stream of edge updates. To efficiently tackle the DSM problem, we carefully design a novel vertex dominance embedding approach, which effectively encodes vertex labels that can be incrementally maintained upon graph updates. Inspired by low pruning power for high-degree vertices, we propose a new degree grouping technique over basic subgraph patterns in different degree groups (i.e., groups of star substructures), and devise degree-aware star substructure synopses (DAS^3) to effectively facilitate our designed vertex dominance and range pruning strategies. We develop efficient algorithms to incrementally maintain dynamic graphs and answer DSM queries. Through extensive experiments, we confirm the efficiency of our proposed approaches over both real and synthetic graphs.
Citations: 0