arXiv - CS - Databases: Latest Papers (Page 10)

MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset

arXiv - CS - Databases Pub Date : 2024-07-29 DOI: arxiv-2407.20446

Zaid A. El Shair, Samir A. Rawashdeh

Citations: 0

Shapley Value Computation in Ontology-Mediated Query Answering

arXiv - CS - Databases Pub Date : 2024-07-29 DOI: arxiv-2407.20058

Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade

Citations: 0

Evaluating LLMs for Text-to-SQL Generation With Complex SQL Workload

arXiv - CS - Databases Pub Date : 2024-07-28 DOI: arxiv-2407.19517

Limin Ma, Ken Pu, Ying Zhu

This study presents a comparative analysis of a complex SQL benchmark, TPC-DS, with two existing text-to-SQL benchmarks, BIRD and Spider. Our findings reveal that TPC-DS queries exhibit a significantly higher level of structural complexity than those of the other two benchmarks. This underscores the need for more intricate benchmarks to simulate realistic scenarios effectively. To facilitate this comparison, we devised several measures of structural complexity and applied them across all three benchmarks. The results of this study can guide future research in the development of more sophisticated text-to-SQL benchmarks. We used 11 distinct large language models (LLMs) to generate SQL queries based on the query descriptions provided by the TPC-DS benchmark. The prompt engineering process incorporated both the query description as outlined in the TPC-DS specification and the database schema of TPC-DS. Our findings indicate that current state-of-the-art generative AI models fall short in generating accurate decision-making queries. We compared the generated queries with the TPC-DS gold-standard queries using a series of fuzzy structure-matching techniques based on query features. The results demonstrate that the accuracy of the generated queries is insufficient for practical real-world applications.

Citations: 0
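
A minimal sketch of what a structural-complexity measure and a feature-based fuzzy match might look like for the text-to-SQL study above. The feature list, the regular expressions, and the names `structural_features`, `complexity_score`, and `feature_similarity` are hypothetical illustrations, not the measures or matching techniques defined by the authors; counting regex hits is also only a rough proxy for actually parsing the SQL.

```python
import re

# Hypothetical structural features (illustrative assumptions, not the
# measures defined in the paper). Each pattern is matched case-insensitively.
FEATURES = {
    "joins": r"\bJOIN\b",
    # Counts every parenthesized SELECT, including CTE bodies.
    "subqueries": r"\(\s*SELECT\b",
    "aggregates": r"\b(?:SUM|AVG|COUNT|MIN|MAX)\s*\(",
    "group_by": r"\bGROUP\s+BY\b",
    "set_ops": r"\b(?:UNION|INTERSECT|EXCEPT)\b",
    "ctes": r"\bWITH\b",
}


def structural_features(sql: str) -> dict:
    """Count how often each structural feature occurs in a SQL string."""
    return {name: len(re.findall(pattern, sql, flags=re.IGNORECASE))
            for name, pattern in FEATURES.items()}


def complexity_score(sql: str) -> int:
    """Naive complexity score: total number of structural features found."""
    return sum(structural_features(sql).values())


def feature_similarity(generated: str, gold: str) -> float:
    """Fuzzy structural match: weighted Jaccard similarity of feature counts
    (1.0 = identical feature profile, 0.0 = no shared features)."""
    a, b = structural_features(generated), structural_features(gold)
    overlap = sum(min(a[k], b[k]) for k in FEATURES)
    union = sum(max(a[k], b[k]) for k in FEATURES)
    return 1.0 if union == 0 else overlap / union
```

For example, a single-table lookup such as `SELECT name FROM customer WHERE id = 1` scores 0, while a query with one CTE, one JOIN, one GROUP BY, two aggregates, and two parenthesized SELECTs scores 7. The weighted Jaccard score then rewards a generated query for reproducing the gold query's feature profile even when the SQL text differs.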

Turning Multidimensional Big Data Analytics into Practice: Design and Implementation of ClustCube Big-Data Tools in Real-Life Scenarios

arXiv - CS - Databases Pub Date : 2024-07-26 DOI: arxiv-2407.18604

Alfredo Cuzzocrea, Abderraouf Hafsaoui, Ismail Benlaredj

Citations: 0

Towards A More Reasonable Semantic Web

arXiv - CS - Databases Pub Date : 2024-07-26 DOI: arxiv-2407.19095

Vleer Doing, Ryan Wisnesky

Citations: 0

Partial Adaptive Indexing for Approximate Query Answering

arXiv - CS - Databases Pub Date : 2024-07-26 DOI: arxiv-2407.18702

Stavros Maroulis, Nikos Bikakis, Vassilis Stamatopoulos, George Papastefanatos

Citations: 0

A survey of open-source data quality tools: shedding light on the materialization of data quality dimensions in practice

arXiv - CS - Databases Pub Date : 2024-07-26 DOI: arxiv-2407.18649

Vasileios Papastergios, Anastasios Gounaris

Citations: 0

Enhanced Privacy Bound for Shuffle Model with Personalized Privacy

arXiv - CS - Databases Pub Date : 2024-07-25 DOI: arxiv-2407.18157

Yixuan Liu, Yuhan Liu, Li Xiong, Yujie Gu, Hong Chen

The shuffle model of Differential Privacy (DP) is an enhanced privacy protocol that introduces an intermediate trusted server between local users and a central data curator. It significantly amplifies the central DP guarantee by anonymizing and shuffling the local randomized data. Yet deriving a tight privacy bound is challenging due to its complicated randomization protocol. While most existing works focus on unified local privacy settings, this work derives the central privacy bound for a more practical setting in which each user requires personalized local privacy. To bound the privacy after shuffling, we first need to capture the probability of each user generating clones of the neighboring data points. Second, we need to quantify the indistinguishability between two distributions of the number of clones on neighboring datasets. Existing works either inaccurately capture the probability or underestimate the indistinguishability between neighboring datasets. Motivated by this, we develop a more precise analysis, which yields a general and tighter bound for arbitrary DP mechanisms. First, we derive the clone-generating probability by hypothesis testing, which leads to a more accurate characterization of the probability. Second, we analyze the indistinguishability in the context of f-DP, where the convexity of the distributions is leveraged to achieve a tighter privacy bound. Theoretical and numerical results demonstrate that our bound markedly outperforms existing results in the literature.

Citations: 0
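
To make the personalized shuffle setting concrete, here is a minimal simulation sketch. It assumes each user holds a single bit and uses binary randomized response as the local randomizer with their own privacy level epsilon_i; the function names are hypothetical. The sketch only simulates the protocol (local randomization followed by shuffling) and does not compute the central privacy bound, which is the paper's actual contribution.

```python
import math
import random
from typing import List, Optional


def keep_probability(epsilon: float) -> float:
    """Probability that binary randomized response reports the true bit."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Local randomizer: report the true bit w.p. e^eps / (e^eps + 1), else
    flip it. This satisfies epsilon-local differential privacy."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def shuffle_model(bits: List[int], epsilons: List[float],
                  seed: Optional[int] = None) -> List[int]:
    """Simulate the shuffle model with personalized local privacy levels."""
    if len(bits) != len(epsilons):
        raise ValueError("each user needs exactly one personalized epsilon")
    rng = random.Random(seed)
    # Step 1: every user randomizes locally with their own privacy level.
    reports = [randomized_response(b, eps, rng) for b, eps in zip(bits, epsilons)]
    # Step 2: the shuffler applies a uniformly random permutation, so the
    # curator sees only the multiset of reports, not who sent which one.
    rng.shuffle(reports)
    return reports
```

For instance, `shuffle_model([1, 0, 1, 1], [0.5, 1.0, 2.0, 4.0], seed=7)` returns four shuffled reports in which the user with epsilon 4.0 is far more likely to have reported truthfully than the user with epsilon 0.5. With epsilon = 0 each report is a fair coin; with a very large epsilon the randomizer effectively never flips, and the output is just a permutation of the inputs. The anonymity provided by the shuffler is what the paper's analysis turns into an amplified central guarantee.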

My Ontologist: Evaluating BFO-Based AI for Definition Support

arXiv - CS - Databases Pub Date : 2024-07-24 DOI: arxiv-2407.17657

Carter Benson, Alec Sculley, Austin Liebers, John Beverley

Generative artificial intelligence (AI), exemplified by the release of GPT-3.5 in 2022, has significantly advanced the potential applications of large language models (LLMs), including in the realms of ontology development and knowledge graph creation. Ontologies, which are structured frameworks for organizing information, and knowledge graphs, which combine ontologies with actual data, are essential for enabling interoperability and automated reasoning. However, current research has largely overlooked the generation of ontologies extending from established upper-level frameworks like the Basic Formal Ontology (BFO), risking the creation of non-integrable ontology silos. This study explores the extent to which LLMs, particularly GPT-4, can support ontologists trained in BFO. Through iterative development of a specialized GPT model named "My Ontologist," we aimed to generate BFO-conformant ontologies. Initial versions faced challenges in maintaining definition conventions and leveraging foundational texts effectively. My Ontologist 3.0 showed promise by adhering to structured rules and modular ontology suites, yet the release of GPT-4o disrupted this progress by altering the model's behavior. Our findings underscore the importance of aligning LLM-generated ontologies with top-level standards and highlight the complexities of integrating evolving AI capabilities in ontology engineering.

Citations: 0

Dynamic Subgraph Matching via Cost-Model-based Vertex Dominance Embeddings (Technical Report)

arXiv - CS - Databases Pub Date : 2024-07-23 DOI: arxiv-2407.16660

Yutong Ye, Xiang Lian, Nan Zhang, Mingsong Chen

In many real-world applications, such as social network analysis, knowledge graph discovery, and biological network analytics, graph data management has become increasingly important and has drawn much attention from the database community. Since many graphs (e.g., Twitter, Wikipedia) evolve over time, it is important to study the dynamic subgraph matching (DSM) problem, a fundamental yet challenging graph operator that continuously monitors subgraph matching results over dynamic graphs with a stream of edge updates. To tackle the DSM problem efficiently, we carefully design a novel vertex dominance embedding approach, which effectively encodes vertex labels that can be incrementally maintained upon graph updates. Motivated by the low pruning power for high-degree vertices, we propose a new degree grouping technique over basic subgraph patterns in different degree groups (i.e., groups of star substructures), and devise degree-aware star substructure synopses (DAS^3) to effectively facilitate our vertex dominance and range pruning strategies. We develop efficient algorithms to incrementally maintain dynamic graphs and answer DSM queries. Through extensive experiments, we confirm the efficiency of our proposed approaches over both real and synthetic graphs.

Citations: 0
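
The core idea behind dominance-based pruning can be illustrated with a much simpler stand-in than the paper's cost-model-based embeddings and DAS^3 synopses. This minimal sketch assumes each vertex's "embedding" is just its neighbor-label count vector, a common filter in subgraph matching: if query vertex q maps to data vertex v in a (non-induced) match, v must carry q's label and have at least as many neighbors of each label as q, so v's vector must dominate q's. The class and function names are hypothetical; the point is that the vectors are updated in O(1) per edge insertion or deletion, so candidates stay current as the graph evolves.

```python
from collections import Counter, defaultdict
from typing import Dict, Set


class LabeledGraph:
    """Undirected vertex-labeled graph that keeps, for every vertex, a count
    of its neighbors per label (the hypothetical 'embedding' here)."""

    def __init__(self, labels: Dict[int, str]):
        self.labels = dict(labels)
        self.adj: Dict[int, Set[int]] = defaultdict(set)
        # nbr_counts[v][lbl] = number of neighbors of v that carry label lbl
        self.nbr_counts: Dict[int, Counter] = defaultdict(Counter)

    def insert_edge(self, u: int, v: int) -> None:
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.nbr_counts[u][self.labels[v]] += 1
        self.nbr_counts[v][self.labels[u]] += 1

    def delete_edge(self, u: int, v: int) -> None:
        if v not in self.adj[u]:
            return  # edge not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.nbr_counts[u][self.labels[v]] -= 1
        self.nbr_counts[v][self.labels[u]] -= 1


def dominates(data_vec: Counter, query_vec: Counter) -> bool:
    """True if data_vec >= query_vec in every label dimension."""
    return all(data_vec[lbl] >= cnt for lbl, cnt in query_vec.items())


def candidates(query: LabeledGraph, data: LabeledGraph, q: int) -> Set[int]:
    """Data vertices that pass the label check and the dominance check for q.
    Vertices pruned here cannot be the image of q in any (non-induced) match."""
    return {
        v
        for v, lbl in data.labels.items()
        if lbl == query.labels[q]
        and dominates(data.nbr_counts[v], query.nbr_counts[q])
    }
```

For a query vertex labeled A with two B-labeled neighbors, a data vertex labeled A becomes a candidate only once it has at least two B-labeled neighbors, and it drops out again as soon as an edge deletion reduces that count. Unlike this sketch, the paper targets the weak pruning power of such filters on high-degree vertices, which is what its degree grouping and DAS^3 synopses address.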