Cinderella — Adaptive online partitioning of irregularly structured data

K. Herrmann, H. Voigt, Wolfgang Lehner
{"title":"Cinderella — Adaptive online partitioning of irregularly structured data","authors":"K. Herrmann, H. Voigt, Wolfgang Lehner","doi":"10.1109/ICDEW.2014.6818342","DOIUrl":null,"url":null,"abstract":"In an increasing number of use cases, databases face the challenge of managing irregularly structured data. Irregularly structured data is characterized by a quickly evolving variety of entities without a common set of attributes. These entities do not show enough regularity to be captured in a traditional database schema. A common solution is to centralize the diverse entities in a universal table. Usually, this leads to a very sparse table. Although today's techniques allow efficient storage of sparse universal tables, query efficiency is still a problem. Queries that reference only a subset of attributes have to read the whole universal table including many irrelevant entities. One possible solution is to use a partitioning of the table, which allows pruning partitions of irrelevant entities before they are touched. Creating and maintaining such a partitioning manually is very laborious or even infeasible, due to the enormous complexity. Thus an autonomous solution is desirable. In this paper, we define the Online Partitioning Problem for irregularly structured data and present Cinderella. Cinderella is an autonomous online algorithm for horizontal partitioning of irregularly structured entities in universal tables. It is designed to keep its overhead low by incrementally assigning entities to partitions while they are touched anyway during modifications. The achieved partitioning allows queries that retrieve only entities with a subset of attributes easily pruning partitions of irrelevant entities. Cinderella increases the locality of queries and reduces query execution cost.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"287 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 30th International Conference on Data Engineering Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDEW.2014.6818342","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

In an increasing number of use cases, databases face the challenge of managing irregularly structured data. Irregularly structured data is characterized by a quickly evolving variety of entities without a common set of attributes. These entities do not show enough regularity to be captured in a traditional database schema. A common solution is to centralize the diverse entities in a universal table. Usually, this leads to a very sparse table. Although today's techniques allow efficient storage of sparse universal tables, query efficiency is still a problem. Queries that reference only a subset of attributes have to read the whole universal table including many irrelevant entities. One possible solution is to use a partitioning of the table, which allows pruning partitions of irrelevant entities before they are touched. Creating and maintaining such a partitioning manually is very laborious or even infeasible, due to the enormous complexity. Thus an autonomous solution is desirable. In this paper, we define the Online Partitioning Problem for irregularly structured data and present Cinderella. Cinderella is an autonomous online algorithm for horizontal partitioning of irregularly structured entities in universal tables. It is designed to keep its overhead low by incrementally assigning entities to partitions while they are touched anyway during modifications. The achieved partitioning allows queries that retrieve only entities with a subset of attributes easily pruning partitions of irrelevant entities. Cinderella increases the locality of queries and reduces query execution cost.
Cinderella -不规则结构化数据的自适应在线分区
在越来越多的用例中,数据库面临着管理不规则结构化数据的挑战。不规则结构化数据的特点是各种实体快速演变,而没有一组共同的属性。这些实体没有显示出足够的规律性,无法在传统的数据库模式中捕获。一种常见的解决方案是将不同的实体集中在一个通用表中。通常,这会导致一个非常稀疏的表。尽管目前的技术允许对稀疏通用表进行高效存储,但查询效率仍然是一个问题。仅引用属性子集的查询必须读取整个通用表,包括许多不相关的实体。一种可能的解决方案是使用表分区,它允许在触及不相关实体之前修剪它们的分区。手动创建和维护这样的分区是非常费力的,甚至是不可行的,因为它非常复杂。因此一个自治的解决方案是可取的。在本文中,我们定义了不规则结构化数据的在线分区问题,并给出了灰姑娘问题。Cinderella是一个自治的在线算法,用于在通用表中对不规则结构的实体进行水平分区。它的目的是通过在修改期间将实体增量地分配给分区来降低开销。实现的分区允许只检索具有属性子集的实体的查询轻松地修剪不相关实体的分区。Cinderella增加了查询的局部性,降低了查询的执行成本。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信