SRank: Guiding schema selection in NoSQL document stores

IF 2.7 | CAS Zone 3 (Computer Science) | JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
DOI: 10.1016/j.datak.2024.102360
Journal: Data & Knowledge Engineering
Published: 2024-09-14 (journal article)
URL: https://www.sciencedirect.com/science/article/pii/S0169023X24000843
Citations: 0

Abstract

The rise of big data has increased the need for applications to change their schemas frequently. NoSQL databases offer flexibility in organizing data and provide multiple ways to structure and store the same information. Although schema flexibility speeds up initial development, choosing a schema wisely is crucial. The choice significantly affects performance through data redundancy, navigation cost, data access cost, and maintainability. This paper emphasizes the importance of schema design in NoSQL document stores. It proposes a model that analyzes and evaluates schema alternatives and recommends the best one. The model has four phases and takes as input an Entity-Relationship (ER) model and a set of workload queries. In the Transformation phase, schema alternatives are first derived from the ER model, and a schema graph is then generated for each alternative. In parallel, the workload queries are converted into query graphs. In the Schema Evaluation phase, the Schema Rank (SRank) of each alternative is calculated from two inputs: query metrics derived from the query graphs, and path coverage computed on the schema graphs. Finally, in the Output phase, the schema with the highest SRank is recommended as the most suitable choice for the application. A case study of a Hotel Reservation System (HRS) demonstrates the model. It evaluates the schema alternatives on query response time, storage efficiency, scalability, throughput, and latency. An extensive experimental study validates the use of SRank for schema selection in NoSQL databases. The alignment of SRank values with each schema's measured performance indicates that the ranking is effective. SRank simplifies schema selection and helps users make informed decisions by reducing the time, cost, and effort needed to identify the optimal schema for a NoSQL document store.
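The abstract does not state the SRank formula. The Python sketch below therefore only illustrates the shape of the pipeline (schema graphs, query graphs, path coverage, and picking the top-ranked alternative) under stated assumptions. The names `SchemaGraph`, `QueryGraph`, `path_coverage`, `srank`, and `recommend` are all hypothetical. So are the entities `Hotel`, `Room`, `Reservation`, and `Guest`, and the query frequencies. The scoring rule is a stand-in, not the paper's metric: a frequency-weighted mean of the share of each query's hops that can be answered inside a single document.

```python
from dataclasses import dataclass


# Hypothetical schema graph: a schema alternative is reduced to the set of ER
# relationships it embeds (parent and child stored in the same document).
# Every other relationship is assumed to be a reference across collections.
@dataclass(frozen=True)
class SchemaGraph:
    name: str
    embedded: frozenset  # {(parent_entity, child_entity), ...}


# Hypothetical query graph: the chain of entities a query navigates and the
# query's relative frequency in the workload.
@dataclass(frozen=True)
class QueryGraph:
    name: str
    path: tuple  # e.g. ("Hotel", "Room", "Reservation")
    frequency: float


def path_coverage(schema: SchemaGraph, query: QueryGraph) -> float:
    """Share of the query's hops that can be answered inside a single document."""
    hops = list(zip(query.path, query.path[1:]))
    if not hops:
        return 1.0  # a single-entity query needs no navigation
    inside = sum(1 for hop in hops if hop in schema.embedded)
    return inside / len(hops)


def srank(schema: SchemaGraph, workload: list) -> float:
    """Stand-in score (not the paper's formula): frequency-weighted mean coverage."""
    total = sum(q.frequency for q in workload)
    if total == 0:
        return 0.0
    return sum(q.frequency * path_coverage(schema, q) for q in workload) / total


def recommend(schemas: list, workload: list) -> SchemaGraph:
    """Output phase: return the alternative with the highest score."""
    return max(schemas, key=lambda s: srank(s, workload))


# Illustrative HRS-style workload and alternatives (entities and weights invented).
workload = [
    QueryGraph("rooms_of_hotel", ("Hotel", "Room"), frequency=5),
    QueryGraph("reservations_of_room", ("Hotel", "Room", "Reservation"), frequency=3),
    QueryGraph("guest_profile", ("Guest",), frequency=2),
]
normalized = SchemaGraph("normalized", frozenset())
embed_rooms = SchemaGraph("embed_rooms", frozenset({("Hotel", "Room")}))
embed_all = SchemaGraph(
    "embed_all", frozenset({("Hotel", "Room"), ("Room", "Reservation")})
)

for s in (normalized, embed_rooms, embed_all):
    print(s.name, srank(s, workload))
print(recommend([normalized, embed_rooms, embed_all], workload).name)  # prints "embed_all"
```

This stand-in score only rewards embedding. It ignores the data redundancy and maintainability costs the abstract lists, so it will always favor full embedding. A more faithful score would subtract penalties for duplicated data and update cost before ranking.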
Source journal: Data & Knowledge Engineering
Category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 66
Review time: 6 months
Journal description: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between the two related fields of data engineering and knowledge engineering. DKE reaches a worldwide audience of researchers, designers, managers, and users. The journal's main aim is to identify, investigate, and analyze the underlying principles in the design and effective use of data and knowledge systems.