Safely Managing Data Variety in Big Data Software Development

Thomas Cerqueus, E. Almeida, Stefanie Scherzinger
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering, 2015-05-16
DOI: 10.5555/2819289.2819293
Citations: 16

Abstract

We consider the task of building Big Data software systems offered as software-as-a-service. These applications are commonly backed by NoSQL data stores that address the proverbial Vs of Big Data processing: NoSQL data stores can handle large volumes of data, and many systems do not enforce a global schema, which accommodates structural variety in the data. Software engineers can thus design the data model on the go, a flexibility that is particularly crucial in agile software development. However, NoSQL data stores commonly do not yet account for veracity when the structure of persisted data changes, even though such changes are an inevitable consequence of agile software development. In most NoSQL-based application stacks, schema evolution is handled entirely within the application code, usually with the help of object mapper libraries. Yet simple code refactorings, such as renaming a class attribute at the source code level, can cause data loss or runtime errors once the application has been deployed to production. We address this pain point by contributing type checking rules that we have implemented within an IDE plug-in. Our plug-in, ControVol, statically type-checks the object mapper class declarations against the code release history. ControVol is thus capable of detecting common yet risky cases of mismatched data and schema, and can even suggest automatic fixes.
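To make the renaming hazard concrete, here is a minimal Python sketch. It is not ControVol's implementation. The classes `PlayerV1`/`PlayerV2`, the toy mapper functions `save`/`load`, and the `check_release` comparison are all hypothetical, and the sketch assumes a mapper that matches stored properties to class attributes purely by name, with the "release history" reduced to two consecutive versions of one class.

```python
from dataclasses import dataclass, fields


# Release 1 of a hypothetical object-mapped entity class.
@dataclass
class PlayerV1:
    name: str
    score: int


# Release 2: a refactoring renamed the attribute "name" to "nickname".
@dataclass
class PlayerV2:
    nickname: str
    score: int


def save(obj):
    """Toy mapper: persist an object as a schemaless record keyed by attribute name."""
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


def load(cls, record, aliases=None):
    """Toy mapper: fill each attribute from the stored property of the same name.

    Stored properties with no matching attribute are silently ignored, and
    attributes with no stored property become None. The hypothetical `aliases`
    dict maps a new attribute name to an old property name to fall back on.
    """
    aliases = aliases or {}
    values = {}
    for f in fields(cls):
        key = f.name if f.name in record else aliases.get(f.name, f.name)
        values[f.name] = record.get(key)
    return cls(**values)


def check_release(old_cls, new_cls):
    """Compare the attributes of two releases and report risky changes."""
    old = {f.name: f.type for f in fields(old_cls)}
    new = {f.name: f.type for f in fields(new_cls)}
    warnings = []
    for attr in old.keys() - new.keys():
        warnings.append(f"attribute '{attr}' removed or renamed: its stored values are no longer loaded")
    for attr in new.keys() - old.keys():
        warnings.append(f"attribute '{attr}' added: records from older releases lack it")
    for attr in old.keys() & new.keys():
        if old[attr] != new[attr]:
            warnings.append(f"attribute '{attr}' changed type from {old[attr]} to {new[attr]}")
    return sorted(warnings)


if __name__ == "__main__":
    record = save(PlayerV1(name="alice", score=42))  # written by release 1
    player = load(PlayerV2, record)                  # read by release 2
    print(player)        # PlayerV2(nickname=None, score=42)
    print(save(player))  # {'nickname': None, 'score': 42}: "alice" is lost on write-back
    for warning in check_release(PlayerV1, PlayerV2):
        print(warning)
```

The rename raises no error at load time; the old value simply disappears once the object is written back, which is why a static check over class versions is useful. Calling `load(PlayerV2, record, aliases={"nickname": "name"})` recovers `"alice"`, which is one way to picture the kind of fix a checker could suggest for a rename; it is an illustration only, not the fix ControVol actually proposes.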