Safely Managing Data Variety in Big Data Software Development

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering
Pub Date: 2015-05-16
DOI: 10.5555/2819289.2819293

Thomas Cerqueus, E. Almeida, Stefanie Scherzinger


Citations: 16

Abstract

We consider the task of building Big Data software systems offered as software-as-a-service. These applications are commonly backed by NoSQL data stores that address the proverbial Vs of Big Data processing: NoSQL data stores can handle large volumes of data, and many systems do not enforce a global schema, so as to accommodate structural variety in the data. Software engineers can thus design the data model on the go, a flexibility that is particularly crucial in agile software development. However, NoSQL data stores commonly do not yet account for the veracity of changes to the structure of persisted data, even though such changes are an inevitable consequence of agile software development. In most NoSQL-based application stacks, schema evolution is handled entirely within the application code, usually with the help of object mapper libraries. Yet simple code refactorings, such as renaming a class attribute at the source-code level, can cause data loss or runtime errors once the application has been deployed to production. We address this pain point by contributing type checking rules that we have implemented within an IDE plug-in. Our plug-in, ControVol, statically type checks the object mapper class declarations against the code release history. ControVol is thus capable of detecting common yet risky cases of mismatched data and schema, and can even suggest automatic fixes.
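To make the failure mode concrete, the sketch below shows how a source-level rename can silently lose persisted data, and how comparing the current class declaration against earlier releases can flag it. It is a minimal illustration under stated assumptions, not ControVol's implementation: each release's entity declaration is hand-written as a dict of attribute name to type name, `load` mimics a lenient object mapper that drops undeclared fields, and `check`, `Issue`, and the `aliases` mapping (a stand-in for a rename annotation) are all hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Issue:
    kind: str        # "removed" or "type_change"
    attribute: str   # attribute name as declared in an earlier release
    detail: str


def load(stored: dict, declared: dict[str, str],
         aliases: Optional[dict[str, str]] = None) -> dict:
    """Mimic a lenient object mapper (assumed behavior, not a real library).

    Only declared attributes are kept: stored fields the class no longer
    declares are silently dropped, and missing fields become None, unless an
    old name is aliased to the declared one (hypothetical rename support).
    """
    aliases = aliases or {}
    result = {}
    for name in declared:
        value = stored.get(name)
        if value is None:
            # Fall back to any old attribute name aliased to this one.
            for old, new in aliases.items():
                if new == name and old in stored:
                    value = stored[old]
                    break
        result[name] = value
    return result


def check(history: list[dict[str, str]], current: dict[str, str],
          aliases: Optional[dict[str, str]] = None) -> list[Issue]:
    """Compare the current declaration against every earlier release."""
    aliases = aliases or {}
    issues: list[Issue] = []
    seen: set[tuple[str, str]] = set()
    for release in history:
        for name, old_type in release.items():
            if (name, old_type) in seen:
                continue  # report each (attribute, type) pair only once
            seen.add((name, old_type))
            target = aliases.get(name, name)
            if target not in current:
                # Suggest a fix when exactly one brand-new attribute has the same type.
                candidates = [n for n, t in current.items()
                              if t == old_type and all(n not in r for r in history)]
                hint = (f"; consider aliasing it to {candidates[0]!r}"
                        if len(candidates) == 1 else "")
                issues.append(Issue("removed", name,
                                    f"stored values of {name!r} would be dropped{hint}"))
            elif current[target] != old_type:
                issues.append(Issue("type_change", name,
                                    f"{old_type} -> {current[target]}"))
    return issues


if __name__ == "__main__":
    v1 = {"name": "str", "age": "int"}
    v2 = {"fullName": "str", "age": "int"}  # "name" renamed in the source code
    record = {"name": "Ada", "age": 36}     # persisted while v1 was deployed

    print(load(record, v2))                 # {'fullName': None, 'age': 36}
    print([i.attribute for i in check([v1], v2)])             # ['name']
    print(check([v1], v2, aliases={"name": "fullName"}))      # []
    print(load(record, v2, aliases={"name": "fullName"}))     # {'fullName': 'Ada', 'age': 36}
```

The check walks the whole release history rather than only the previous release because, in a schema-less store, records written by any earlier release may still be persisted and loaded by the current code.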

