Mixing Biology and Computer Science Concepts to Design Resilient Data Lakes

Marzieh Derakhshannia, Anne Laurent, Arnaud Martin
Journal of Interdisciplinary Methodologies and Issues in Sciences, published 2023-08-02
DOI: 10.46298/jimis.11449 (https://doi.org/10.46298/jimis.11449)

Abstract

Data lakes appeared a few years ago, introduced in particular to meet the challenges of storing and exploiting IoT data. They were first regarded as a new technical and commercial tool, sold by the main database software vendors. More recently, they have become a subject of research, in particular to define what a data lake should be, what services it should provide, and how it should be built. In this work, we return to the origins of data lakes, starting from the name “lake”. We present how we worked, as biologists and computer scientists together, to understand the links between natural lakes and data lakes. We first explore the links between the disciplines of biology and computer science, then apply these links to the particular theme of lakes. This could appear to be a transfer of knowledge from biology to computer science and a “simple” application of the concepts. However, we had to interact and understand each other’s concepts and issues in order to align a possible comparison between the disciplines, for example to determine at what scale to establish the biological comparison, from DNA up to the macro-level system of the animal and plant ecosystem present in a natural lake. For this reason, we draw on a hybrid method based on ecological and logistical network topology to propose a resilient structure for the data lake. Specifically, we use Ecological Network Analysis (ENA) as a bio-inspired method and graph theory as a logistics-inspired framework to study interdisciplinary resilience strategies for the data lake network.
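
To make the ENA idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which ENA indices are used, so the sketch assumes the information-theoretic indices commonly attributed to Ulanowicz: average mutual information (AMI), flow diversity (H), the degree of order a = AMI / H, and robustness R = -a ln(a). The zone names (ingest, raw, curated, serving) and the flow values are hypothetical and serve only as illustration.

```python
import math


def ena_metrics(flows):
    """Compute basic ENA indices for a weighted, directed flow network.

    flows: dict mapping (source, target) -> positive flow value.
    The zone names used by callers are illustrative assumptions.
    """
    tst = sum(flows.values())  # total system throughput

    # Total outflow per source node and total inflow per target node.
    out_total, in_total = {}, {}
    for (src, dst), f in flows.items():
        out_total[src] = out_total.get(src, 0.0) + f
        in_total[dst] = in_total.get(dst, 0.0) + f

    ami = 0.0  # average mutual information: how constrained the flows are
    h = 0.0    # flow diversity: Shannon entropy of the flow distribution
    for (src, dst), f in flows.items():
        p = f / tst
        ami += p * math.log2(f * tst / (out_total[src] * in_total[dst]))
        h -= p * math.log2(p)

    # Degree of order: 1 means fully constrained, with no redundant pathways.
    a = ami / h if h > 0 else 1.0
    # Robustness vanishes at both extremes (a -> 0 and a -> 1).
    robustness = -a * math.log(a) if 0.0 < a < 1.0 else 0.0
    return {"tst": tst, "ami": ami, "h": h, "a": a, "robustness": robustness}


# Hypothetical data lake with a single pipeline and no alternative route.
chain = {
    ("ingest", "raw"): 10.0,
    ("raw", "curated"): 10.0,
    ("curated", "serving"): 10.0,
}

# Same throughput, but the raw zone is split into two parallel stores.
redundant = {
    ("ingest", "raw_a"): 5.0,
    ("ingest", "raw_b"): 5.0,
    ("raw_a", "curated"): 5.0,
    ("raw_b", "curated"): 5.0,
    ("curated", "serving"): 10.0,
}

chain_metrics = ena_metrics(chain)
redundant_metrics = ena_metrics(redundant)
print("chain:", chain_metrics)
print("redundant:", redundant_metrics)
```

Under these assumptions, the single chain is fully ordered (a is approximately 1), so its robustness is approximately zero, whereas the network with parallel raw stores has a degree of order strictly between 0 and 1 and therefore a positive robustness. This illustrates, in a simplified form, why pathway redundancy matters when reasoning about the resilience of a data lake network.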