Data Lake Conceptualized Web Platform for Food Research Data Collection

Impact factor: 1.0; Region 4 (Computer Science); JCR Q4 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Journal of Web Engineering. Pub Date: 2024-03-01. DOI: 10.13052/jwe1540-9589.2333

Gi-taek An;Seyoung Oh;Eunhye Kim;Jung-min Park

Volume 23, Issue 3, pages 377-392. Available at: https://ieeexplore.ieee.org/document/10547279/

Citations: 0

Abstract

Food research is closely intertwined with everyday life and increasingly depends on big data. Research data in this domain take many forms and formats, including biological experiment results, chemical analysis data, nutritional information, microbiological data, sensor data, images, and videos. This diversity stems from the integration of data from various subdomains within the larger field. With recent advancements in deep learning technology, the importance of data has grown significantly, resulting in increased reliance on data-driven research. Although specialized platforms for sharing and utilizing data have been established at the national level, particularly in the bioscience field, food research lacks a dedicated infrastructure and specialized data-sharing platforms. In this study, we develop a platform that uses a Hadoop-based distributed file system to create a data lake. The platform enables data storage and sharing through a web-based interface. The distributed file system scales by adding data nodes, making it an effective solution for capacity expansion. In addition, the web-based platform is highly accessible, allowing users to access it from anywhere, at any time, using any device. Finally, we describe the establishment of a 1.8 PB Hadoop-based physical storage system and present an approach for building a highly accessible and useful web platform.
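The claim that capacity grows by adding data nodes can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the node count, the per-node disk size, and the replication factor of 3 (the HDFS default for `dfs.replication`) are all assumptions, chosen only so that the raw total matches the 1.8 PB figure in the abstract. It shows how raw and usable capacity might change as nodes are added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical HDFS cluster; every value here is illustrative."""
    data_nodes: int          # number of data nodes (assumed)
    raw_tb_per_node: float   # raw disk per node in TB (assumed)
    replication: int = 3     # HDFS default dfs.replication (assumed for this cluster)

    @property
    def raw_tb(self) -> float:
        # Total physical disk across all data nodes.
        return self.data_nodes * self.raw_tb_per_node

    @property
    def usable_tb(self) -> float:
        # Each block is stored `replication` times, so usable space shrinks accordingly.
        return self.raw_tb / self.replication

    def add_nodes(self, count: int) -> "Cluster":
        # Scaling out: same node size and replication, more nodes.
        return Cluster(self.data_nodes + count, self.raw_tb_per_node, self.replication)


# 18 nodes x 100 TB = 1800 TB (1.8 PB, decimal units) of raw storage.
base = Cluster(data_nodes=18, raw_tb_per_node=100.0)
grown = base.add_nodes(2)

print(f"base:  raw={base.raw_tb:.0f} TB, usable={base.usable_tb:.0f} TB")
print(f"grown: raw={grown.raw_tb:.0f} TB, usable={grown.usable_tb:.1f} TB")
```

Under these assumptions, the usable capacity is roughly one third of the physical capacity, and each added node contributes its raw size divided by the replication factor. The real figures depend on the cluster's actual configuration, which the abstract does not state.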


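Source journal: Journal of Web Engineering (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 1.80

Self-citation rate: 12.50%

Number of publications:

Review time: 9 months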

Journal description: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.