Data Lake Conceptualized Web Platform for Food Research Data Collection

IF 0.7 · Zone 4 (Computer Science) · JCR Q4 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)
Gi-taek An;Seyoung Oh;Eunhye Kim;Jung-min Park
Journal of Web Engineering, vol. 23, no. 3, pp. 377-392. Published 2024-03-01. Journal Article.
DOI: 10.13052/jwe1540-9589.2333
https://ieeexplore.ieee.org/document/10547279/

Abstract

Food research is closely intertwined with everyday life and increasingly depends on big data. Research data in this domain take many forms and formats, including biological experiment results, chemical analysis data, nutritional information, microbiological data, sensor data, images, and videos. This diversity stems from the integration of data from the various subdomains within the larger field. With recent advances in deep learning, the importance of data has grown significantly, increasing reliance on data-driven research. Although specialized platforms for sharing and utilizing data have been established at the national level, particularly in the biosciences, food research lacks dedicated infrastructure and specialized data-sharing platforms. In this study, we develop a platform that uses a Hadoop-based distributed file system to create a data lake and enables data storage and sharing through a web-based interface. The distributed file system scales by adding data nodes, making it an effective solution for capacity expansion. In addition, the web-based platform ensures high accessibility, allowing users to access it from anywhere, at any time, on any device. Finally, we describe the construction of a 1.8 PB Hadoop-based physical storage system and present an approach for building a highly accessible and useful web platform.
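The abstract's point about expanding capacity by adding data nodes can be illustrated with some simple arithmetic. The sketch below is not taken from the paper: it assumes the 1.8 PB figure is raw disk capacity (the abstract does not say whether it is raw or usable), assumes the HDFS default replication factor of 3, and uses a hypothetical 100 TB of raw disk per data node. The function names are illustrative only.

```python
PB = 10**15  # decimal petabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes


def usable_capacity(raw_bytes: int, replication: int = 3) -> float:
    """Logical capacity when every block is stored `replication` times.

    replication=3 is the HDFS default; the platform's actual setting
    is not given in the abstract, so this is an assumption.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_bytes / replication


def nodes_to_add(target_raw: int, current_raw: int, raw_per_node: int) -> int:
    """Number of extra data nodes needed to reach `target_raw` bytes of raw disk."""
    shortfall = target_raw - current_raw
    if shortfall <= 0:
        return 0
    # Integer ceiling division: a partially needed node still counts as one.
    return -(-shortfall // raw_per_node)


if __name__ == "__main__":
    raw = 1800 * TB  # 1.8 PB, assumed here to be raw capacity
    print(usable_capacity(raw) / TB)                 # 600.0 (TB of logical data)
    print(nodes_to_add(2400 * TB, raw, 100 * TB))    # 6 hypothetical 100 TB nodes
```

Under these assumptions, a 1.8 PB raw cluster holds roughly 600 TB of logical data, and growing the raw pool to 2.4 PB would take six more nodes of the hypothetical size. Erasure coding or a different replication factor would change the ratio.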
Source journal
Journal of Web Engineering (Engineering and Technology: Computer Science, Theory and Methods)
CiteScore: 1.80
Self-citation rate: 12.50%
Articles published: 62
Review time: 9 months
Journal description: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.