Developing a big data analytics platform using Apache Hadoop Ecosystem for delivering big data services in libraries

IF 1.2 Q3 INFORMATION SCIENCE & LIBRARY SCIENCE

Digital Library Perspectives Pub Date : 2024-02-22 DOI:10.1108/dlp-10-2022-0079

Ranjeet Kumar Singh

Citations: 0

Abstract

Purpose

Although the challenges associated with big data are increasing, the question of which big data analytics (BDA) platform is most suitable for libraries remains significant. The purpose of this study is to propose a solution to this problem.

Design/methodology/approach

The current study identifies relevant literature and provides a review of big data adoption in libraries. It also presents a step-by-step guide for the development of a BDA platform using the Apache Hadoop Ecosystem. To test the system, library big data was analyzed using Apache Pig, a tool from the Apache Hadoop Ecosystem. The analysis establishes the effectiveness of the Apache Hadoop Ecosystem as a powerful BDA solution in libraries.

Findings

It can be inferred from the literature that libraries and librarians have not taken the possibility of big data services in libraries very seriously. The literature also suggests that no significant effort has been made to establish any BDA architecture in libraries. This study establishes the Apache Hadoop Ecosystem as a possible solution for delivering BDA services in libraries.

Research limitations/implications

The present work suggests adopting the idea of providing various big data services in a library by developing a BDA platform. Examples include helping researchers understand big data, having skilled and experienced data managers clean and curate big data, and providing the infrastructural support to store, process, manage, analyze and visualize big data.

Practical implications

The study concludes that Apache Hadoop's Hadoop Distributed File System and MapReduce components significantly reduce the complexities of big data storage and processing, respectively, and that Apache Pig, using the Pig Latin scripting language, processes big data efficiently and responds to queries quickly.

Originality/value

According to the study, few efforts have been made to analyze big data from libraries. Furthermore, acceptance of the Apache Hadoop Ecosystem as a solution to big data problems in libraries is not widely discussed in the literature, although Apache Hadoop is regarded as one of the best frameworks for big data handling.

Source journal

Digital Library Perspectives INFORMATION SCIENCE & LIBRARY SCIENCE

CiteScore

3.90

Self-citation rate

11.80%

Journal description: Digital Library Perspectives (DLP) is a peer-reviewed journal concerned with digital content collections. It publishes research related to the curation and web-based delivery of digital objects collected for the advancement of scholarship, teaching and learning, and which advance the digital information environment as it relates to global knowledge, communication and world memory. The journal aims to keep readers informed about current trends, initiatives, and developments, including those in digital libraries and digital repositories, along with their standards and technologies. The editor invites contributions on the following, as well as other related topics: Digitization, Data as information, Archives and manuscripts, Digital preservation and digital archiving, Digital cultural memory initiatives, Usability studies, K-12 and higher education uses of digital collections.