Massive Data HBase Storage Method for Electronic Archive Management

Impact factor 1.5; Region 4 (Computer Science); JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Huaquan Su, Junwei Li, Li Guo, Wanshuo Wang, Yongjiao Yang, You Wen, Kai Li, Pingyan Mo
Journal: International Journal of Network Management, vol. 35, issue 1
Publication date: 2024-10-27
DOI: 10.1002/nem.2308
URL: https://onlinelibrary.wiley.com/doi/10.1002/nem.2308
Citation count: 0

Abstract

The accelerating digitalization of enterprise and university education management has generated a massive amount of electronic archive data. To improve the intelligence, storage quality, and efficiency of electronic records management, and to achieve efficient storage and fast retrieval, this study proposes a massive data storage model based on HBase together with a retrieval optimization scheme. HDFS is introduced to construct a two-level storage structure and values are optimized to improve the scalability and load balancing of HBase, and the retrieval efficiency of the HBase storage model is improved through SL-TCR and BF filters. The results indicated that HDFS could automatically recover data after node, network partition, and NameNode failures. The write time of HBase was 56 s, which was 132 s and 246 s less than that of Cassandra and CockroachDB, respectively. The query latency was reduced by 23% and 32%, and the query time was reduced by 9988.51 ms, demonstrating high reliability and efficiency. After 1000 searches, the delay of BF-SL-TCL was 1379.28 s, which was 224.78 s and 212.74 s less than that of SL-TCL and Blockchain Retrieval Acceleration, respectively, showing reduced delay at high search counts. In summary, the storage model has clear advantages in storing massive amounts of electronic archive data: it addresses the lack of scalability that electronic archive management faces with massive data and offers high security and retrieval efficiency, providing an important reference for the design of storage models for future electronic archive management.
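
As a rough illustration of how a filter can cut retrieval delay, the sketch below assumes that "BF" in the abstract denotes a Bloom filter; the abstract does not spell this out, and the paper's SL-TCR/SL-TCL index is not reproduced here. The class names `BloomFilter` and `FilteredStore`, the bit-array size, the hash count, and the use of a plain dictionary in place of an HBase table are all hypothetical choices made for the example. The idea shown is only the general one: a compact membership test answers "definitely absent" without touching the underlying store.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class FilteredStore:
    """Hypothetical archive store; a dict stands in for the HBase table."""

    def __init__(self) -> None:
        self.table = {}
        self.bloom = BloomFilter()
        self.lookups_skipped = 0  # reads avoided thanks to the filter

    def put(self, row_key: str, record: dict) -> None:
        self.table[row_key] = record
        self.bloom.add(row_key)

    def get(self, row_key: str):
        # If the filter says "absent", the key was never written: skip the read.
        if not self.bloom.might_contain(row_key):
            self.lookups_skipped += 1
            return None
        # Otherwise the key may exist; confirm against the actual table.
        return self.table.get(row_key)


store = FilteredStore()
store.put("archive-001", {"title": "Enrollment record"})
store.put("archive-002", {"title": "Purchase contract"})
hit = store.get("archive-001")
miss = store.get("archive-999")
```

Because a Bloom filter can report false positives, `get` still falls back to the table whenever the filter answers "possibly present"; the filter only removes work for keys it can rule out.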

Source journal
International Journal of Network Management (COMPUTER SCIENCE, INFORMATION SYSTEMS-TELECOMMUNICATIONS)
CiteScore: 5.10
Self-citation rate: 6.70%
Articles published: 25
Review time: >12 weeks
About the journal: Modern computer networks and communication systems are increasing in size, scope, and heterogeneity. The promise of a single end-to-end technology has not been realized and likely never will be. The decreasing cost of bandwidth is extending the possible applications of computer networks and communication systems to entirely new domains. Problems in integrating heterogeneous wired and wireless technologies, ensuring security and quality of service, and reliably operating large-scale systems, including cloud computing, have all emerged as important topics. The one constant is the need for network management. Challenges in network management have never been greater than they are today. The International Journal of Network Management is the forum for researchers, developers, and practitioners in network management to present their work to an international audience. The journal is dedicated to the dissemination of information that will enable improved management, operation, and maintenance of computer networks and communication systems. The journal is peer reviewed and publishes original papers (both theoretical and experimental) by leading researchers, practitioners, and consultants from universities, research laboratories, and companies around the world. Issues with thematic or guest-edited special topics typically occur several times per year. Topic areas for the journal are largely defined by the taxonomy for network and service management developed by IFIP WG6.6, together with IEEE-CNOM, the IRTF-NMRG and the Emanics Network of Excellence.