Enabling top-n file retrieval in cloud storage using Hadoop Distributed File System

2016 Second International Conference on Science Technology Engineering and Management (ICONSTEM) Pub Date : 2016-03-01 DOI:10.1109/ICONSTEM.2016.7560944

J. Jeya, E. Kannan


Citations: 1

Abstract



Cloud storage can be regarded as a very large-scale storage system built from independent storage servers. It lets users store their data remotely over a network and lets other authenticated users access that data easily. The Hadoop Distributed File System (HDFS) is used to store large files reliably and to stream them to user applications at very high bandwidth. Hadoop splits files into large blocks and distributes them among the nodes of the cluster. When data is retrieved from the cloud, it is important to reduce both computation and communication overhead. To reduce communication overhead, the server should return only the top-n files matching the keyword when a user requests data files. Because the owner need not keep a local copy of the files, it is all the more necessary to check periodically that the files stored on the server remain available and unaltered. In HDFS, computation is performed in parallel, so execution time is drastically reduced. The proposed system for retrieving top-n files uses HDFS, so that search time and communication overhead are greatly reduced.
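The abstract does not say how files are ranked or how the parallel search is organized, so the following is only a minimal sketch under stated assumptions. It assumes relevance is the raw count of the keyword in a file, and it models each storage node as an in-memory dictionary rather than using real HDFS or MapReduce APIs. The names `score`, `local_top_n`, and `top_n_files` are hypothetical. The idea it illustrates is that each node computes a local top-n, so only a handful of candidates per node cross the network before the final merge.

```python
from __future__ import annotations

import heapq
from collections import Counter


def score(text: str, keyword: str) -> int:
    """Hypothetical relevance score: how often the keyword occurs in the text."""
    return Counter(text.lower().split())[keyword.lower()]


def local_top_n(partition: dict[str, str], keyword: str, n: int) -> list[tuple[int, str]]:
    """Per-node step: rank the files held by one node and keep at most n matches."""
    scored = ((score(text, keyword), name) for name, text in partition.items())
    # Files that never mention the keyword are dropped before ranking.
    return heapq.nlargest(n, (pair for pair in scored if pair[0] > 0))


def top_n_files(partitions: list[dict[str, str]], keyword: str, n: int) -> list[str]:
    """Merge step: combine the per-node candidates into a global top-n list."""
    candidates = [c for part in partitions for c in local_top_n(part, keyword, n)]
    return [name for _, name in heapq.nlargest(n, candidates)]


if __name__ == "__main__":
    # Two assumed "nodes", each mapping file name to file contents.
    nodes = [
        {"a.txt": "hadoop hadoop cloud", "b.txt": "cloud storage"},
        {"c.txt": "hadoop hadoop hadoop", "d.txt": "hadoop"},
    ]
    print(top_n_files(nodes, "hadoop", 2))  # ['c.txt', 'a.txt']
```

Because every node returns at most n candidates, the merge step handles at most n times the number of nodes entries regardless of how many files are stored, which is the kind of communication saving the abstract describes.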
