Distributed data replication and access optimization for LHCb storage system: A position paper

2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K). Pub Date: 2015-11-12. DOI: 10.5220/0005647105370540

M. Hushchyn, P. Charpentier, A. Ustyuzhanin



Abstract



This paper presents how machine learning algorithms and statistical methods can be applied to data management in hybrid data storage systems. Such systems use two different storage types. Keeping rarely used data on cheap, slow storage (type one) and frequently used data on fast, expensive storage (type two) helps achieve an optimal performance/cost ratio for the system. We use classification algorithms to estimate the probability that the data will be used frequently in the future. Then, using risk analysis, we decide where the data should be stored. We show how to estimate the optimal number of replicas of the data using regression algorithms and a Hidden Markov Model. Based on the probability, the risks, and the optimal number of data replicas, our system finds the optimal data distribution in the hybrid data storage system. We present the results of a simulation of our method for the LHCb hybrid data storage.
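The placement step described in the abstract (a predicted probability of frequent use, followed by risk analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss values or the decision rule, so the `Losses` fields, the tier names, and the expected-loss comparison below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Losses:
    """Hypothetical misplacement costs in arbitrary units (not from the paper)."""
    hot_on_slow: float   # cost of leaving frequently used data on slow storage
    cold_on_fast: float  # cost of rarely used data occupying fast storage


def choose_tier(p_hot: float, losses: Losses) -> str:
    """Pick the tier with the lower expected loss for one dataset.

    p_hot is the classifier's estimated probability that the dataset
    will be used frequently in the future.
    """
    if not 0.0 <= p_hot <= 1.0:
        raise ValueError("p_hot must be a probability in [0, 1]")
    risk_slow = p_hot * losses.hot_on_slow           # expected loss on the slow tier
    risk_fast = (1.0 - p_hot) * losses.cold_on_fast  # expected loss on the fast tier
    return "fast" if risk_fast < risk_slow else "slow"


def place(probabilities: Dict[str, float], losses: Losses) -> Dict[str, str]:
    """Apply the expected-loss rule to every dataset independently."""
    return {name: choose_tier(p, losses) for name, p in probabilities.items()}


if __name__ == "__main__":
    # Assumed example: a miss on slow storage costs four times as much
    # as wasting fast storage on cold data.
    example_losses = Losses(hot_on_slow=4.0, cold_on_fast=1.0)
    print(place({"dataset_a": 0.9, "dataset_b": 0.05}, example_losses))
```

With these assumed losses, a dataset moves to the fast tier once its predicted probability of frequent use exceeds 0.2. A fuller model would also account for the number of replicas and for capacity limits on the fast tier, which this sketch leaves out.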
