Enhancing and optimizing a data protection solution

L. Cherkasova, R. Lau, Harald Burose, Bernhard Kappler
DOI: 10.1109/MASCOT.2009.5367043
Published in: 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems
Publication date: 2009-12-28
Citations: 5

Abstract

Analyzing and managing large amounts of unstructured information is a high-priority task for many companies. To implement content management solutions, companies need a comprehensive view of their unstructured data. To provide a new level of intelligence and control over data residing within the enterprise, one needs to build a chain of tools and automated processes that enable evaluation, analysis, and visibility into information assets and their dynamics throughout the information life-cycle. We propose a novel framework that leverages the existing backup infrastructure by integrating additional content analysis routines and extracting already available filesystem metadata over time. These data are used for analysis and trending, adding performance optimization and self-management capabilities to backup and information management tasks. Backup management faces serious challenges of its own: processing ever-increasing amounts of data while meeting the timing constraints of backup windows may require adaptive changes to backup scheduling routines. We revisit traditional backup job scheduling and demonstrate that random job scheduling may lead to inefficient backup processing and increased backup time. In this work, we use historical information about per-object backup processing times and propose an alternative job scheduling policy, together with automated parameter tuning, that can significantly reduce the overall backup time. Under this policy, called LBF, the longest backups (the objects with the longest backup times) are scheduled first. We evaluate the performance benefits of the proposed scheduling using realistic workloads collected from seven backup servers at HP Labs. The proposed job assignment policy achieves a significant reduction in backup time (up to 30%) and improved quality of service.
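The abstract describes LBF only at a high level: jobs with the longest historical backup times are dispatched first across the available backup streams. One plausible reading is classic longest-processing-time (LPT) list scheduling. The sketch below is a hypothetical illustration under that assumption (the function name `lbf_schedule` and the input format are invented for this example, not taken from the paper):

```python
import heapq

def lbf_schedule(durations, num_streams):
    """Assign backup objects to concurrent streams, longest first.

    durations: dict mapping object name -> historical backup time.
    Returns (assignment, makespan), where assignment maps each stream
    id to its list of objects and makespan is the busiest stream's load.
    """
    # Min-heap of (current_load, stream_id): the stream with the least
    # accumulated work is always at the top.
    streams = [(0, i) for i in range(num_streams)]
    heapq.heapify(streams)
    assignment = {i: [] for i in range(num_streams)}

    # LBF: consider objects in decreasing order of historical duration,
    # and greedily place each one on the least-loaded stream.
    for obj, dur in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, sid = heapq.heappop(streams)
        assignment[sid].append(obj)
        heapq.heappush(streams, (load + dur, sid))

    makespan = max(load for load, _ in streams)
    return assignment, makespan

# Example: five objects, two concurrent streams.
durations = {'a': 8, 'b': 7, 'c': 6, 'd': 5, 'e': 4}
assignment, makespan = lbf_schedule(durations, 2)
```

Scheduling long objects first leaves the short ones to fill in load imbalances at the end of the backup window, which is the intuition behind the reported reduction in overall backup time compared to random ordering.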