Enhancing and optimizing a data protection solution

2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems Pub Date : 2009-12-28 DOI:10.1109/MASCOT.2009.5367043

L. Cherkasova, R. Lau, Harald Burose, Bernhard Kappler


Citations: 5

Abstract

Analyzing and managing large amounts of unstructured information is a high-priority task for many companies. To implement content management solutions, companies need a comprehensive view of their unstructured data. Providing a new level of intelligence and control over data resident within the enterprise requires a chain of tools and automated processes that enable evaluation and analysis of, and visibility into, information assets and their dynamics during the information life-cycle. We propose a novel framework that utilizes the existing backup infrastructure by integrating additional content analysis routines and extracting already available filesystem metadata over time. This information is used to perform data analysis and trending, adding performance optimization and self-management capabilities to backup and information management tasks. Backup management faces serious challenges of its own: processing an ever-increasing amount of data while meeting the timing constraints of backup windows may require adaptive changes in backup scheduling routines. We revisit traditional backup job scheduling and demonstrate that random job scheduling may lead to inefficient backup processing and increased backup time. In this work, we use historical information about object backup processing times and propose an additional job scheduling policy and automated parameter tuning, which may significantly reduce the overall backup time. Under this scheduling policy, called LBF, the longest backups (the objects with the longest backup times) are scheduled first. We evaluate the performance benefits of the proposed scheduling using a realistic workload collected from seven backup servers at HP Labs. A significant reduction in backup time (up to 30%) and improved quality of service can be achieved under the proposed job assignment policy.
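
The idea of scheduling the longest backups first can be illustrated with a small simulation. The following is a minimal sketch and not the authors' implementation: it assumes a fixed number of concurrent drives, assumes each job is assigned to whichever drive becomes free first, and uses hypothetical names (`makespan`, `lbf_order`, `history`) and made-up durations.

```python
import heapq


def makespan(durations, num_drives):
    """Total time to finish all jobs when each job, taken in the given
    order, is assigned to the drive that becomes free earliest."""
    free_at = [0.0] * num_drives  # min-heap of times at which each drive is free
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


def lbf_order(history):
    """Return object names sorted by previously observed backup time,
    longest first (the ordering described in the abstract)."""
    return sorted(history, key=history.get, reverse=True)


# Hypothetical historical backup times (hours) for four objects.
history = {"a": 2.0, "b": 3.0, "c": 2.0, "d": 7.0}

arrival = list(history)        # arbitrary order: a, b, c, d
ordered = lbf_order(history)   # d, b, a, c

arbitrary_time = makespan([history[name] for name in arrival], num_drives=2)
lbf_time = makespan([history[name] for name in ordered], num_drives=2)
print(arbitrary_time, lbf_time)  # 10.0 7.0
```

In this toy case the arbitrary order leaves the 7-hour job until last, so one drive sits idle while the other finishes it; starting that job first lets the shorter jobs fill the second drive. The size of the gain depends entirely on the workload and is not a reproduction of the paper's reported results.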



