Donag:生成有效的补丁和差异压缩档案

Michael J. May
{"title":"Donag:生成有效的补丁和差异压缩档案","authors":"Michael J. May","doi":"10.1145/3507919","DOIUrl":null,"url":null,"abstract":"Differencing between compressed archives is a common task in file management and synchronization. Applications include source code distribution, application updates, and document synchronization. General purpose binary differencing tools can create and apply patches to compressed archives, but don’t consider the internal structure of the compressed archive or the file lifecycle. Therefore, they miss opportunities to save space based on the archive’s internal structure and metadata. To address the gap, we develop a content-aware, format independent theory for differencing on compressed archives and propose a canonical form and digest for compressed archives. Based on them, we present Donag, a content-aware differencing and patching algorithm that produces smaller patches than general purpose binary differencing tools on versioned archives by exploiting the compressed archives’ internal structure. Donag uses the VCDiff and BSDiff engines internally. We compare Donag’s patches to ones produced by bsdiff, xdelta3, and Delta++ on three classes of compressed archives: open-source code repositories, large and small applications, and office productivity documents (DOCX, XLSX, PPTX). Donag’s patches are typically 10% to 89% smaller than those produced by bsdiff, xdelta3, and Delta++, with reasonable memory overhead and throughput on commodity hardware. In the worst case, Donag’s patches are negligibly larger.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Donag: Generating Efficient Patches and Diffs for Compressed Archives\",\"authors\":\"Michael J. May\",\"doi\":\"10.1145/3507919\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Differencing between compressed archives is a common task in file management and synchronization. Applications include source code distribution, application updates, and document synchronization. General purpose binary differencing tools can create and apply patches to compressed archives, but don’t consider the internal structure of the compressed archive or the file lifecycle. Therefore, they miss opportunities to save space based on the archive’s internal structure and metadata. To address the gap, we develop a content-aware, format independent theory for differencing on compressed archives and propose a canonical form and digest for compressed archives. Based on them, we present Donag, a content-aware differencing and patching algorithm that produces smaller patches than general purpose binary differencing tools on versioned archives by exploiting the compressed archives’ internal structure. Donag uses the VCDiff and BSDiff engines internally. We compare Donag’s patches to ones produced by bsdiff, xdelta3, and Delta++ on three classes of compressed archives: open-source code repositories, large and small applications, and office productivity documents (DOCX, XLSX, PPTX). Donag’s patches are typically 10% to 89% smaller than those produced by bsdiff, xdelta3, and Delta++, with reasonable memory overhead and throughput on commodity hardware. In the worst case, Donag’s patches are negligibly larger.\",\"PeriodicalId\":273014,\"journal\":{\"name\":\"ACM Transactions on Storage (TOS)\",\"volume\":\"65 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Storage (TOS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3507919\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Storage (TOS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3507919","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

在文件管理和同步中,区分压缩档案是一项常见的任务。应用程序包括源代码分发、应用程序更新和文档同步。通用的二进制差异工具可以创建并应用补丁到压缩归档,但不考虑压缩归档的内部结构或文件生命周期。因此,它们错过了基于归档的内部结构和元数据节省空间的机会。为了解决这一差距,我们提出了一种内容感知的、格式独立的压缩档案差异理论,并提出了压缩档案的规范形式和摘要。在此基础上,我们提出了Donag,一种内容感知差分和补丁算法,通过利用压缩档案的内部结构,在版本化档案上产生比通用二进制差分工具更小的补丁。Donag在内部使用VCDiff和BSDiff引擎。我们将Donag的补丁与bsdiff、xdelta3和Delta++在三类压缩归档上生成的补丁进行了比较:开源代码存储库、大型和小型应用程序以及办公生产力文档(DOCX、XLSX、PPTX)。dong的补丁通常比bsdiff、xdelta3和dell++产生的补丁小10%到89%,并且在商用硬件上具有合理的内存开销和吞吐量。在最坏的情况下,Donag的补丁要大得可以忽略不计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Donag: Generating Efficient Patches and Diffs for Compressed Archives
Differencing between compressed archives is a common task in file management and synchronization. Applications include source code distribution, application updates, and document synchronization. General purpose binary differencing tools can create and apply patches to compressed archives, but don’t consider the internal structure of the compressed archive or the file lifecycle. Therefore, they miss opportunities to save space based on the archive’s internal structure and metadata. To address the gap, we develop a content-aware, format independent theory for differencing on compressed archives and propose a canonical form and digest for compressed archives. Based on them, we present Donag, a content-aware differencing and patching algorithm that produces smaller patches than general purpose binary differencing tools on versioned archives by exploiting the compressed archives’ internal structure. Donag uses the VCDiff and BSDiff engines internally. We compare Donag’s patches to ones produced by bsdiff, xdelta3, and Delta++ on three classes of compressed archives: open-source code repositories, large and small applications, and office productivity documents (DOCX, XLSX, PPTX). Donag’s patches are typically 10% to 89% smaller than those produced by bsdiff, xdelta3, and Delta++, with reasonable memory overhead and throughput on commodity hardware. In the worst case, Donag’s patches are negligibly larger.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信