XML Stream Data Reduction by Shared KST Signatures

S. Böttcher, Rita Hartel, C. Messinger
DOI: 10.1109/HICSS.2009.1029
Published in: 2009 42nd Hawaii International Conference on System Sciences, 2009-01-20
Citations: 7

Abstract

Within XML data streams, markup as defined, e.g., in a DTD is used not only for structuring large amounts of data but also for efficiently searching, accessing, and processing the required parts of the data streams. However, when huge amounts of XML data are involved, data reduction or compression techniques that still allow the required parts of the data to be found quickly may become crucial for data processing. We present a data reduction and compression technique for XML data streams that not only significantly reduces the amount of data but also allows efficient data processing without requiring full decompression. Our data reduction technique combines sub-tree sharing with the removal of structure that is already known from a DTD. We have carried out extensive performance evaluations comparing our compression technique with other approaches to XML compression, and we show that it outperforms not only these techniques but also string compression techniques like gzip, which do not support query processing on compressed data.
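The abstract names two ingredients, sub-tree sharing and omitting structure that a DTD already determines, without describing how either is encoded. The Python sketch below illustrates both ideas in a generic form; it is not the authors' KST-signature encoding. The functions `share_subtrees` and `strip_implied`, and the `fixed_content` mapping standing in for DTD content models, are hypothetical names introduced for illustration. The sketch handles element structure only and ignores text and attributes.

```python
import xml.etree.ElementTree as ET


def share_subtrees(root):
    """Assign one id per distinct subtree shape (tag plus ordered child ids).

    Structurally identical subtrees receive the same id, so the tree is
    stored as a DAG with one entry per distinct shape.
    Text and attributes are ignored in this sketch.
    """
    table = {}  # (tag, child_ids) -> node id
    nodes = []  # node id -> (tag, child_ids)

    def visit(elem):
        key = (elem.tag, tuple(visit(child) for child in elem))
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    return visit(root), nodes


def strip_implied(root, fixed_content):
    """Preorder token stream that omits structure the DTD already fixes.

    fixed_content (hypothetical) maps a parent tag to its fixed child-tag
    sequence, e.g. {"book": ("title", "author")} for
    <!ELEMENT book (title, author)>. For such parents neither the child
    count nor the child tags are emitted, since a decoder holding the same
    DTD can restore them. The document is assumed to be DTD-valid.
    """
    tokens = []

    def visit(elem, tag_implied):
        if not tag_implied:
            tokens.append(elem.tag)
        children = list(elem)
        known = fixed_content.get(elem.tag)
        if known is None:
            tokens.append(len(children))  # decoder needs the child count
        elif tuple(c.tag for c in children) != known:
            raise ValueError("document is not valid against the assumed content model")
        for child in children:
            visit(child, known is not None)

    visit(root, False)
    return tokens


doc = ET.fromstring(
    "<lib><book><title/><author/></book><book><title/><author/></book></lib>"
)
root_id, nodes = share_subtrees(doc)
print(root_id, nodes)
# 3 [('title', ()), ('author', ()), ('book', (0, 1)), ('lib', (2, 2))]

# Assumed DTD: book is (title, author); title and author are EMPTY.
dtd = {"book": ("title", "author"), "title": (), "author": ()}
print(strip_implied(doc, dtd))
# ['lib', 2, 'book', 'book']
```

In this toy document the two identical `book` subtrees collapse into a single DAG node, and once the assumed DTD fixes the content of `book`, `title`, and `author`, only the root's tag and child count plus the two `book` tags remain in the token stream.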