Compression-Aware Algorithms for Massive Datasets
Nathan Brunelle, G. Robins, Abhi Shelat
2015 Data Compression Conference, 2015-04-07
DOI: 10.1109/DCC.2015.74 (https://doi.org/10.1109/DCC.2015.74)
Citations: 3
Abstract
While massive datasets are often stored in compressed format, most algorithms are designed to operate on uncompressed data. We address this growing disconnect by developing a framework for compression-aware algorithms that operate directly on compressed datasets. Synergistically, we also propose new algorithmically aware compression schemes that enable algorithms to efficiently process the compressed data. In particular, we apply this general methodology to geometric/CAD datasets that are ubiquitous in areas such as graphics, VLSI, and geographic information systems. We develop example algorithms and corresponding compression schemes that address different types of datasets, including point sets and graphs. Our methods are more efficient than their classical counterparts, and they extend to both lossless and lossy compression scenarios. This motivates further investigation of how this approach can enable algorithms to process ever-increasing big data volumes.
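As a toy illustration of what "operating directly on compressed data" can mean, the sketch below uses plain run-length encoding. This is an assumption chosen for brevity, not one of the compression schemes or algorithms from the paper, and the helper names (`rle_encode`, `rle_sum`, `rle_max`) are hypothetical. The point it shows is that some queries can be answered in time proportional to the compressed size rather than the original size.

```python
from typing import List, Tuple

# Hypothetical representation: each run is (value, repeat_count).
Run = Tuple[int, int]


def rle_encode(values: List[int]) -> List[Run]:
    """Losslessly compress a sequence into (value, count) runs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Extend the current run instead of starting a new one.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_sum(runs: List[Run]) -> int:
    """Sum of the original sequence, computed without decompressing.

    Cost is O(number of runs), not O(length of the original sequence).
    """
    return sum(value * count for value, count in runs)


def rle_max(runs: List[Run]) -> int:
    """Maximum of the original sequence; assumes at least one run."""
    return max(value for value, _ in runs)


# Usage: 1503 original elements collapse to 3 runs.
data = [7] * 1000 + [2] * 500 + [7] * 3
runs = rle_encode(data)
total = rle_sum(runs)
largest = rle_max(runs)
```

Here the sum and maximum touch only three runs instead of 1503 elements. The gain depends entirely on how compressible the input is, and richer queries (such as the geometric and graph problems the abstract mentions) would need compression schemes designed with the algorithm in mind.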