Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)最新文献_第9页

Preprocessing text to improve compression ratios 预处理文本以提高压缩比

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672295

H. Kruse, A. Mukherjee

{"title":"Preprocessing text to improve compression ratios","authors":"H. Kruse, A. Mukherjee","doi":"10.1109/DCC.1998.672295","DOIUrl":"https://doi.org/10.1109/DCC.1998.672295","url":null,"abstract":"Summary form only given. We discuss the use of a text preprocessing algorithm that can improve the compression ratio of standard data compression algorithms, in particular 'bzip2', when used on text files, by up to 20%. The text preprocessing algorithm uses a static dictionary of the English language that is kept separately from the compressed file. The method in which the dictionary is used by the algorithm to transform the text is based on earlier work of Holger Kruse, Amar Mukherjee (see Proc. Data Comp. Conf., IEEE Comp. Society Press, p.447, 1997). The idea is to replace each word in the input text by a character sequence which encodes the position of the original word in the dictionary. The character sequences used for this encoding are chosen carefully in such a way that specific back-end compression algorithms can often compress these sequences more easily than the original words, increasing the overall compression ratio for the input text. In addition to the original method, this paper describes a variation of the method specifically for the 'bzip2' data compression algorithm. The new method yields an improvements in compression ratio of up to 20% over bzip2. We also describe methods how our algorithm can be used on wide area networks such as the Internet, and in particular how dictionaries can automatically be synchronized and kept up to date in a distributed environment, by using the existing system of URLs, caching and document types, and applying it to dictionaries and text files.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128853686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 46

A multimode context-based lossless wavelet image coder 基于上下文的多模无损小波图像编码器

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672300

Tetra Lindarto

{"title":"A multimode context-based lossless wavelet image coder","authors":"Tetra Lindarto","doi":"10.1109/DCC.1998.672300","DOIUrl":"https://doi.org/10.1109/DCC.1998.672300","url":null,"abstract":"Summary form only given. Currently, the most difficult part of a context-based lossless image compression system is how to determine the contexts. There is a tendency that the techniques for choosing contexts are based on practical experiences. For example, Wu et al. proposed the gradient of the current pixel as the base to choose the contexts for CALIC. Weinberger et al. proposed a similar strategy for LOGO-I. The main idea of this paper is to establish a general concept for the context selection mechanism. One way to do it is to employ an adaptive system that can switch between context schemes if there is enough evidence that switching will give us some performance gains. To be able to do this, some statistics such as the message length for each context scheme and the occurrence of each scheme-have to be collected along the way. Given several possible context selection schemes, the proposed system chooses the scheme that provides the best total message length in the immediate past. The first objective of this switching mechanism is to improve the compression even further. This new concept is combined with wavelet transformations to form a new lossless image compression system. Golomb-Rice code is used to encode the symbols, to reduce the overhead in updating the statistics and actually encoding the symbols. The current results of the experiments indicate that switching can improve the performance only slightly. This is due to the fact that each possible context performs almost the same. The overall results are slightly worse than CALIC or LOGO.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125015109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Breakpoint skeletal representation and compression of document images 断点骨架表示和文档图像的压缩

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672317

David Tam, W. Barrett, B. Morse, Eric N. Mortensen

引用次数: 1

The context trees of block sorting compression 块排序压缩的上下文树

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672147

N. Larsson

引用次数: 52

Multiple pattern matching in LZW compressed text LZW压缩文本中的多模式匹配

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672136

T. Kida, M. Takeda, A. Shinohara, Masamichi Miyazaki, S. Arikawa

引用次数: 69

Efficient algorithms for optimal video transmission 优化视频传输的高效算法

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672151

D. Kozen, Y. Minsky, B. Smith

引用次数: 22

Fast wavelet packet image compression 快速小波包图像压缩

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672305

François G. Meyer, A. Averbuch, J. Strömberg, R. Coifman

引用次数: 21

Pattern matching in text compressed with the ID heuristic 使用ID启发式压缩文本中的模式匹配

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672137

Piera Barcaccia, A. Cresti, S. Agostino

引用次数: 11

On optimality of variants of the block sorting compression 块排序压缩变体的最优性

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672312

K. Sadakane

{"title":"On optimality of variants of the block sorting compression","authors":"K. Sadakane","doi":"10.1109/DCC.1998.672312","DOIUrl":"https://doi.org/10.1109/DCC.1998.672312","url":null,"abstract":"Summary form only given. Block sorting uses the Burrows-Wheeler transformation (BWT) which permutes an input string. The permutation is defined by the lexicographic order of contexts of symbols. If we assume that symbol probability is defined by preceding k symbols called context, symbols whose contexts are the same are collected in consecutive regions after the BWT. Sadakane (1997) proposed a variant of the block sorting and it is asymptotically optimal for any finite-order Markov source if permutation of symbols whose contexts are the same is random. However, the variant encodes 1 symbols as a block and therefore it is not practical because 1 is large. We propose two compression schemes not using blocks but encoding symbols one by one by using arithmetic codes. The move-to-front transformation is not used. The former encodes symbols by different codes defined by symbol frequencies in contexts. It is asymptotically optimal for k-th order Markov sources. However, it is available only if the order k of the source is already known. The latter divides the permuted string into many parts and encodes symbols using different arithmetic codes by the parts. Each part, has symbols whose contexts are the same. If the permutation is random, the scheme is asymptotically optimal for any finite-order Markov source. The permutation in the BWT is not completely random. However, we conjecture that the permuted string is memoryless and our schemes work.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124895533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Simple pre-processors significantly improve LZ 1 compression 简单的预处理器显著改善了lz1压缩

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI: 10.1109/DCC.1998.672261

D. J. Craft

引用次数: 0