{"title":"A fast algorithm for making suffix arrays and for Burrows-Wheeler transformation","authors":"K. Sadakane","doi":"10.1109/DCC.1998.672139","DOIUrl":"https://doi.org/10.1109/DCC.1998.672139","url":null,"abstract":"We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. Sorting suffixes is important because the array of suffix indexes, called a suffix array, is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler (see Technical Report 124, Digital SRC Research Report, 1994) transformation in block-sorting text compression; therefore, fast sorting algorithms are desired. We compare algorithms for making suffix arrays of Bentley-Sedgewick (see Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, p.360-9, 1997), Andersson-Nilsson (see 35th Symp. on Foundations of Computer Science, p.714-21, 1994) and Karp-Miller-Rosenberg (1972) and for making suffix trees of Larsson (see Data Compression Conference, p.190-9, 1996) on speed and required memory, and propose a new algorithm which is fast and memory efficient by combining them. We also define a measure of the difficulty of sorting suffixes: the average match length. Our algorithm is effective when the average match length of a text is large, especially for large databases.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127800907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending TMW for near lossless compression of greyscale images","authors":"B. Meyer, P. Tischer","doi":"10.1109/DCC.1998.672194","DOIUrl":"https://doi.org/10.1109/DCC.1998.672194","url":null,"abstract":"We present a general purpose lossless greyscale image compression method, TMW, that is based on the use of linear predictors and implicit segmentation. We then proceed to extend the presented methods to cover near lossless image compression. In order to achieve competitive compression, the compression process is split into an analysis step and a coding step. In the first step, a set of linear predictors and other parameters suitable for the image is calculated, which is included in the compressed file and subsequently used for the coding step. This adaptation allows TMW to perform well over a very wide range of image types. Other significant features of TMW are the use of a one-parameter probability distribution, probability calculations based on unquantized prediction values, blending of multiple probability distributions instead of prediction values, and implicit image segmentation. For lossless image compression, the method has been compared to CALIC on a selection of test images, and typically outperforms it by between 2 and 10 percent. For near lossless image compression, the method has been compared to LOCO (Weinberger et al. 1996). Especially for larger allowed deviations from the original image, the proposed method can significantly outperform LOCO. In both cases the improvement in compression is achieved at the cost of considerably higher computational complexity.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132294756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal decoding of entropy coded memoryless sources over binary symmetric channels","authors":"K.P. Subbalakshmi, J. Vaisey","doi":"10.1109/DCC.1998.672315","DOIUrl":"https://doi.org/10.1109/DCC.1998.672315","url":null,"abstract":"Summary form only given. Entropy codes (e.g. Huffman codes) are often used to improve the rate-distortion performance of codecs for most sources. However, transmitting entropy coded sources over noisy channels can cause the encoder and decoder to lose synchronization, because the codes tend to be of variable length. Designing optimal decoders to deal with this problem is nontrivial since it is no longer optimal to process the data in fixed-length blocks, as is done with fixed-length codes. This paper deals with the design of an optimal decoder (MAPD), in the maximum a posteriori (MAP) sense, for an entropy coded memoryless source transmitted over a binary symmetric channel (BSC) with channel crossover probability /spl epsiv/. The MAP problem is cast in a dynamic programming framework and a Viterbi-like implementation of the decoder is presented. At each stage the MAPD performs two operations: the metric-update and the merger-check operations. A stream of 40,000 samples of a zero mean, unit variance, Gaussian source, quantized with uniform, N-level quantizers was Huffman encoded and the resulting bit stream was transmitted over a BSC. Experiments were performed for values of N ranging from 128 to 1024 and for four different random error patterns, obtained using a random number generator. The results demonstrate that the MAPD performs better than the HD on average, whenever /spl epsiv/ is comparable to the source probabilities. A maximum reduction of 2.94% in the bits that are out of synchronization was achieved for the 1024 level quantizer.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134559244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the performance of vector quantizers empirically designed from dependent sources","authors":"A. Zeevi","doi":"10.1109/DCC.1998.672133","DOIUrl":"https://doi.org/10.1109/DCC.1998.672133","url":null,"abstract":"Suppose we are given n real-valued samples Z/sub 1/, Z/sub 2/, ..., Z/sub n/ from a stationary source P. We consider the following question. For a compression scheme that uses blocks of length k, what is the minimal distortion (for encoding the true source P) induced by a vector quantizer of fixed rate R, designed from the training sequence. For a certain class of dependent sources, we derive conditions ensuring that the empirically designed quantizer performs as well (on the average) as the optimal quantizer, for almost every training sequence emitted by the source. In particular, we observe that for a code rate R, the optimal way to choose the dimension of the quantizer is k/sub n/=[(1-/spl delta/)R/sup -1/ log n]. The problem of empirical design of a vector quantizer of fixed dimension k based on a vector valued training sequence X/sub 1/, X/sub 2/, ..., X/sub n/ is also considered. For a class of dependent sources, it is shown that the mean squared error (MSE) of the empirically designed quantizer w.r.t. the true source distribution converges to the minimum possible MSE at a rate of O(/spl radic/(log n/n)), for almost every training sequence emitted by the source. In addition, the expected value of the distortion redundancy (the difference between the MSEs of the quantizers) converges to zero for a sequence of increasing block lengths k, if we have at our disposal corresponding training sequences whose length grows as n=2/sup (R+/spl delta/)k/. Some of the derivations extend results in empirical quantizer design using an i.i.d. training sequence, obtained by Linder et al. (see IEEE Trans. on Info. Theory, vol.40, p.1728-40, 1994) and Merhav and Ziv (see IEEE Trans. on Info. Theory, vol.43, p.1112-23, 1997). Proofs of the techniques rely on results from the theory of empirical processes, indexed by VC function classes.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133374996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some theory and practice of greedy off-line textual substitution","authors":"A. Apostolico, S. Lonardi","doi":"10.1109/DCC.1998.672138","DOIUrl":"https://doi.org/10.1109/DCC.1998.672138","url":null,"abstract":"Greedy off-line textual substitution refers to the following steepest descent approach to compression or structural inference. Given a long text string x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted text string, until substrings capable of producing contractions can no longer be found. This paper examines the computational issues and performance resulting from implementations of this paradigm in preliminary applications and experiments. Apart from intrinsic interest, these methods may find use in the compression of massively disseminated data, and lend themselves to efficient parallel implementation, perhaps on dedicated architectures.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114768271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A locally optimal design algorithm for block-based multi-hypothesis motion-compensated prediction","authors":"M. Flierl, T. Wiegand, B. Girod","doi":"10.1109/DCC.1998.672152","DOIUrl":"https://doi.org/10.1109/DCC.1998.672152","url":null,"abstract":"Multi-hypothesis motion-compensated prediction extends traditional motion-compensated prediction used in video coding schemes. Known algorithms for block-based multi-hypothesis motion-compensated prediction are, for example, overlapped block motion compensation (OBMC) and bidirectionally predicted frames (B-frames). This paper presents a generalization of these algorithms in a rate-distortion framework. All blocks which are available for prediction are called hypotheses. Further, we explicitly distinguish between the search space and the superposition of hypotheses. Hypotheses are selected from a search space and their spatio-temporal positions are transmitted by means of spatio-temporal displacement codewords. Constant predictor coefficients are used to linearly combine the hypotheses of a multi-hypothesis. The presented design algorithm provides an estimation criterion for optimal multi-hypotheses, a rule for optimal displacement codes, and a condition for optimal predictor coefficients. Statistically dependent hypotheses of a multi-hypothesis are determined by an iterative algorithm. Experimental results show that increasing the number of hypotheses from 1 to 8 provides prediction gains of up to 3 dB in prediction error.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123543593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tag based models of English text","authors":"W. Teahan, J. Cleary","doi":"10.1109/DCC.1998.672130","DOIUrl":"https://doi.org/10.1109/DCC.1998.672130","url":null,"abstract":"The problem of compressing English text is important both because of the ubiquity of English as a target for compression and because of the light that compression can shed on the structure of English. English text is examined in conjunction with additional information about the parts of speech of each word in the text (these are referred to as \"tags\"). It is shown that the tags plus the text can be compressed more than the text alone. Essentially the tags can be compressed for nothing or even a small net saving in size. A comparison is made of a number of different ways of integrating compression of tags and text using an escape mechanism similar to PPM. These are also compared with standard word based and character based compression programs. The result is that the tag and word based schemes always outperform the character based schemes. Overall, the tag based schemes outperform the word based schemes. We conclude by conjecturing that tags chosen for compression rather than linguistic purposes would perform even better.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122450310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantization, classification, and density estimation for Kohonen's Gaussian mixture","authors":"R. Gray, K. Perlmutter, R. Olshen","doi":"10.1109/DCC.1998.672132","DOIUrl":"https://doi.org/10.1109/DCC.1998.672132","url":null,"abstract":"We consider the problem of joint quantization and classification for the example of a simple Gaussian mixture used by Kohonen (1988) to demonstrate the performance of his \"learning vector quantization\" (LVQ). Implicit in the problem is the issue of estimating the underlying densities, which is accomplished by CART/sup TM/ and by an inverse halftoning method.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122747412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-speed software implementation of Huffman coding","authors":"Mikio Kawahara, Yi-jen Chiu, T. Berger","doi":"10.1109/DCC.1998.672291","DOIUrl":"https://doi.org/10.1109/DCC.1998.672291","url":null,"abstract":"Summary form only given. Huffman coding has been applied in many disciplines including text compression, still image compression, and video compression. In the case of video, several software-only codecs have been developed aiming at real-time performance. However, they fall well short of achieving full-screen, high-quality, full-motion video. A key bottleneck in such software implementations occurs during writing and reading of the Huffman-coded bit stream. The reason is that the minimum unit for software operations is a byte rather than a bit. Therefore, the need to read/write data in bits necessitates devising means to bridge the bits-to-bytes and bytes-to-bits gaps efficiently. We introduce new tables for software-only Huffman coding that enable us to write/read data in M-bit units, thereby virtually eliminating the need for inefficient software simulation of bit-based operations. We present an offset-based Huffman encoding table containing information for each number of bits by which the length of a Huffman word is offset from an integral number of bytes. The number of rows in the offset-based Huffman encoding table is M times that of the general Huffman table. The offset-based Huffman encoding table eliminates most of the bit operations, but it still requires some bit-based operations to update the offset and the condition of the current unit. In order to avoid bit operations entirely, we extend the offset-based table into a byte-based table by associating a unique subtable with each offset and unit condition. A byte-based Huffman decoding table has also been studied.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125405819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of a joint video coding system","authors":"L. Teixeira","doi":"10.1109/DCC.1998.672318","DOIUrl":"https://doi.org/10.1109/DCC.1998.672318","url":null,"abstract":"Summary form only given. When studying the dynamics of joint video coding systems, several issues need to be considered. Video quality estimation and multiplexing gain are usually based on simple scenarios, and in general consider networking and source coding separately. However, the key to successful implementation of VBR video transmission lies in the interface between video and network, specifically in the rules used to determine the bit rate that can be allowed into the network from each source. Once we have specified the interface, we can produce models for the video bit rate under the chosen set of rate constraints and use these models to estimate network utilisation. In our work, the different encoders share the same bit rate control. This joint bit rate control allocates bits to each program in such a way that the sum of all the bit rates meets the negotiated connection parameters. Experiments have been performed keeping the global bandwidth constant. Video sources are grouped into three different classes, each one exhibiting different combined levels of spatial detail and amount of movement. Simulation results show that bandwidth gains/quality improvements are more significant when heterogeneous sources are multiplexed together, especially when video sources from classes C and B are present. Our current work includes video sequences with shot changes and variable GOP sizes. Work is under way on implementing the protocol to improve the management of communications between the video encoders and the multiplexer.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116418617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}