{"title":"An Online Algorithm for Lightweight Grammar-Based Compression","authors":"Shirou Maruyama, M. Takeda, Masaya Nakahara, H. Sakamoto","doi":"10.1109/CCP.2011.40","DOIUrl":"https://doi.org/10.1109/CCP.2011.40","url":null,"abstract":"Grammar-based compression is a well-studied technique for constructing a small context-free grammar (CFG) uniquely deriving a given text. In this paper, we present an online algorithm for lightweight grammar-based compression. Our algorithm is based on the LCA algorithm [Sakamoto et al. 2004]which guarantees nearly optimum compression ratio and space. LCA, however, is an offline algorithm and requires external space to save space consumption. Therefore, we present its online version which inherits most characteristics of the original LCA. Our algorithm guarantees $O(log^2 n)$-approximation ratio for an optimum grammar size, and all work is carried out on a main memory space which is bounded by the output size. In addition, we propose more practical encoding based on parentheses representation of a binary tree. Experimental results for repetitive texts demonstrate that our algorithm achieves effective compression compared to other practical compressors and the space consumption of our algorithm is smaller than the input text size.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121376914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generic Intrusion Detection and Diagnoser System Based on Complex Event Processing","authors":"M. Ficco, L. Romano","doi":"10.1109/CCP.2011.43","DOIUrl":"https://doi.org/10.1109/CCP.2011.43","url":null,"abstract":"This work presents a generic Intrusion Detection and Diagnosis System, which implements a comprehensive alert correlation workflow for detection and diagnosis of complex intrusion scenarios in Large scale Complex Critical Infrastructures. The on-line detection and diagnosis process is based on an hybrid and hierarchical approach, which allows to detect intrusion scenarios by collecting diverse information at several architectural levels, using distributed security probes, as well as perform complex event correlation based on a Complex Event Processing Engine. The escalation process from intrusion symptoms to the identified target and cause of the intrusion is driven by a knowledge-base represented by an ontology. A prototype implementation of the proposed Intrusion Detection and Diagnosis framework is also presented.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115533272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intrusion Tolerant Approach for Denial of Service Attacks to Web Services","authors":"M. Ficco, M. Rak","doi":"10.1109/CCP.2011.44","DOIUrl":"https://doi.org/10.1109/CCP.2011.44","url":null,"abstract":"Intrusion Detection Systems are the major technology used for protecting information systems. However, they do not directly detect intrusion, but they only monitor the attack symptoms. Therefore, no assumption can be made on the outcome of the attack, no assurance can be assumed once the system is compromised. The intrusion tolerance techniques focus on providing minimal level of services, even when the system has been partially compromised. This paper presents an intrusion tolerant approach for Denial of Service attacks to Web Services. It focuses on the detection of attack symptoms as well as the diagnosis of intrusion effects in order to perform a proper reaction only if the attack succeeds. In particular, this work focuses on a specific Denial of Service attack, called Deeply-Nested XML. Preliminary experimental results show that the proposed approach results in a better performance of the Intrusion Detection Systems, in terms of increasing diagnosis capacity as well as reducing the service unavailability during an intrusion.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130290173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multithread Content Based File Chunking System in CPU-GPGPU Heterogeneous Architecture","authors":"Zhi Tang, Y. Won","doi":"10.1109/CCP.2011.20","DOIUrl":"https://doi.org/10.1109/CCP.2011.20","url":null,"abstract":"the fast development of Graphics Processing Unit (GPU) leads to the popularity of General-purpose usage of GPU (GPGPU). So far, most modern computers are CPU-GPGPU heterogeneous architecture and CPU is used as host processor. In this work, we promote a multithread file chunking prototype system, which is able to exploit the hardware organization of the CPU-GPGPU heterogeneous computer and determine which device should be used to chunk the file to accelerate the content based file chunking operation of deduplication. We built rules for the system to choose which device should be used to chunk file and also found the optimal choice of other related parameters of both CPU and GPGPU subsystem like segment size and block dimension. This prototype was implemented and tested. The result of using GTX460(336 cores) and Intel i5 (four cores) shows that this system can increase the chunking speed 63% compared to using GPGPU alone and 80% compared to using CPU alone.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122043683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Natural Language Compression per Blocks","authors":"P. Procházka, J. Holub","doi":"10.1109/CCP.2011.25","DOIUrl":"https://doi.org/10.1109/CCP.2011.25","url":null,"abstract":"We present a new natural language compression method: Semi-adaptive Two Byte Dense Code (STBDC). STBDC performs compression per blocks. It means that the input is divided into the several blocks and each of the blocks is compressed separately according to its own statistical model. To avoid the redundancy the final vocabulary file is composed as the sequence of the changes in the model of the two consecutive blocks. STBDC belongs to the family of Dense codes and keeps all their attractive properties including very high compression and decompression speed and acceptable compression ratio around 32 % on natural language text. Moreover STBDC provides other properties applicable in digital libraries and other textual databases. The compression method allows direct searching on the compressed text, whereas the vocabulary can be used as a block index. STBDC is very easy on limited bandwidth in the client/server architecture. It can send namely single compressed blocks only with corresponding part of the vocabulary. Further STBDC enables various approaches of updating and extending of the compressed text.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127488581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of DNA Microarray Image Compression","authors":"Miguel Hernández-Cabronero, Ian Blanes, J. Serra-Sagristà, M. Marcellin","doi":"10.1109/CCP.2011.21","DOIUrl":"https://doi.org/10.1109/CCP.2011.21","url":null,"abstract":"We review the state of the art in DNA micro array image compression. First, we describe the most relevant approaches published in the literature and classify them according to the stage of the typical image compression process where each approach makes its contribution. We then summarize the compression results reported for these specific-specific image compression schemes. In a set of experiments conducted for this paper, we obtain results for several popular image coding techniques, including the most recent coding standards. Prediction-based schemes CALIC and JPEG-LS, and JPEG2000 using zero wavelet decomposition levels are the best performing standard compressors, but are all outperformed by the best micro array-specific technique, Battiato's CNN-based scheme.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128618246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Type Classes and Their Entropy Functions","authors":"J. Kieffer","doi":"10.1109/CCP.2011.36","DOIUrl":"https://doi.org/10.1109/CCP.2011.36","url":null,"abstract":"For each $j geq 1$, if $T_j$ is the finite rooted binary tree with $2^j$ leaves, the hierarchical type class of binary string $x$ of length $2^j$ is obtained by placing the entries of $x$ as label son the leaves of $T_j$ and then forming all permutations of $x$according to the permutations of the leaf labels under all isomorphisms of tree $T_j$ into itself. The set of binary strings of length $2^j$ is partitioned into hierarchical type classes, and in each such class, all of the strings have the same type $(n_0^j, n_1^j)$, where $n_0^j, n_1^j$ are respectively the numbers of zeroes and ones in the strings. Let $p(n_0^j, n_1^j)$ be the probability vector $(n_0^j/2^j, n_1^j/2^j)$belonging to the set ${cal P}_2$ of all two-dimensional probability vectors. For each $j geq 1$, and each of the $2^j+1$ possible types $(n_0^j, n_1^j)$, a hierarchical type class ${cal S}(n_0^j, n_1^j)$is specified. Conditions are investigated under which there will exist a function $h:{cal P}_2to [0, infty)$ such that for each $pin {cal P}_2$, if ${(n_0^j, n_1^j):jgeq 1}$ is any sequence of types for which $p(n_0^j, n_1^j) to p$, then the sequence ${2^{-j}log_2({rm card}({cal S}(n_0^j, n_1^j))):j geq 1}$converges to $h(p)$. Such functions $h$, called hierarchical entropy functions, play the same role in hierarchical type class coding theory that the Shannon entropy function on ${cal P}_2$ does in traditional type class coding theory, except that there are infinitely many hierarchical entropy functions but only one Shannon entropy function. 
One of the hierarchical entropy functions $h$ that is studied is a self-affine function for which a closed-form expression is obtained making use of an iterated function system whose attractor is the graph of $h$.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130835446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electrophysiological Data Processing Using a Dynamic Range Compressor Coupled to a Ten Bits A/D Convertion Port","authors":"F. Babarada, C. Ravariu, A. Janel","doi":"10.1109/CCP.2011.24","DOIUrl":"https://doi.org/10.1109/CCP.2011.24","url":null,"abstract":"The paper presents a hardware solution of the in vivo electrophysiological signals processing, using a continuous data acquisition on PC. The originality of the paper comes from some blocks proposal, which selective amplify the bio signals. One of the major problems in the electrophysiological signals monitoring is the impossibility to record the weak signals from deep organs that are covered by noise or by strong cardiac or muscular signals. An automatic gain control block is used, so that the high power skin signals are less amplified than the low components. The analog processing block is based on a dynamic range compressor, containing the automatic gain control block. The following block is a clipper since to capture all the transitions that escape from the dynamic range compressor. At clipper output a low-pass filter is connected since to abruptly cut the high frequencies, like 50Hz, ECG. The data vector recording is performing by strong internal resources micro controller including ten bits A/D conversion port.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114969853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pattern Matching on Sparse Suffix Trees","authors":"R. Kolpakov, G. Kucherov, Tatiana Starikovskaya","doi":"10.1109/CCP.2011.45","DOIUrl":"https://doi.org/10.1109/CCP.2011.45","url":null,"abstract":"We consider a compact text index based on evenly spaced sparse suffix trees of a text [9]. Such a tree is defined by partitioning the text into blocks of equal size and constructing the suffix tree only for those suffixes that start at block boundaries. We propose a new pattern matching algorithm on this structure. The algorithm is based on a notion of suffix links different from that of [9] and on the packing of several letters into one computer word.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"408 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116035739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Axiomatic Approach to the Notion of Similarity of Individual Sequences and Their Classification","authors":"J. Ziv","doi":"10.1109/CCP.2011.29","DOIUrl":"https://doi.org/10.1109/CCP.2011.29","url":null,"abstract":"An axiomatic approach to the notion of similarity of sequences, that seems to be natural in many cases (e.g. Phylogenetic analysis), is proposed. Despite of the fact that it is not assume that the sequences are a realization of a probabilistic process (e.g. a variable-order Markov process), it is demonstrated that any classifier that fully complies with the proposed similarity axioms must be based on modeling of the training data that is contained in a (long) individual training sequence via a suffix tree with no more than O(N) leaves (or, alternatively, a table with O(N) entries) where N is the length of the test sequence. Some common classification algorithms may be slightly modified to comply with the proposed axiomatic conditions and the resulting organization of the training data, thus yielding a formal justification for their good empirical performance without relying on any a-priori (sometimes unjustified)probabilistic assumption. One such case is discussed in details.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123550957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}