Fadlelmoula Mohamed Baloul, Mohsin Hassan Abdullah, E. A. Babikir
{"title":"ETAOSD: Static dictionary-based transformation method for text compression","authors":"Fadlelmoula Mohamed Baloul, Mohsin Hassan Abdullah, E. A. Babikir","doi":"10.1109/ICCEEE.2013.6633967","DOIUrl":null,"url":null,"abstract":"The aim of this paper is to present a new static dictionary-based algorithm for text transformation to increase the data compression ratio when using standard compression tools. The basic idea of the new algorithm is to define a pattern for each word in a static dictionary by replacing all or most of the characters in the words of the dictionary by the most frequently used character in any text file. The proposed algorithm transforms any text file into another encrypted file with a size almost the same as that of the original text file but with different statistical properties. The new transformation method has been designed, implemented, and tested using Gutenburg Corpus. Generally, the output result has shown different levels of enhancements on different common standard data compression tools such as Arithmetic, Huffman, Bzip2, Gzip and WinZip. The compression performance of all common compression tools has been enhanced especially when the patterns of the transformed words passed through costless running length encoding (RLE) algorithm. On using Bzip2, the resultant output files produced about 76.75% as compression ratio with 1.88 as average code length. 
The final result is very promising and it could be enhanced more in case of applying dynamic dictionary-based text transformation technique.","PeriodicalId":256793,"journal":{"name":"2013 INTERNATIONAL CONFERENCE ON COMPUTING, ELECTRICAL AND ELECTRONIC ENGINEERING (ICCEEE)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 INTERNATIONAL CONFERENCE ON COMPUTING, ELECTRICAL AND ELECTRONIC ENGINEERING (ICCEEE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCEEE.2013.6633967","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
This paper presents a new static dictionary-based text transformation algorithm that increases the compression ratio achieved by standard compression tools. The basic idea is to define a pattern for each word in a static dictionary by replacing all or most of the word's characters with the character that occurs most frequently in text files. The proposed algorithm transforms any text file into an encrypted file that is almost the same size as the original but has different statistical properties. The transformation method was designed, implemented, and tested on the Gutenberg Corpus. The results show varying levels of improvement across common compression tools and methods, including arithmetic coding, Huffman coding, Bzip2, Gzip, and WinZip. Compression performance improved for all of these tools, especially when the patterns of the transformed words were passed through an inexpensive run-length encoding (RLE) step. With Bzip2, the output files reached a compression ratio of about 76.75% with an average code length of 1.88. The final result is very promising and could be improved further by applying a dynamic dictionary-based text transformation technique.
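
The pattern idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ETAOSD implementation: the abstract does not say how patterns are kept unique or which fill character is used, so the uniqueness rule (retain as few letters as needed), the `*` fill character, the frequency-ordered word list, and the names `build_patterns`, `transform`, and `inverse` are all assumptions made for illustration. The RLE step is not shown.

```python
import re
from itertools import combinations

FILL = "*"  # assumption: stand-in for the "most frequently used character"


def build_patterns(words, fill=FILL):
    """Map each dictionary word to a unique, same-length pattern made mostly of `fill`.

    `words` is assumed to be ordered from most to least frequent, so frequent
    words receive the patterns with the fewest retained letters.
    """
    patterns, used = {}, set()
    for word in dict.fromkeys(words):  # drop duplicates, keep order
        if fill in word:
            raise ValueError(f"fill character {fill!r} occurs in {word!r}")
        n = len(word)
        pattern = None
        # Retain 0 letters, then 1, then 2, ... until the pattern is unused.
        # Retaining all n letters yields the word itself, which is always free.
        for keep in range(n + 1):
            candidates = (
                "".join(c if i in pos else fill for i, c in enumerate(word))
                for pos in combinations(range(n), keep)
            )
            pattern = next((p for p in candidates if p not in used), None)
            if pattern is not None:
                break
        patterns[word] = pattern
        used.add(pattern)
    return patterns


def transform(text, patterns):
    """Replace every dictionary word in `text` by its pattern (length-preserving)."""
    return re.sub(r"[A-Za-z]+", lambda m: patterns.get(m.group(), m.group()), text)


def inverse(text, patterns, fill=FILL):
    """Undo `transform`, assuming the original text never contained `fill`."""
    words = {p: w for w, p in patterns.items()}
    token = "[A-Za-z" + re.escape(fill) + "]+"
    return re.sub(token, lambda m: words.get(m.group(), m.group()), text)


if __name__ == "__main__":
    table = build_patterns(["the", "and", "cat", "a"])
    encoded = transform("the cat and a dog", table)
    print(encoded)
    print(inverse(encoded, table))
```

Because each pattern has the same length as its word, the transformed text is exactly as long as the input in this sketch, while the fill character dominates its character statistics. That skew is what a downstream compressor would be expected to exploit.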