A Dictionary based Compression Scheme for Natural Language Text with Reduced Bit Encoding

Md. Ashiq Mahmood, K. Hasan
DOI: 10.1109/RAAICON48939.2019.62
Published in: 2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things (RAAICON), November 2019
Citations: 2

Abstract

Data compression, also called compaction, is the process of reducing the amount of data needed to store or transmit a given piece of information, typically by means of encoding techniques. Character encoding, which represents characters using a particular encoding technique, is closely related to data compression. Encoding is the process of putting a sequence of characters into a specific format for efficient transmission or storage. Data compression covers a wide range of applications, including data communication, data storage and database development. Two well-known compression techniques, Huffman coding and LZW, are the ones most commonly used for text compression. In this paper, we propose an effective and straightforward compression technique for large natural-language texts: a 5-bit encoding scheme, named the 5 Bit Encoding Scheme (5BE), that converts 8-bit characters to 5-bit codes. It can outperform Huffman and LZW in terms of compression ratio. The scheme provides an encoding algorithm that converts any 8-bit character in English or Bangla to 5 bits by means of a lookup table. The lookup table is built from the Zipf distribution, a discrete distribution describing how frequently characters are used in different languages. After converting the characters to 5 bits, we apply a k-Series scheme to build a database dictionary. At the cost of the storage required for the dictionary, we compress natural text by 87%. The dictionary is used by both the compression and decompression algorithms and resides on the client side, so it is constructed only once and the compression service remains available without interruption. We also present the reverse algorithm that recovers the original data. We compare our algorithm with the well-known Huffman and LZW techniques, and our experimental results show promising efficiency.
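The abstract does not spell out the 5BE lookup table or the k-Series dictionary, so the following is only a minimal Python sketch of the fixed-length 5-bit stage, under stated assumptions. It assumes the table is built by ranking the characters of a sample text by frequency (the rank ordering that Zipf's law describes) and giving the 32 most frequent characters the codes 0 to 31, since 5 bits allow exactly 2^5 = 32 distinct codes. The helper names `build_lookup_table`, `encode_5bit` and `decode_5bit` are hypothetical. Packing 5 bits per character instead of 8 gives 5/8 = 62.5% of the original size (a 37.5% reduction) before any dictionary stage, so the reported 87% figure presumably depends on the k-Series dictionary, which is not modeled here.

```python
from collections import Counter

# Hypothetical sketch of a 5-bit fixed-length coder; NOT the authors' 5BE table.
CODE_BITS = 5                  # every symbol is written as a 5-bit code
MAX_SYMBOLS = 1 << CODE_BITS   # 2**5 = 32 distinct codes


def build_lookup_table(sample_text: str) -> dict[str, int]:
    """Assumption: rank characters by frequency (Zipf-style rank order) and
    assign codes 0..31 to the 32 most frequent ones."""
    ranked = [ch for ch, _ in Counter(sample_text).most_common(MAX_SYMBOLS)]
    return {ch: code for code, ch in enumerate(ranked)}


def encode_5bit(text: str, table: dict[str, int]) -> tuple[bytes, int]:
    """Pack each character's 5-bit code into bytes, most significant bit first.
    Returns the packed bytes and the symbol count needed for decoding."""
    acc, nbits, out = 0, 0, bytearray()
    for ch in text:
        acc = (acc << CODE_BITS) | table[ch]  # KeyError if ch has no code
        nbits += CODE_BITS
        while nbits >= 8:                     # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1               # keep only the unemitted bits
    if nbits:                                 # zero-pad the final partial byte
        out.append(acc << (8 - nbits))
    return bytes(out), len(text)


def decode_5bit(data: bytes, n_symbols: int, table: dict[str, int]) -> str:
    """Reverse step: read 5-bit codes back and map them through the inverse table."""
    inverse = {code: ch for ch, code in table.items()}
    acc, nbits, chars = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        # stop at n_symbols so the zero padding is not decoded as extra symbols
        while nbits >= CODE_BITS and len(chars) < n_symbols:
            nbits -= CODE_BITS
            chars.append(inverse[(acc >> nbits) & (MAX_SYMBOLS - 1)])
        acc &= (1 << nbits) - 1
    return "".join(chars)


sample = "the quick brown fox jumps over the lazy dog"
table = build_lookup_table(sample)
packed, n = encode_5bit(sample, table)
assert decode_5bit(packed, n, table) == sample
print(len(sample), "->", len(packed), "bytes")  # 43 -> 27 bytes
```

Characters outside the table raise a `KeyError` in this sketch; the abstract does not say how 5BE handles alphabets with more than 32 symbols, such as Bangla with its full set of vowels, consonants and signs, so that case is left open. The symbol count is returned alongside the packed bytes because the zero padding in the last byte would otherwise decode as extra copies of the most frequent character.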