Multistage Arabic and Turkish Text Compression via Characters Encoding and 7-Zip

J. Ubiquitous Syst. Pervasive Networks. Pub Date: 2021-03-01. DOI: 10.5383/JUSPN.15.01.002

Tariq Abu Hilal, H. A. Hilal, Ala Abu Hilal


Citations: 3

Abstract



This paper proposes lossless compression of Turkish text by converting characters from UTF-8 to a single-byte ANSI encoding to save space, together with a decoding method that transforms the encoded ANSI string back to its original format. Unlike one-byte ANSI characters, some letters of the Turkish alphabet are stored in 2 bytes, and that extra space comes at a price. The sequential encoding technique reduces the size of a Turkish text file by up to 9%, and the encoded text regains its original form after decoding, so the compression is lossless. Lossless compression is a common concern today, and many parties have become interested in Unicode compression. In essence, the algorithm maps Unicode Turkish characters onto the available 8-bit legacy ANSI codes. For Arabic text, a sequential encoding technique is proposed that efficiently converts Arabic character strings from UTF-8 to ANSI character codes and significantly reduces the file size; a matching decoding method transforms the encoded ANSI string back to its original format. Unlike one-byte ANSI characters, Arabic letters are currently stored in 2 bytes each, which uses space inefficiently. The sequential encoding technique reduces the required storage by up to fifty percent. Because decoding restores the original Arabic text, the technique is lossless, which addresses a common concern with currently available Arabic character compression techniques. Finally, a multistage compression process was implemented for Turkish and Arabic that combines the new encoding technique with the 7-Zip application, and it showed a significant reduction in file size.
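The abstract does not give the paper's character mapping table, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes the standard Windows code pages `cp1254` (Turkish) and `cp1256` (Arabic) as the single-byte "ANSI" targets, and it uses Python's `lzma` module as a stand-in for the 7-Zip application in the second stage. The function name `size_report` and the sample strings are hypothetical.

```python
import lzma


def size_report(text: str, codepage: str) -> dict:
    """Compare byte counts of UTF-8 and a single-byte code page,
    before and after a general-purpose compression stage."""
    utf8 = text.encode("utf-8")
    # Stage 1 (assumption): re-encode with a legacy 8-bit code page.
    # Raises UnicodeEncodeError if a character has no slot in the code page.
    single = text.encode(codepage)
    # Lossless check: decoding must give back exactly the original text.
    assert single.decode(codepage) == text
    # Stage 2 (assumption): LZMA via the standard library, standing in
    # for the 7-Zip application mentioned in the abstract.
    return {
        "utf8": len(utf8),
        "single_byte": len(single),
        "utf8_lzma": len(lzma.compress(utf8)),
        "single_byte_lzma": len(lzma.compress(single)),
    }


# Hypothetical sample strings, chosen only for illustration.
turkish = "Çağdaş Türk şiiri güçlü bir gelenektir."
arabic = "ضغط النصوص العربية"

if __name__ == "__main__":
    print("Turkish:", size_report(turkish, "cp1254"))
    print("Arabic: ", size_report(arabic, "cp1256"))
```

In this sketch the saving in the first stage depends on how many characters need 2 bytes in UTF-8: every Arabic letter does, while only some Turkish letters do, which is consistent with the much larger reduction the abstract reports for Arabic. For strings this short, the fixed overhead of the LZMA container can dominate the second-stage sizes, so the two-stage comparison is only meaningful on larger files.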
