Using inverted files to compress text

S. Ristov
{"title":"Using inverted files to compress text","authors":"S. Ristov","doi":"10.1109/ITI.2002.1024714","DOIUrl":null,"url":null,"abstract":"This is the first report on a new approach to text compression. It consists of representing the text file with compressed inverted file index in conjunction with very compact lexicon, where lexicon includes every word in the text. The index is compressed using standard index compression techniques, and lexicon is compressed with original dictionary compression method that gives better compression results than existing procedures. Compression procedure is complex, but decompression time is linear with the file size, although it requires two passes and hence can not be performed online. First experiments show that this method, when refined, can be competitive for larger texts that only need to be decompressed in the real time.","PeriodicalId":420216,"journal":{"name":"ITI 2002. Proceedings of the 24th International Conference on Information Technology Interfaces (IEEE Cat. No.02EX534)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ITI 2002. Proceedings of the 24th International Conference on Information Technology Interfaces (IEEE Cat. No.02EX534)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITI.2002.1024714","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This is the first report on a new approach to text compression. It consists of representing the text file with compressed inverted file index in conjunction with very compact lexicon, where lexicon includes every word in the text. The index is compressed using standard index compression techniques, and lexicon is compressed with original dictionary compression method that gives better compression results than existing procedures. Compression procedure is complex, but decompression time is linear with the file size, although it requires two passes and hence can not be performed online. First experiments show that this method, when refined, can be competitive for larger texts that only need to be decompressed in the real time.
使用反向文件压缩文本
这是关于文本压缩新方法的第一份报告。它包括用压缩的反向文件索引表示文本文件,并结合非常紧凑的词典,其中词典包括文本中的每个单词。索引使用标准索引压缩技术进行压缩,词典使用原始字典压缩方法进行压缩,该方法的压缩效果比现有过程更好。压缩过程是复杂的,但解压缩时间与文件大小是线性的,尽管它需要两个通道,因此不能在线执行。第一个实验表明,这种方法经过改进后,可以与只需要实时解压的大型文本相竞争。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信