Efficient storage and retrieval of very large document databases

F. Matsuo, S. Futamura, T. Shinohara
DOI: 10.1109/ICDE.1986.7266252
Published in: 1986 IEEE Second International Conference on Data Engineering, 1986-02-05
Citations: 3

Abstract

The authors have developed an information retrieval system named AIR (Augmented Information Retrieval system), which may be one of the most efficient systems for very large document databases. AIR stores document data compactly and retrieves it quickly. The paper discusses the three techniques behind this efficiency: data compression, a quick keyword index, and automatic keyword selection. All three are based on the statistical properties of word occurrence and are simple enough that an information retrieval system employing them can be implemented easily. The data compression technique reduces English text by a factor of 4, i.e. to roughly a quarter of its original size. The quick keyword index lowers the average number of disk accesses needed to retrieve a keyword to about 0.3. The automatic keyword selection technique roughly halves both the number of distinct keywords and the size of the inverted file, with only a 2% loss of retrieval power.
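The abstract does not spell out AIR's keyword-selection rule. As a rough illustration only, the Python sketch below builds an inverted file (each word mapped to the ids of the documents containing it) and then prunes it with a document-frequency window, a common heuristic that is an assumption here and not the authors' published method. The function names and the thresholds `min_df` and `max_df_ratio` are hypothetical, and the total number of postings is used as a crude stand-in for inverted-file size.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_file(docs: list[str]) -> dict[str, list[int]]:
    """Map each word to the sorted list of document ids that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in tokenize(text):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def select_keywords(index: dict[str, list[int]], n_docs: int,
                    min_df: int = 2, max_df_ratio: float = 0.75) -> dict[str, list[int]]:
    """Hypothetical selection rule (not AIR's): keep a word only if it occurs
    in at least min_df documents and in at most max_df_ratio of all documents,
    dropping both rare words and near-ubiquitous ones such as "the"."""
    return {
        word: ids
        for word, ids in index.items()
        if len(ids) >= min_df and len(ids) / n_docs <= max_df_ratio
    }


def index_size(index: dict[str, list[int]]) -> int:
    """Total number of postings, a rough proxy for inverted-file size."""
    return sum(len(ids) for ids in index.values())


# Toy corpus, invented for illustration only.
docs = [
    "the compressed text is stored on disk",
    "the keyword index reduces disk accesses",
    "the inverted file maps each keyword to documents",
    "keyword selection shrinks the inverted file",
]
full = build_inverted_file(docs)
pruned = select_keywords(full, len(docs))
print(len(full), index_size(full))      # 19 27
print(len(pruned), index_size(pruned))  # 4 9
```

On this tiny corpus the window is far more aggressive than the roughly 50% reduction the paper reports; real thresholds would have to be tuned against retrieval quality, which is exactly the trade-off the abstract quantifies as a 2% loss.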