Concurrency and recovery in full-text indexing

E. Soisalon-Soininen, P. Widmayer
{"title":"Concurrency and recovery in full-text indexing","authors":"E. Soisalon-Soininen, P. Widmayer","doi":"10.1109/SPIRE.1999.796595","DOIUrl":null,"url":null,"abstract":"An important feature of a document database system is that the documents can be retrieved by searching for words from their contents. In a full-text index, each word of the stored documents can be used as a search key. Inserting a new document into the database automatically triggers a transaction that inserts the words together with their occurrence information into the index. We present solutions to problems that arise when full-text indexing is applied for constantly changing document data, such as WWW pages. We present and analyze an algorithm for full-text indexing with the following properties: concurrent searches are possible and efficient, and the algorithm can be designed such that several indexing processes can be performed concurrently. Moreover, the algorithm allows efficient recovery of the index after failures that can occur while the index is modified. This is important for large indices, because when not prepared for failures, the index may need to be reconstructed from original documents.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPIRE.1999.796595","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

An important feature of a document database system is that the documents can be retrieved by searching for words from their contents. In a full-text index, each word of the stored documents can be used as a search key. Inserting a new document into the database automatically triggers a transaction that inserts the words together with their occurrence information into the index. We present solutions to problems that arise when full-text indexing is applied for constantly changing document data, such as WWW pages. We present and analyze an algorithm for full-text indexing with the following properties: concurrent searches are possible and efficient, and the algorithm can be designed such that several indexing processes can be performed concurrently. Moreover, the algorithm allows efficient recovery of the index after failures that can occur while the index is modified. This is important for large indices, because when not prepared for failures, the index may need to be reconstructed from original documents.
全文索引中的并发性和恢复
文档数据库系统的一个重要特性是可以通过从文档的内容中搜索单词来检索文档。在全文索引中,存储文档的每个单词都可以用作搜索关键字。将新文档插入数据库会自动触发一个事务,该事务将单词及其出现信息一起插入索引。我们提出了全文索引应用于不断变化的文档数据(如WWW页面)时出现的问题的解决方案。本文提出并分析了一种全文索引算法,该算法具有以下特性:并发搜索是可能的和高效的,并且该算法可以设计成多个索引过程可以并发执行。此外,该算法允许在修改索引时发生故障后有效地恢复索引。这对于大型索引很重要,因为如果没有为失败做好准备,则可能需要从原始文档重新构建索引。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信