Concurrency and recovery in full-text indexing
E. Soisalon-Soininen, P. Widmayer
6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268), 1999-09-21
DOI: 10.1109/SPIRE.1999.796595 (https://doi.org/10.1109/SPIRE.1999.796595)
Citations: 8
Abstract
An important feature of a document database system is that documents can be retrieved by searching for words in their contents. In a full-text index, each word of the stored documents can be used as a search key. Inserting a new document into the database automatically triggers a transaction that inserts the document's words, together with their occurrence information, into the index. We present solutions to problems that arise when full-text indexing is applied to constantly changing document data, such as WWW pages. We present and analyze an algorithm for full-text indexing with the following properties: concurrent searches are possible and efficient, and the algorithm can be designed so that several indexing processes run concurrently. Moreover, the algorithm allows efficient recovery of the index after failures that occur while the index is being modified. This is important for large indices because, if the system is not prepared for failures, the index may need to be reconstructed from the original documents.
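
As an illustration only, the three ideas in the abstract (an insert that adds a document's words with occurrence information, searches that run alongside indexing, and recovery after a failure) can be sketched with a toy in-memory index. This is not the algorithm from the paper, whose details the abstract does not give. The class `TinyFullTextIndex`, its methods, and the in-memory `log` list standing in for a durable redo log are all hypothetical names chosen for this sketch. It is also much simpler than what the abstract describes: a single lock serializes searches and inserts, and recovery replays the logged documents rather than avoiding a rebuild.

```python
import threading
from collections import defaultdict


class TinyFullTextIndex:
    """Hypothetical toy index for illustration; not the paper's algorithm."""

    def __init__(self):
        # word -> {doc_id: [word positions in that document]}
        self._postings = defaultdict(dict)
        self._lock = threading.Lock()
        # Assumption: an in-memory list stands in for a durable redo log.
        self.log = []

    def insert_document(self, doc_id, text):
        # Build this document's postings privately, without holding the lock.
        local = defaultdict(list)
        for pos, word in enumerate(text.lower().split()):
            local[word].append(pos)
        with self._lock:
            # Log before modifying the index so a replay can rebuild it.
            self.log.append((doc_id, text))
            for word, positions in local.items():
                self._postings[word][doc_id] = positions

    def search(self, word):
        # Return a copy so callers never see the index mid-update.
        with self._lock:
            return dict(self._postings.get(word.lower(), {}))

    @classmethod
    def recover(cls, log):
        # Rebuild a fresh index by replaying the logged insertions.
        index = cls()
        for doc_id, text in log:
            index.insert_document(doc_id, text)
        return index


# Two indexing threads insert documents concurrently.
index = TinyFullTextIndex()
docs = {1: "full text index", 2: "concurrent index recovery"}
threads = [
    threading.Thread(target=index.insert_document, args=(doc_id, text))
    for doc_id, text in docs.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()

hits = index.search("index")          # occurrences of "index" per document
rebuilt = TinyFullTextIndex.recover(index.log)
```

Each insert prepares its postings outside the lock and holds the lock only while merging them, which keeps the critical section short. A design aiming for the concurrency the abstract claims would need finer-grained synchronization than this single lock.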