{"title":"Index structures for information filtering under the vector space model","authors":"T. Yan, H. Garcia-Molina","doi":"10.1109/ICDE.1994.283049","DOIUrl":null,"url":null,"abstract":"The authors study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. They apply the idea of the standard inverted index to index user profiles. They devise an alternative to the standard inverted index, in which they, instead of indexing every term in a profile, select only the significant ones to index. They evaluate their performance and show that the indexing methods require orders of magnitude fewer I/Os to process a document than when no index is used. They also show that the proposed alternative performs better in terms of I/O and CPU processing time in many cases.<<ETX>>","PeriodicalId":142465,"journal":{"name":"Proceedings of 1994 IEEE 10th International Conference on Data Engineering","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1993-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"89","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1994 IEEE 10th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.1994.283049","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 89
Abstract
The authors study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. They apply the idea of the standard inverted index to index user profiles. They devise an alternative to the standard inverted index, in which they, instead of indexing every term in a profile, select only the significant ones to index. They evaluate their performance and show that the indexing methods require orders of magnitude fewer I/Os to process a document than when no index is used. They also show that the proposed alternative performs better in terms of I/O and CPU processing time in many cases.<>