Long-term relevance feedback using simple PCA and linear transformation
Xiaoying Tai, F. Ren, K. Kita
Proceedings. 13th International Workshop on Database and Expert Systems Applications, 2002-09-02. DOI: 10.1109/DEXA.2002.1045909
This paper proposes a method for improving the retrieval performance of the vector space model (VSM), in part by preserving user-supplied relevance information in the system over the long term. The method incorporates user relevance feedback and the original document similarity information into a retrieval model built from a sequence of linear transformations. High-dimensional, sparse vectors are mapped into a low-dimensional vector space, namely the space representing the latent semantic meanings of words, using SPCA (simple principal component analysis). An experimental information retrieval system based on the method has been built, and experiments have been carried out on the Medline and Cranfield collections. Compared with the LSI (latent semantic indexing) model, average precision improves by 6.80% (Medline) and 67.46% (Cranfield) on the two training data sets, and by 4.71% (Medline) and 8.12% (Cranfield) on the test data. The results show that the proposed method achieves better retrieval performance and makes it possible to preserve user-supplied relevance information in the system over the long term for later use.
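To make the dimensionality-reduction step concrete, the following is a minimal sketch of projecting term vectors onto a few principal directions and ranking documents by cosine similarity in the reduced space. It rests on several assumptions: it uses ordinary PCA computed by power iteration with deflation, which is not necessarily the SPCA procedure used in the paper; it omits the relevance-feedback linear transformations entirely; and the function names (`fit_pca`, `project`, `rank_documents`) and the toy data are hypothetical, chosen only for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pca(rows, k, iters=100):
    """Return (column means, k leading principal directions) of `rows`.

    Assumption: plain PCA via power iteration with deflation, standing in
    for the paper's SPCA. k should not exceed the rank of the centered data.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Scatter matrix; its eigenvectors are the principal directions.
    scatter = [[sum(r[i] * r[j] for r in centered) for j in range(dim)]
               for i in range(dim)]
    components = []
    for c in range(k):
        v = [1.0 / (i + c + 2) for i in range(dim)]  # fixed, non-random start
        for _ in range(iters):
            w = [dot(row, v) for row in scatter]
            # Deflation: remove the directions that were already found.
            for u in components:
                p = dot(w, u)
                w = [a - p * b for a, b in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [a / norm for a in w]
        components.append(v)
    return means, components


def project(vec, means, components):
    """Map a term vector into the low-dimensional space."""
    centered = [x - m for x, m in zip(vec, means)]
    return [dot(centered, u) for u in components]


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def rank_documents(query, docs, k):
    """Return document indices sorted by similarity to the query, best first."""
    means, components = fit_pca(docs, k)
    q = project(query, means, components)
    scores = [cosine(q, project(d, means, components)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy term-document data (hypothetical): rows are documents, columns are terms.
# Documents 0 and 1 use terms 0 and 1; documents 2 and 3 use terms 2 and 3.
docs = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
query = [1, 0, 0, 0]
print(rank_documents(query, docs, k=2))
```

In this toy example, documents 0 and 1 rank ahead of documents 2 and 3, because the leading principal direction separates the two groups of co-occurring terms. A system along the lines described in the abstract would additionally adjust the mapping using stored relevance judgments, which this sketch does not attempt.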