Merging Results from Overlapping Databases in Distributed Information Retrieval
Shengli Wu, Jieyu Li
2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, published 2013-02-01
DOI: 10.1109/PDP.2013.22 (https://doi.org/10.1109/PDP.2013.22)
Citations: 6
Abstract
In this paper, we investigate the problem of results merging in distributed information retrieval when the component databases overlap. We focus on two issues: score normalization and weight assignment for each of the component results. An empirical study with TREC data yields three findings: (1) the cubic regression model and the logistic regression model both outperform the commonly used zero-one score normalization method; (2) the uneven similarity weighting scheme is an effective method of weight assignment; (3) score normalization and weight assignment can be used separately or together in a results merging method to improve effectiveness. These findings are useful for improving effectiveness when implementing a distributed information retrieval system.
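To make the two issues concrete, the following is a minimal sketch of the baseline the abstract refers to: zero-one (min-max) score normalization followed by a weighted linear combination of the normalized scores, so that a document returned by several overlapping databases accumulates evidence from each. It is an illustration under stated assumptions, not the authors' implementation. The function names, the sample scores, and the equal weights are hypothetical, and the regression-based normalization and the uneven similarity weighting scheme are not reproduced here because the abstract does not specify them.

```python
from collections import defaultdict


def zero_one_normalize(scores):
    """Linearly rescale one result list's raw scores onto [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when all scores are equal, give every document 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_results(result_lists, weights):
    """Merge result lists by a weighted sum of zero-one normalized scores.

    result_lists: {database name: {document id: raw score}}
    weights:      {database name: weight} (hypothetical values)
    """
    merged = defaultdict(float)
    for name, scores in result_lists.items():
        for doc, s in zero_one_normalize(scores).items():
            # A document found in several databases is summed across them.
            merged[doc] += weights[name] * s
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example: document d2 appears in both databases.
result_lists = {
    "A": {"d1": 10, "d2": 5, "d3": 0},
    "B": {"d2": 8, "d4": 2},
}
ranking = merge_results(result_lists, {"A": 1.0, "B": 1.0})
print(ranking)
```

Swapping in a different normalization function or a different set of weights changes only one step of this pipeline, which is consistent with the finding that the two techniques can be applied separately or together.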