Merging Results from Overlapping Databases in Distributed Information Retrieval
Authors: Shengli Wu, Jieyu Li
Published in: 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (2013-02-01)
DOI: 10.1109/PDP.2013.22 (https://doi.org/10.1109/PDP.2013.22)
In this paper, we investigate the problem of results merging in distributed information retrieval when overlapping databases are used. We focus on two issues: score normalization and weight assignment for each of the component results. An empirical study with TREC data yields three findings: (1) the cubic regression model and the logistic regression model are better than the commonly used zero-one score normalization method; (2) the uneven similarity weighting scheme is an effective method of weight assignment; (3) score normalization and weight assignment can be used separately or together in a results merging method to improve effectiveness. These findings are useful for improving effectiveness when implementing a distributed information retrieval system.
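To make the two issues concrete, the following is a minimal sketch of the baseline the abstract mentions (zero-one score normalization) combined with a weighted linear merge of result lists that share documents. It is not the paper's method: the function names, the database names, the scores, and the weights are all hypothetical, and the cubic regression model, the logistic regression model, and the uneven similarity weighting scheme are not reproduced here because the abstract does not define them.

```python
from collections import defaultdict


def zero_one_normalize(scores):
    """Linearly rescale one result list's raw scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_results(result_lists, weights):
    """Merge result lists by a weighted sum of normalized scores.

    A document returned by several overlapping databases accumulates
    one weighted contribution from each list that contains it.
    """
    merged = defaultdict(float)
    for name, scores in result_lists.items():
        for doc, score in zero_one_normalize(scores).items():
            merged[doc] += weights[name] * score
    # Highest merged score first; document id breaks ties deterministically.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical raw scores on different scales; d2 and d3 appear in both lists.
results = {
    "db_a": {"d1": 12.0, "d2": 8.0, "d3": 4.0},
    "db_b": {"d2": 0.75, "d4": 0.5, "d3": 0.25},
}
# Hypothetical weights; the paper's weighting scheme would supply these instead.
weights = {"db_a": 0.6, "db_b": 0.4}

ranking = merge_results(results, weights)
ranked_docs = [doc for doc, _ in ranking]
```

In this sketch the overlapping document d2 ends up ranked above d1 because it collects evidence from both lists. Swapping `zero_one_normalize` for a fitted regression model, or changing how `weights` is derived, would change only one of the two components, which is in the spirit of the abstract's third finding.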