Towards a collection-based results diversification
J. A. Akinyemi, C. Clarke, M. Kolla
RIAO Conference, 2010-04-28
DOI: https://doi.org/10.5555/1937055.1937105
Citations: 3
Abstract
We present a method that introduces diversity into document retrieval using clusters of the top-m terms obtained from the top-k retrieved documents through pseudo-relevance feedback. The terms in each cluster are used to automatically expand the original query, yielding one expanded query per cluster. We evaluate the method with a non-traditional effectiveness measure that directly quantifies diversification by computing the cosine similarity between the top-k documents retrieved for (i) the original query and (ii) the expanded queries. Our results indicate that diversity can be increased without compromising retrieval quality.
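The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the retrieval model, the term-weighting scheme, or the clustering algorithm, so the toy cosine-ranked retriever, the raw term-frequency weights, and the "group each term under the feedback document where it occurs most" clustering rule below are all stand-in assumptions, as are the function names.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k):
    """Toy retriever (assumption): rank documents by cosine similarity to the query."""
    q = tf_vector(query)
    ranked = sorted(
        range(len(docs)), key=lambda i: cosine(q, tf_vector(docs[i])), reverse=True
    )
    return ranked[:k]


def top_terms(doc_ids, docs, query, m):
    """Top-m feedback terms by frequency in the top-k documents, minus query terms."""
    counts = Counter()
    for i in doc_ids:
        counts.update(tf_vector(docs[i]))
    for t in query.lower().split():
        counts.pop(t, None)
    return [t for t, _ in counts.most_common(m)]


def cluster_terms(terms, doc_ids, docs):
    """Stand-in clustering (assumption): group each term under the feedback
    document in which it is most frequent."""
    clusters = {}
    for t in terms:
        best = max(doc_ids, key=lambda i: tf_vector(docs[i])[t])
        clusters.setdefault(best, []).append(t)
    return list(clusters.values())


def expand(query, cluster):
    """Expand the original query with the terms of one cluster."""
    return query + " " + " ".join(cluster)


def mean_cross_similarity(ids_a, ids_b, docs):
    """Average pairwise cosine similarity between two result lists.
    Lower values mean the two lists are less alike, i.e. more diversified."""
    sims = [
        cosine(tf_vector(docs[i]), tf_vector(docs[j])) for i in ids_a for j in ids_b
    ]
    return sum(sims) / len(sims) if sims else 0.0
```

A run would call `retrieve` with the original query, pass the result to `top_terms` and `cluster_terms`, build one expanded query per cluster with `expand`, retrieve again for each, and compare each new list with the original one using `mean_cross_similarity`. Averaging over all document pairs is only one way to read "cosine similarity between top-k retrieved documents"; the paper may aggregate differently.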