{"title":"Information retrieval using Hellinger distance and sqrt-cos similarity","authors":"Shunzhi Zhu, Lizhao Liu, Yan Wang","doi":"10.1109/ICCSE.2012.6295217","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a similarity measurement method based on the Hellinger distance and square-root cosine. Then use Hellinger distance as the distance metric for document clustering and a new square-root cosine similarity for query information retrieval. This new similarity/distance also bridges between traditional tf_idf weighting to binary weighting in vector space model. Finally, we conduct a comparison on performance between this method and the one based on Euclidean distance and cosine similarity. And from the results, we clearly observe that the precision and recall are improved by using the sqrt-cos similarity.","PeriodicalId":264063,"journal":{"name":"2012 7th International Conference on Computer Science & Education (ICCSE)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 7th International Conference on Computer Science & Education (ICCSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSE.2012.6295217","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
In this paper, we propose a similarity measurement method based on the Hellinger distance and square-root cosine. Then use Hellinger distance as the distance metric for document clustering and a new square-root cosine similarity for query information retrieval. This new similarity/distance also bridges between traditional tf_idf weighting to binary weighting in vector space model. Finally, we conduct a comparison on performance between this method and the one based on Euclidean distance and cosine similarity. And from the results, we clearly observe that the precision and recall are improved by using the sqrt-cos similarity.