{"title":"基于Hellinger距离和sqrt-cos相似度的信息检索","authors":"Shunzhi Zhu, Lizhao Liu, Yan Wang","doi":"10.1109/ICCSE.2012.6295217","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a similarity measurement method based on the Hellinger distance and square-root cosine. Then use Hellinger distance as the distance metric for document clustering and a new square-root cosine similarity for query information retrieval. This new similarity/distance also bridges between traditional tf_idf weighting to binary weighting in vector space model. Finally, we conduct a comparison on performance between this method and the one based on Euclidean distance and cosine similarity. And from the results, we clearly observe that the precision and recall are improved by using the sqrt-cos similarity.","PeriodicalId":264063,"journal":{"name":"2012 7th International Conference on Computer Science & Education (ICCSE)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Information retrieval using Hellinger distance and sqrt-cos similarity\",\"authors\":\"Shunzhi Zhu, Lizhao Liu, Yan Wang\",\"doi\":\"10.1109/ICCSE.2012.6295217\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a similarity measurement method based on the Hellinger distance and square-root cosine. Then use Hellinger distance as the distance metric for document clustering and a new square-root cosine similarity for query information retrieval. This new similarity/distance also bridges between traditional tf_idf weighting to binary weighting in vector space model. Finally, we conduct a comparison on performance between this method and the one based on Euclidean distance and cosine similarity. And from the results, we clearly observe that the precision and recall are improved by using the sqrt-cos similarity.\",\"PeriodicalId\":264063,\"journal\":{\"name\":\"2012 7th International Conference on Computer Science & Education (ICCSE)\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 7th International Conference on Computer Science & Education (ICCSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCSE.2012.6295217\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 7th International Conference on Computer Science & Education (ICCSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSE.2012.6295217","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Information retrieval using Hellinger distance and sqrt-cos similarity
In this paper, we propose a similarity measurement method based on the Hellinger distance and square-root cosine. Then use Hellinger distance as the distance metric for document clustering and a new square-root cosine similarity for query information retrieval. This new similarity/distance also bridges between traditional tf_idf weighting to binary weighting in vector space model. Finally, we conduct a comparison on performance between this method and the one based on Euclidean distance and cosine similarity. And from the results, we clearly observe that the precision and recall are improved by using the sqrt-cos similarity.