Word Semantic Similarity Based on Document's Title
Mohamed Said Hamani, R. Maamri
2013 24th International Workshop on Database and Expert Systems Applications, published 2013-08-26
DOI: 10.1109/DEXA.2013.12 (https://doi.org/10.1109/DEXA.2013.12)
Citations: 7
Abstract
Measuring the similarity between words from search-engine page counts alone is a challenging task. Search engines treat a document as a bag of words and ignore the position of words within it. To measure the semantic similarity between two given words, this paper proposes a transformation function for web measures, together with a new approach that exploits the document's title attribute and relies only on the page counts returned by Web search engines. Experimental results on benchmark datasets show that the proposed approach outperforms snippet-only methods, achieving a correlation coefficient of up to 71%.
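
As a rough illustration of what a page-count measure restricted to document titles might look like, the sketch below computes a Jaccard-style co-occurrence score from three counts: titles containing the first word, titles containing the second word, and titles containing both. The abstract does not specify which web measures or which transformation function the paper uses, so the choice of the Jaccard coefficient, the function name `web_jaccard`, and the counts are all assumptions made for illustration only.

```python
def web_jaccard(count_p: int, count_q: int, count_pq: int) -> float:
    """Jaccard-style similarity from page counts.

    count_p  : pages whose title contains word P
    count_q  : pages whose title contains word Q
    count_pq : pages whose title contains both P and Q

    Illustrative measure only; the paper's own measures and its
    transformation function are not given in the abstract.
    """
    if count_pq <= 0:
        return 0.0
    union = count_p + count_q - count_pq
    if union <= 0:
        return 0.0
    return count_pq / union


# Hypothetical title-restricted page counts. In practice they would come
# from search-engine queries limited to the title field; these numbers
# are invented for the example.
count_car = 1000
count_automobile = 400
count_both = 200

score = web_jaccard(count_car, count_automobile, count_both)
print(f"title-based similarity: {score:.3f}")
```

Restricting the counts to titles is one way to bring back some of the positional information that a bag-of-words page count discards, since a word in the title is more likely to be central to the document.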