New information content metric and nominalization relation for a new WordNet-based method to measure the semantic relatedness
M. A. Hadj Taieb, Mohamed Ben Aouicha, M. Tmar, Abdelmajid Ben Hamadou
2011 IEEE 10th International Conference on Cybernetic Intelligent Systems (CIS), published 2011-09-01
DOI: 10.1109/CIS.2011.6169134 (https://doi.org/10.1109/CIS.2011.6169134)
Citations: 15
Abstract
Semantic similarity techniques compute the similarity (the information shared) between two concepts using language or domain resources such as ontologies, taxonomies, and corpora. They are important components of most Information Retrieval (IR) and knowledge-based systems. Taking semantics into account requires external semantic resources coupled with the initial documents, together with semantic similarity measures for comparing concepts. This paper presents a new approach for measuring semantic relatedness between words and concepts. It combines a new information content (IC) metric based on the WordNet thesaurus with the nominalization relation provided by the Java WordNet Library (JWNL). Specifically, the proposed method makes thorough use of the hypernym/hyponym relation (the noun and verb “is a” taxonomy) without statistical information from an external corpus. We mainly use the subgraph formed by the hypernyms of the concept in question, which inherits all the features of its hypernyms, and we quantify the contribution of each concept in this subgraph to its information content. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value, 0.70, with a benchmark based on human similarity judgments, in particular on a large dataset composed of 260 Finkelstein word pairs (Appendix 1 and 2).
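The idea of deriving information content from a concept's hypernym subgraph, with no corpus statistics, can be illustrated with a minimal Python sketch. The abstract does not state the paper's formula, so everything here is an assumption made for illustration: the toy taxonomy `HYPERNYMS`, the per-concept weight `1 / (1 + hyponym count)`, and the function names are hypothetical and are not the authors' metric or the JWNL API.

```python
# Hypothetical toy "is a" taxonomy (NOT WordNet data): concept -> direct hypernyms.
HYPERNYMS = {
    "entity": [],
    "animal": ["entity"],
    "artifact": ["entity"],
    "pet": ["animal"],
    "mammal": ["animal"],
    "dog": ["mammal", "pet"],
    "cat": ["mammal", "pet"],
    "car": ["artifact"],
}


def hypernym_subgraph(concept):
    """Return the concept together with all its direct and indirect hypernyms."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERNYMS[node])
    return seen


def hyponym_count(concept):
    """Count the concepts that have `concept` as a direct or indirect hypernym."""
    return sum(
        1
        for other in HYPERNYMS
        if other != concept and concept in hypernym_subgraph(other)
    )


def contribution(concept):
    """Assumed weight (not the paper's): concepts with fewer hyponyms count more."""
    return 1.0 / (1 + hyponym_count(concept))


def illustrative_ic(concept):
    """Assumed IC: sum of the contributions of every concept in the hypernym subgraph."""
    return sum(contribution(c) for c in hypernym_subgraph(concept))


def shared_hypernyms(a, b):
    """Concepts common to both hypernym subgraphs (the shared information)."""
    return hypernym_subgraph(a) & hypernym_subgraph(b)


if __name__ == "__main__":
    for name in ("entity", "animal", "dog"):
        print(name, round(illustrative_ic(name), 3))
    print(sorted(shared_hypernyms("dog", "cat")))
    print(sorted(shared_hypernyms("dog", "car")))
```

Because a concept's hypernym subgraph always contains the subgraphs of its hypernyms, this assumed score grows as concepts become more specific: in the toy data, "dog" scores higher than "animal", which scores higher than "entity". The shared subgraph of "dog" and "cat" holds four concepts, while "dog" and "car" share only the root.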