Tommy Wijaya Sagala, Theresia Wati, Solikin, N. Budi, A. Hidayanto
Analysis and Implementation Measurement of Semantic Similarity Using Content Management Information on WordNet
2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS), published 2018-10-01
DOI: 10.1109/ICACSIS.2018.8618181 (https://doi.org/10.1109/ICACSIS.2018.8618181)
Citations: 3
Abstract
Measuring semantic similarity plays an important role in natural language processing (NLP). The resulting scores often serve as the basis for NLP tasks such as question answering, document classification, and machine translation. This paper analyses test results, obtained on a recent dataset, for an implementation that uses information content derived from the WordNet taxonomy to measure semantic similarity. The implementation results are then compared against a gold-standard dataset, SimLex-999, to measure performance. Performance is measured with both the Pearson and Spearman correlations, because each has its own advantages and disadvantages. In the tests, the Seco formula yielded Pearson and Spearman correlations of 0.583 and 0.582, respectively, while the New Formula yielded 0.602 and 0.594, respectively. These results indicate a strong positive correlation. Therefore, the information content method on WordNet is a feasible way to measure semantic similarity.
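The evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the "Seco formula" refers to the intrinsic information content of Seco et al. (2004), IC(c) = 1 - log(hypo(c) + 1) / log(max_wn), where hypo(c) is the number of hyponyms of concept c and max_wn is the total number of concepts in the taxonomy. The abstract does not define the "New Formula", so it is not shown. All function names are hypothetical, and the sample scores in the usage note are invented for illustration; they are not SimLex-999 data or results from the paper.

```python
import math


def seco_ic(hyponym_count, total_concepts):
    """Intrinsic information content, assuming the Seco et al. (2004) form:
    IC(c) = 1 - log(hypo(c) + 1) / log(total_concepts).
    A leaf concept (no hyponyms) gets IC = 1; a root that subsumes
    every other concept gets IC = 0."""
    return 1.0 - math.log(hyponym_count + 1) / math.log(total_concepts)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

A usage example would pass two parallel lists, one of computed similarity scores and one of human ratings for the same word pairs, for instance `pearson(model_scores, human_scores)` and `spearman(model_scores, human_scores)`, where both list names are placeholders. Pearson is sensitive to the linear relationship between the raw values, whereas Spearman depends only on their ordering, which is one reason to report both.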