PageRank based semantic similarity measure on a graph based Turkish WordNet

C. Tulu, Umut Orhan
2017 International Conference on Computer Science and Engineering (UBMK), published 2017-10-01
DOI: 10.1109/UBMK.2017.8093438 (https://doi.org/10.1109/UBMK.2017.8093438)
Citations: 2

Abstract

Measuring the semantic similarity of texts is an important area of Natural Language Processing, and there are several approaches to it: statistical, WordNet-based, and hybrid. All of these approaches rely on a lexical knowledge source such as a corpus or a semantic network. WordNet is one of the most widely preferred and mature lexical knowledge bases. In this study, we focus on measuring the semantic similarity of Turkish words using a graph-based Turkish WordNet. To measure semantic similarity, a PageRank-based approach was chosen. To evaluate the proposed system, the standard RG65 similarity dataset was translated into Turkish and used as benchmark data. Similarity scores for the translated RG65 word pairs were computed using the Turkish WordNet. The results show a correlation of ρ=0.543 with human judgement. Given that the Turkish WordNet is very limited in the number of words it covers and that there is no prior study in this area for the Turkish language, this relatively low score is considered acceptable.
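The abstract does not say which PageRank variant was used or how the scores were compared, so the following is only a minimal sketch of one common way to obtain word similarity from a WordNet-like graph. It runs personalized PageRank from each word and takes the cosine of the two resulting rank vectors. The toy graph, the Turkish words in it, the damping factor, and every function name are illustrative assumptions, not the authors' data or implementation. The `spearman` helper assumes that ρ denotes a rank correlation, which the abstract also leaves unstated.

```python
import math

# Hypothetical toy graph (undirected, stored as adjacency lists).
# It stands in for a graph-based WordNet; it is NOT the Turkish WordNet.
GRAPH = {
    "araba": ["taşıt", "otomobil"],
    "otomobil": ["taşıt", "araba"],
    "taşıt": ["araba", "otomobil", "nesne"],
    "nesne": ["taşıt", "meyve"],
    "meyve": ["nesne", "elma"],
    "elma": ["meyve"],
}

def personalized_pagerank(graph, seed, damping=0.85, iterations=50):
    """Power iteration with all teleport probability placed on `seed`."""
    rank = {n: (1.0 if n == seed else 0.0) for n in graph}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in graph}
        for node, neighbours in graph.items():
            if not neighbours:
                # Dangling node: send its mass back to the seed.
                nxt[seed] += damping * rank[node]
                continue
            share = damping * rank[node] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        nxt[seed] += 1.0 - damping  # teleport step
        rank = nxt
    return rank

def cosine(u, v):
    """Cosine similarity of two rank vectors keyed by the same nodes."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
        math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def similarity(graph, a, b):
    """Similarity of two words as the cosine of their PageRank vectors."""
    return cosine(personalized_pagerank(graph, a),
                  personalized_pagerank(graph, b))

def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(xs), _ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0
```

To mirror the evaluation described above, one would call `similarity` on each translated RG65 word pair and pass the resulting scores, together with the human ratings, to `spearman`. On the toy graph, `similarity(GRAPH, "araba", "otomobil")` comes out higher than `similarity(GRAPH, "araba", "elma")`, since the first pair shares a neighbourhood and the second does not.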