A Novel Syntactic-Based Approach to Calculate Similarities Among Languages

Metin Bilgin
Suleyman Demirel Universitesi Fen Bilimleri Enstitusu Dergisi, published 2023-04-25
DOI: 10.19113/sdufenbed.1168260 (https://doi.org/10.19113/sdufenbed.1168260)

Abstract

This study presents an approach for calculating the similarity among languages using a new feature template obtained from the syntactic analysis phase. Studies were conducted on six language sets from two language families to show that language similarity can be calculated with the proposed feature template. In the first study, a triplet-pattern template was obtained from the syntactic analysis of Turkish, Kazakh, and Uyghur, three Turkic languages belonging to the Ural-Altaic linguistic family. The template was formed automatically by the developed software, and a separate module within the same software performed the similarity analysis of the selected languages. Consequently, both the structural similarities and the structural differences among languages of the same family can be revealed. Although the immediate aim of the approach is to determine the similarities among languages, the underlying aim is to build a language-independent system. To show that the system is language independent, a second study was carried out, in which the similarities among languages were determined using treebanks of English, Swedish, and Norwegian from the Germanic language family. For the Turkic family, the highest similarity is between Turkish and Uyghur, with Jaccard, Dice, and Similarity Matching values of 25.21%, 40.27%, and 50.42%, respectively. For the Germanic family, the highest similarity is between Norwegian and Swedish, with values of 37.15%, 54.17%, and 74.3%, respectively.
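
As a rough illustration of how two of the named metrics behave, the sketch below computes Jaccard and Dice over sets of syntactic triplets. The abstract does not define the triplet-pattern template, so the (head tag, relation, dependent tag) form, the toy data, and all function names here are assumptions for illustration, not the study's software. The set-based definitions are the standard ones; whether the study counts patterns as sets or with frequencies is not stated.

```python
from typing import Set, Tuple

# Hypothetical feature form: (head POS tag, dependency relation, dependent POS tag).
# The actual triplet-pattern template is not specified in the abstract.
Triplet = Tuple[str, str, str]


def jaccard(a: Set[Triplet], b: Set[Triplet]) -> float:
    """Standard Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: Set[Triplet], b: Set[Triplet]) -> float:
    """Standard Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def dice_from_jaccard(j: float) -> float:
    """For set-based metrics, Dice follows from Jaccard: D = 2J / (1 + J)."""
    return 2 * j / (1 + j)


# Toy pattern inventories for two made-up languages (not data from the study).
lang_a = {
    ("VERB", "nsubj", "NOUN"),
    ("NOUN", "amod", "ADJ"),
    ("VERB", "obj", "NOUN"),
    ("VERB", "advmod", "ADV"),
}
lang_b = {
    ("VERB", "nsubj", "NOUN"),
    ("NOUN", "amod", "ADJ"),
    ("NOUN", "det", "DET"),
}

print(f"Jaccard: {jaccard(lang_a, lang_b):.4f}")
print(f"Dice:    {dice(lang_a, lang_b):.4f}")

# The reported Jaccard/Dice pairs agree with the set-based identity.
print(round(dice_from_jaccard(0.2521), 4))  # Turkish-Uyghur Jaccard
print(round(dice_from_jaccard(0.3715), 4))  # Norwegian-Swedish Jaccard
```

Applying the identity D = 2J / (1 + J) to the reported Jaccard values reproduces the reported Dice values (40.27% and 54.17%) to two decimal places, which suggests both metrics were computed over the same pattern sets. The third metric is left out because the abstract does not define it.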