Linguistic features in Turkish word representations

Onur Güngör, Eray Yildiz
{"title":"Linguistic features in Turkish word representations","authors":"Onur Güngör, Eray Yildiz","doi":"10.1109/SIU.2017.7960223","DOIUrl":null,"url":null,"abstract":"Distributed word representations which are learned using unsupervised methods are employed in many Natural Language Processing (NLP) tasks. They have led to state-of-the-art results in many NLP tasks for many languages. There have been studies reporting that word representations include morphological and semantical information. There are also work that aim to propose word representations which handle the morphological and syntactical information better. However, studies that evaluate the quality of the word representations for morphologically rich languages like Turkish are limited. In this study, we aim to explore the syntactic and morphological information captured by the Turkish word representations which are learned using skip-gram method on a large corpus. To assess the quality of information found in relations between Turkish word embeddings, analogical reasoning task is performed using couples consisting of root words and their inflected or derivative forms. We contribute with detailed experiments and show that word embeddings trained with skip-gram method have differing capabilities in capturing information for inflection and derivation groups in Turkish. We make the test sets and word embeddings publicly available to other researchers for further research.","PeriodicalId":217576,"journal":{"name":"2017 25th Signal Processing and Communications Applications Conference (SIU)","volume":"322 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 25th Signal Processing and Communications Applications Conference (SIU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIU.2017.7960223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

Abstract

Distributed word representations which are learned using unsupervised methods are employed in many Natural Language Processing (NLP) tasks. They have led to state-of-the-art results in many NLP tasks for many languages. There have been studies reporting that word representations include morphological and semantical information. There are also work that aim to propose word representations which handle the morphological and syntactical information better. However, studies that evaluate the quality of the word representations for morphologically rich languages like Turkish are limited. In this study, we aim to explore the syntactic and morphological information captured by the Turkish word representations which are learned using skip-gram method on a large corpus. To assess the quality of information found in relations between Turkish word embeddings, analogical reasoning task is performed using couples consisting of root words and their inflected or derivative forms. We contribute with detailed experiments and show that word embeddings trained with skip-gram method have differing capabilities in capturing information for inflection and derivation groups in Turkish. We make the test sets and word embeddings publicly available to other researchers for further research.
土耳其语词表示的语言特征
许多自然语言处理(NLP)任务都采用了使用无监督方法学习的分布式词表示。它们已经在许多语言的许多NLP任务中产生了最先进的结果。有研究报道,词表征包括形态和语义信息。也有一些工作旨在提出更好地处理形态和句法信息的词表示。然而,评估词形丰富的语言(如土耳其语)的单词表示质量的研究是有限的。在这项研究中,我们的目的是探索在一个大型语料库上使用跳格法学习的土耳其语单词表示所捕获的句法和形态信息。为了评估在土耳其语词嵌入之间的关系中发现的信息质量,使用由词根及其屈折或衍生形式组成的对进行类比推理任务。我们提供了详细的实验,并表明用skip-gram方法训练的词嵌入在捕获土耳其语屈变和派生组的信息方面具有不同的能力。我们将测试集和词嵌入公开提供给其他研究人员进行进一步研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信