Intrinsic Evaluation of Word Embeddings for Turkish

Hayri Volkan Agun, Ozgur Yilmazel
DOI: 10.1145/3440084.3441184
Proceedings of the 2020 4th International Symposium on Computer Science and Intelligent Control, published 2020-11-17
Citations: 1

Abstract

Word embeddings are evaluated through intrinsic and extrinsic tests. Similarity and analogy tests are mainly used for intrinsic evaluation, while natural language processing tasks such as named entity recognition and question answering are preferred for extrinsic evaluation. Although various intrinsic evaluation datasets exist for English, the datasets for Turkish are very limited, and they measure the degree of similarity and relatedness between words without specifying the type of semantic relation. In this paper, we propose an intrinsic evaluation dataset for evaluating semantic relations other than synonym, antonym, hypernym, and meronym relations, as well as morphological relations of individual Turkish words. Moreover, we benchmark three publicly available word-embedding models on the proposed dataset and discuss the agglutinative characteristics of the Turkish language for language modeling.
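Analogy tests such as those mentioned in the abstract are commonly scored with the vector-offset method (3CosAdd): for a question "a is to b as c is to ?", the answer is the vocabulary word whose vector is closest, by cosine similarity, to b - a + c. The abstract does not state which scoring method the authors use, so the sketch below is only an illustration under that assumption. The function names (`solve_analogy`, `accuracy_by_relation`) and the 3-dimensional toy vectors are hypothetical, not real model output. Accuracy is reported per relation type, reflecting the paper's point that a relation-labelled dataset lets results be broken down by relation, here using a morphological relation (the Turkish plural suffix -ler/-lar).

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """3CosAdd: return the word d maximising cos(d, b - a + c).

    The three query words are excluded from the candidates, as is
    conventional; returns None if no candidate remains.
    """
    target = [xb - xa + xc for xa, xb, xc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target), default=None)


def accuracy_by_relation(emb, questions):
    """Accuracy per relation type.

    questions: iterable of (relation, a, b, c, expected_d) tuples.
    Questions containing an out-of-vocabulary word are skipped.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for relation, a, b, c, d in questions:
        if any(w not in emb for w in (a, b, c, d)):
            continue  # skip OOV questions rather than counting them as errors
        total[relation] += 1
        if solve_analogy(emb, a, b, c) == d:
            correct[relation] += 1
    return {r: correct[r] / total[r] for r in total}


# Hypothetical toy embeddings (not from any real model).
emb = {
    "ev": [1.0, 0.0, 0.0],        # "house"
    "evler": [1.0, 0.0, 1.0],     # "houses"
    "kitap": [0.0, 1.0, 0.0],     # "book"
    "kitaplar": [0.0, 1.0, 1.0],  # "books"
}
questions = [
    ("plural", "ev", "evler", "kitap", "kitaplar"),
    ("plural", "kitap", "kitaplar", "ev", "evler"),
    ("plural", "ev", "evler", "kalem", "kalemler"),  # "kalem" is OOV: skipped
]
print(accuracy_by_relation(emb, questions))  # {'plural': 1.0}
```

How out-of-vocabulary words are handled matters for an agglutinative language such as Turkish, where many inflected forms may be missing from a model's vocabulary. Skipping them, as here, is one common choice. Counting them as errors is another, and the two choices can rank models differently.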