The Impact of Sentence Embeddings in Turkish Paraphrase Detection

B. Karaoglan, Hakki Engin Yorgancioglu, T. Kışla, S. K. Metin
2019 27th Signal Processing and Communications Applications Conference (SIU), published 2019-04-24. DOI: 10.1109/SIU.2019.8806506 (https://doi.org/10.1109/SIU.2019.8806506)
Citations: 0

Abstract

Recent studies have shown that word embeddings perform well in several natural language processing (NLP) tasks. Although paraphrase identification in Turkish has been well studied with traditional statistical NLP methods, to the best of our knowledge no study has employed word and/or sentence embeddings for it. In this paper, three well-known methods for building sentence embeddings from word embeddings are examined: “using average vector for word embeddings” (AWE), “concatenated vectors for word embeddings” (CWE), and “word mover's distance word embeddings” (WMDWE). Their effect on paraphrase identification performance is measured. The results are presented comparatively for English (MSRP) and Turkish (PARDER and TuPC) paraphrase corpora. The study does not cover the optimization of the parameters used in training the word embeddings, and features specific to the Turkish language are not considered. Despite this naive approach, the test results obtained from the PARDER corpus are encouraging: a more detailed study that includes such improvements may yield more convincing performance values.
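The abstract names three ways of comparing sentences through their word embeddings but gives no implementation details. The following is a minimal sketch of the general ideas only, not the authors' code. Everything in it is an assumption made for illustration: the toy 3-dimensional vectors, the function names `awe`, `cwe`, and `relaxed_wmd`, the zero-padding scheme for concatenation, and the use of cosine similarity. The third function computes the relaxed lower bound on word mover's distance (each word moves to its nearest counterpart), not the full optimal-transport distance.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real experiment would load trained embeddings instead.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sleeps": [0.0, 0.9, 0.2],
    "naps": [0.1, 0.8, 0.3],
    "stocks": [0.0, 0.1, 0.9],
    "fell": [0.2, 0.0, 0.8],
}
DIM = 3


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def euclid(u, v):
    """Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def awe(tokens):
    """Averaged word embeddings: mean of the known word vectors."""
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cwe(tokens, max_len=4):
    """Concatenated word embeddings, truncated or zero-padded to max_len words."""
    out = []
    for t in tokens[:max_len]:
        out.extend(TOY_VECTORS.get(t, [0.0] * DIM))
    out.extend([0.0] * (max_len * DIM - len(out)))
    return out


def relaxed_wmd(a, b):
    """Relaxed word mover's distance with uniform word weights.

    Each word travels to its nearest word in the other sentence; the larger
    of the two directions is a lower bound on the full distance.
    """
    a = [t for t in a if t in TOY_VECTORS]
    b = [t for t in b if t in TOY_VECTORS]
    if not a or not b:
        return float("inf")

    def one_way(src, dst):
        return sum(
            min(euclid(TOY_VECTORS[s], TOY_VECTORS[d]) for d in dst) for s in src
        ) / len(src)

    return max(one_way(a, b), one_way(b, a))


if __name__ == "__main__":
    s1 = ["the", "cat", "sleeps"]
    s2 = ["the", "kitten", "naps"]   # intended paraphrase of s1
    s3 = ["the", "stocks", "fell"]   # unrelated sentence
    print("AWE cosine  s1-s2:", cosine(awe(s1), awe(s2)))
    print("AWE cosine  s1-s3:", cosine(awe(s1), awe(s3)))
    print("CWE cosine  s1-s2:", cosine(cwe(s1), cwe(s2)))
    print("relaxed WMD s1-s2:", relaxed_wmd(s1, s2))
    print("relaxed WMD s1-s3:", relaxed_wmd(s1, s3))
```

In a paraphrase identification setup, scores like these would typically be thresholded or fed to a classifier as features. Which of these the study actually did is not stated in the abstract.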