The Impact of Sentence Embeddings in Turkish Paraphrase Detection

2019 27th Signal Processing and Communications Applications Conference (SIU) Pub Date : 2019-04-24 DOI:10.1109/SIU.2019.8806506

B. Karaoglan, Hakki Engin Yorgancioglu, T. Kışla, S. K. Metin

Citations: 0

Abstract

Recent studies have shown that word embeddings perform well in several natural language processing (NLP) tasks. Although paraphrase identification in Turkish has been well studied with traditional statistical NLP methods, to the best of our knowledge no study has employed word and/or sentence embeddings. In this paper, three well-known methods for building sentence embeddings from word embeddings are examined: “using average vector for word embeddings” (AWE), “concatenated vectors for word embeddings” (CWE), and “word mover's distance word embeddings” (WMDWE). Their effect on paraphrase identification performance is measured, and the results are presented comparatively for English (MSRP) and Turkish (PARDER and TuPC) paraphrase corpora. The study does not cover optimization of the parameters used in training the word embeddings, nor does it consider features specific to the Turkish language. Despite this naive approach, the test results obtained on the PARDER corpus are encouraging and suggest that a more detailed study incorporating such improvements may yield more convincing performance values.
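The three ways of comparing sentences named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the toy three-dimensional vectors, the function names, the zero-padding scheme used for CWE, and the use of a relaxed (lower-bound) variant of word mover's distance in place of the exact optimal-transport computation. It is not the authors' implementation.

```python
import math

# Toy 3-dimensional word vectors; purely illustrative, not trained embeddings.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sleeps": [0.0, 0.9, 0.2],
    "rests": [0.1, 0.8, 0.3],
}
DIM = 3


def lookup(tokens):
    """Return vectors for known tokens, skipping out-of-vocabulary words."""
    return [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]


def awe(tokens):
    """AWE: element-wise average of the word vectors in the sentence."""
    vecs = lookup(tokens)
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cwe(tokens, max_len=4):
    """CWE (assumed form): concatenate word vectors in order,
    truncated or zero-padded to max_len words so lengths match."""
    vecs = lookup(tokens)[:max_len]
    flat = [x for v in vecs for x in v]
    return flat + [0.0] * (max_len * DIM - len(flat))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed word mover's distance with uniform word weights.
    Each word moves to its nearest word in the other sentence; the larger
    of the two directions is a lower bound on the exact WMD."""
    va, vb = lookup(tokens_a), lookup(tokens_b)
    if not va or not vb:
        return float("inf")

    def one_way(src, dst):
        return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)

    return max(one_way(va, vb), one_way(vb, va))


s1 = ["the", "cat", "sleeps"]
s2 = ["the", "dog", "rests"]
print("AWE cosine:", cosine(awe(s1), awe(s2)))
print("CWE cosine:", cosine(cwe(s1), cwe(s2)))
print("Relaxed WMD:", relaxed_wmd(s1, s2))
```

In a paraphrase identification setting, scores like these would typically be compared against a threshold or passed to a classifier; the abstract does not say which decision rule the study uses, so that step is left out here.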
