Diffused Label Propagation based Transductive Classification Algorithm for Word Sense Disambiguation

2019 IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA); Pub Date: 2019-07-03; DOI: 10.1109/INISTA.2019.8778218

Gökhan Kocaman, Bilge Sipal, Aydın Gerek, B. Altinel, M. Ganiz


Citations: 1

Abstract

Word sense disambiguation, a major natural language processing problem, is the task of identifying the correct sense of a polysemous word based on its context. In machine learning terms, it can be framed as a supervised classification problem. Semi-supervised classifiers can be a better alternative, since labeled data is usually scarce while large quantities of unlabeled text are readily available. We propose an improvement to Label Propagation, a well-known transductive classification algorithm, for word sense disambiguation. Our approach makes use of a semantic diffusion kernel. We call the new algorithm the diffused label propagation algorithm (DILP). We evaluate the proposed algorithm in experiments that use training sets of various sizes drawn from disambiguated corpora. With these experiments we try to answer the following questions: (1) Does our algorithm with the semantic kernel formulation yield higher classification performance than the popular kernels? (2) Under which conditions does one kernel design perform better than the others? (3) What kinds of regularization methods result in better performance? Our experiments demonstrate that our approach can outperform the baseline in terms of accuracy under several conditions.
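The abstract does not give the kernel formula or the propagation update, so the following is only a minimal sketch of the general idea, not the authors' DILP. It assumes an exponential diffusion kernel over term co-occurrence, K = X exp(λ XᵀX) Xᵀ, and a plain clamped label propagation loop over the row-normalized kernel. The function names, the `lam` parameter, and the toy document-term matrix are all hypothetical.

```python
import numpy as np


def diffusion_kernel(X, lam=0.1):
    """Assumed form: K = X exp(lam * G) X^T, where G = X^T X is term co-occurrence."""
    G = X.T @ X
    # G is symmetric, so exp(lam * G) can be built from its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(G)
    S = (eigvecs * np.exp(lam * eigvals)) @ eigvecs.T
    return X @ S @ X.T


def label_propagation(K, y, n_classes, n_iter=50):
    """Propagate labels over kernel K; y holds class ids, with -1 for unlabeled."""
    y = np.asarray(y)
    idx = np.flatnonzero(y >= 0)            # positions of labeled examples
    P = K / K.sum(axis=1, keepdims=True)    # row-normalized transition matrix
    F = np.zeros((len(y), n_classes))
    F[idx, y[idx]] = 1.0                    # one-hot rows for labeled examples
    clamp = F[idx].copy()
    for _ in range(n_iter):
        F = P @ F                           # spread label mass to neighbors
        F[idx] = clamp                      # keep the known labels fixed
    return F.argmax(axis=1)


# Toy document-term matrix (hypothetical): columns are four context terms.
# Docs 0 and 1 are labeled with senses 0 and 1; docs 2 and 3 are unlabeled.
X = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
], dtype=float)
y = [0, 1, -1, -1]

K = diffusion_kernel(X, lam=0.5)
pred = label_propagation(K, y, n_classes=2)
print(pred)
```

In this toy case each unlabeled document shares a term with exactly one labeled document, so it inherits that document's sense. Swapping `diffusion_kernel` for a linear or RBF kernel in the same loop is one way to set up the kind of kernel comparison the abstract's first question asks about.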
