Selecting Feature-Words in Tag Sense Disambiguation Based on Their Shapley Value

Meshesha Legesse, G. Gianini, Dereje Teferi
Published in: 2016 12th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). DOI: 10.1109/SITIS.2016.45 (https://doi.org/10.1109/SITIS.2016.45)
Citations: 6

Abstract

In tag-word disambiguation, a word is assigned to one specific context, chosen from among the contexts to which it is related. Relatedness to a context is often defined by the co-occurrence of the target word with other words (context words) in the sentences of a given corpus. The overall disambiguation process can be viewed as a classification task in which the context words play the role of features for the target. A problem with this approach is that the large number of possible context words can degrade classification, both in computational effort and in the quality of the outcome. Feature selection can help on both counts by reducing the feature space to a manageable size with high information content. In this work we propose a feature selection approach for disambiguation based on the Shapley Value (SV), a metric from coalitional game theory that measures the importance of a component within a coalition. By including in the feature set only the words with the highest Shapley Value, we obtain remarkable improvements in both quality and performance. The exponential complexity of the exact SV computation is avoided by an approximate computation based on sampling. We demonstrate the effectiveness of the method, and of the sampling approximation, on both a synthetic language corpus and a real-world linguistic corpus.
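The sampling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling scheme or which coalition value function the authors use, so everything below is an assumption: the code uses random permutation sampling (one common way to approximate Shapley values), and `toy_value` is a hypothetical stand-in for "classification quality obtained with this set of context words". The names `sampled_shapley`, `select_top_k`, and the example words are illustrative only.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence


def sampled_shapley(
    features: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Approximate each feature's Shapley value by permutation sampling.

    For every random ordering of the features, each feature is credited
    with its marginal contribution when it joins the features before it.
    Averaging over orderings estimates the Shapley value without
    enumerating all 2^n coalitions.
    """
    rng = random.Random(seed)
    totals = {f: 0.0 for f in features}
    order = list(features)
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = value(frozenset(coalition))
        for f in order:
            coalition.add(f)
            current = value(frozenset(coalition))
            totals[f] += current - previous
            previous = current
    return {f: total / n_permutations for f, total in totals.items()}


def select_top_k(shapley: Dict[str, float], k: int) -> List[str]:
    """Keep the k feature-words with the highest estimated Shapley value."""
    return sorted(shapley, key=shapley.get, reverse=True)[:k]


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical coalition value (NOT from the paper).

    Pretends that "money" and "river" each help disambiguate a target
    word, that they help more together, and that "the" adds nothing.
    """
    score = 0.0
    if "money" in coalition:
        score += 0.5
    if "river" in coalition:
        score += 0.3
    if "money" in coalition and "river" in coalition:
        score += 0.2
    return score


context_words = ["money", "river", "the"]
estimates = sampled_shapley(context_words, toy_value, n_permutations=200, seed=0)
selected = select_top_k(estimates, k=2)
```

In this toy game the uninformative word "the" always has zero marginal contribution, so it is dropped when the top two words are kept. In a real setting the value function would be far more expensive (for example, training and scoring a classifier on each sampled coalition), which is why the number of sampled permutations becomes the main cost parameter.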