Disentangled Representation for Long-tail Senses of Word Sense Disambiguation

Junwei Zhang, Ruifang He, Fengyu Guo, Jinsong Ma, Mengnan Xiao
{"title":"Disentangled Representation for Long-tail Senses of Word Sense Disambiguation","authors":"Junwei Zhang, Ruifang He, Fengyu Guo, Jinsong Ma, Mengnan Xiao","doi":"10.1145/3511808.3557288","DOIUrl":null,"url":null,"abstract":"The long-tailed distribution, also called the heavy-tailed distribution, is common in nature. Since both words and their senses in natural language have long-tailed phenomenon in usage frequency, the Word Sense Disambiguation (WSD) task faces serious data imbalance. The existing learning strategies or data augmentation methods are difficult to deal with the lack of training samples caused by the single application scenario of long-tail senses, and the word sense representations caused by unique word sense definitions. Considering that the features extracted from the Disentangled Representation (DR) independently describe the essential properties of things, and DR does not require deep feature extraction and fusion processes, it alleviates the dependence of the representation learning on the training samples. We propose a novel DR by constraining the covariance matrix of a multivariate Gaussian distribution, which can enhance the strength of independence among features compared to β-VAE. The WSD model implemented by the reinforced DR outperforms the baselines on the English all-words WSD evaluation framework, the constructed long-tail word sense datasets, and the latest cross-lingual datasets.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3511808.3557288","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The long-tailed distribution, also called the heavy-tailed distribution, is common in nature. Since both words and their senses in natural language follow a long-tailed distribution in usage frequency, the Word Sense Disambiguation (WSD) task faces serious data imbalance. Existing learning strategies and data augmentation methods struggle to cope with the shortage of training samples caused by the narrow application scenarios of long-tail senses, and with the difficulty of learning word sense representations caused by their unique sense definitions. Because the features extracted by a Disentangled Representation (DR) independently describe the essential properties of things, and DR does not require deep feature extraction and fusion processes, it alleviates the dependence of representation learning on training samples. We propose a novel DR obtained by constraining the covariance matrix of a multivariate Gaussian distribution, which strengthens the independence among features compared to β-VAE. The WSD model built on the reinforced DR outperforms the baselines on the English all-words WSD evaluation framework, on the constructed long-tail word sense datasets, and on the latest cross-lingual datasets.
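The abstract describes enforcing independence among latent features by constraining the covariance matrix of a multivariate Gaussian distribution, in contrast to the KL-based pressure of β-VAE. The sketch below is a minimal, hedged illustration of one such constraint: a batch-level penalty on the off-diagonal entries of the latent covariance added to the usual diagonal-Gaussian KL term. The encoder architecture, dimensions, and loss weighting are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumed setup, not the paper's code): penalize off-diagonal
# covariance of VAE latent codes so features are encouraged to be independent.
import torch
import torch.nn as nn


class GaussianEncoder(nn.Module):
    """Maps an input embedding to the mean and log-variance of a
    diagonal multivariate Gaussian over the latent features."""

    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.log_var = nn.Linear(256, latent_dim)

    def forward(self, x: torch.Tensor):
        h = self.backbone(x)
        return self.mu(h), self.log_var(h)


def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    # Standard reparameterization trick: z = mu + sigma * eps
    std = torch.exp(0.5 * log_var)
    return mu + std * torch.randn_like(std)


def off_diagonal_covariance_penalty(z: torch.Tensor) -> torch.Tensor:
    """Squared Frobenius norm of the off-diagonal entries of the batch
    covariance of the latent codes; zero iff the features are uncorrelated."""
    z_centered = z - z.mean(dim=0, keepdim=True)
    cov = (z_centered.T @ z_centered) / (z.size(0) - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()


def kl_to_standard_normal(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior, as in a (beta-)VAE.
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1).mean()


if __name__ == "__main__":
    encoder = GaussianEncoder(in_dim=768, latent_dim=32)
    x = torch.randn(16, 768)  # e.g. a batch of contextual word embeddings
    mu, log_var = encoder(x)
    z = reparameterize(mu, log_var)
    # Hypothetical objective: a WSD task loss would be added on top of the KL
    # term and the covariance penalty that pushes features toward independence.
    loss = kl_to_standard_normal(mu, log_var) + 0.1 * off_diagonal_covariance_penalty(z)
    print(float(loss))
```

Compared to simply raising the β weight on the KL term, an explicit covariance penalty of this kind targets cross-feature correlation directly; the 0.1 weight above is an arbitrary placeholder.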