“If you can’t beat them, join them”: A Word Transformation based Generalized Skip-gram for Embedding Compound Words

Debasis Ganguly, Shripad Bhat, Chandan Biswas
DOI: 10.1145/3574318.3574346
Published in: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation
Publication date: 2022-12-09
Abstract

While a class of data-driven approaches has been shown to be effective at embedding words of languages that are relatively simple in terms of inflection and compounding characteristics (e.g. English), an open area of investigation is how to integrate language-specific characteristics within the framework of an embedding model. Standard word embedding approaches, such as word2vec and GloVe, embed each word into a high-dimensional dense vector. However, these approaches may not adequately capture an inherent linguistic phenomenon, namely word compounding. We propose a stochastic word-transformation-based generalization of the skip-gram algorithm, which seeks to improve the representation of compositional compound words by leveraging information from the contexts of their constituents. Our experiments show that addressing the compounding effect of a language as part of the word embedding objective outperforms existing compounding-specific post-transformation approaches on word semantics prediction and word polarity prediction tasks.
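To make the core idea concrete, the following is a minimal, hypothetical sketch of how a stochastic word transformation could be folded into skip-gram training with negative sampling: when the target word is a compound, it is occasionally replaced by one of its constituents, so the compound's contexts also shape the constituents' representations (and vice versa). The toy corpus, the compound lexicon, and the transformation probability `p_split` are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch of a generalized skip-gram with stochastic compound transformation.
# All hyperparameters and the compound lexicon below are assumptions made
# for illustration; the paper's exact formulation may differ.
import numpy as np

rng = np.random.default_rng(0)

corpus = "the football match and the foot injury after the ball game".split()
compounds = {"football": ["foot", "ball"]}  # toy compound lexicon (assumed)
vocab = sorted(set(corpus) | {c for parts in compounds.values() for c in parts})
idx = {w: i for i, w in enumerate(vocab)}

dim, p_split, lr, window = 16, 0.5, 0.05, 2
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # target embeddings
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(target, context, label):
    """One SGD step of skip-gram with a single (target, context) pair."""
    t, c = idx[target], idx[context]
    v, u = W_in[t].copy(), W_out[c].copy()
    grad = sigmoid(v @ u) - label
    W_in[t] -= lr * grad * u
    W_out[c] -= lr * grad * v

for epoch in range(50):
    for pos, word in enumerate(corpus):
        # Stochastic word transformation: with probability p_split, train a
        # randomly chosen constituent in place of the compound itself, so the
        # constituent is exposed to the compound's contexts.
        target = word
        if word in compounds and rng.random() < p_split:
            target = rng.choice(compounds[word])
        for off in range(-window, window + 1):
            ctx_pos = pos + off
            if off == 0 or not (0 <= ctx_pos < len(corpus)):
                continue
            train_pair(target, corpus[ctx_pos], 1.0)    # positive pair
            train_pair(target, rng.choice(vocab), 0.0)  # one negative sample
```

A symmetric variant could also transform compounds appearing in the context window, not just targets; the abstract leaves the direction of the transformation open, so the sketch picks the simpler option.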