MSL-DE: A Lexical Substitution Method based on Dictionaries and Embeddings

Isaias Frederick Januario, Álvaro R. Pereira
Proceedings of the Brazilian Symposium on Multimedia and the Web, published 2020-11-30
DOI: 10.1145/3428658.3430982
Citations: 0

Abstract

Lexical substitution has evolved noticeably in the literature, mainly in the data sources used to generate the candidate substitutes that feed the process. Dictionaries and thesauri are widely used because their structure groups synonyms, but word polysemy prevents terms from being exchanged directly without analyzing the context. Vector space models, such as embeddings, are used to represent both substitutes and contexts. However, representations that consider only contextual factors can, in many cases, place terms close together that are not exactly synonyms. These observations suggest that using dictionaries and embeddings together is a promising approach. We therefore present a method that uses the information contained in merged dictionaries, together with the linguistic relations structured in their taxonomies. The method measures how well a potential synonym preserves the meaning of the sentence by observing how frequently it is used in small contexts. In addition, it uses the complete context as input to vector operations that highlight the best synonyms within a previously selected set. The results show the effectiveness of the method, which surpasses many well-established methods in the literature at predicting the best substitute for words in instances of a known benchmark.
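To make the two stages described in the abstract more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' MSL-DE implementation. The merged dictionary, trigram counts and 3-dimensional vectors are invented toy data. The names `MERGED_DICTIONARY`, `TRIGRAM_COUNTS`, `EMBEDDINGS`, `select_by_small_context`, `rank_substitutes` and the `min_count` threshold are hypothetical. Stage 1 keeps dictionary candidates that occur often enough in a one-word window around the target. Stage 2 reranks the survivors by their cosine similarity to the target plus their cosine similarity to the mean vector of the whole sentence.

```python
import math
from collections import Counter

# Hypothetical toy resources, invented for illustration (not MSL-DE data).
# Merged dictionary: target word -> candidate substitutes pooled from several sources.
MERGED_DICTIONARY = {
    "bright": {"brilliant", "smart", "shiny", "luminous"},
}

# Trigram counts standing in for frequencies observed in a large corpus.
TRIGRAM_COUNTS = Counter({
    ("very", "smart", "student"): 40,
    ("very", "brilliant", "student"): 25,
    ("very", "shiny", "student"): 1,
})

# Tiny 3-dimensional vectors standing in for pretrained word embeddings.
EMBEDDINGS = {
    "bright": [0.9, 0.4, 0.1],
    "brilliant": [0.8, 0.5, 0.1],
    "smart": [0.7, 0.6, 0.0],
    "shiny": [0.6, 0.0, 0.8],
    "luminous": [0.5, 0.0, 0.9],
    "she": [0.1, 0.6, 0.1],
    "is": [0.1, 0.1, 0.1],
    "a": [0.1, 0.1, 0.1],
    "very": [0.2, 0.3, 0.1],
    "student": [0.3, 0.9, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(words):
    """Average the embeddings of the known words; None if no word has a vector."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def select_by_small_context(tokens, index, min_count=2):
    """Stage 1: keep dictionary candidates seen at least min_count times in the
    one-word window (left neighbour, candidate, right neighbour)."""
    left = tokens[index - 1] if index > 0 else "<s>"
    right = tokens[index + 1] if index + 1 < len(tokens) else "</s>"
    selected = {}
    for candidate in MERGED_DICTIONARY.get(tokens[index], ()):
        count = TRIGRAM_COUNTS[(left, candidate, right)]  # missing trigram -> 0
        if count >= min_count:
            selected[candidate] = count
    return selected


def rank_substitutes(tokens, index, min_count=2):
    """Stage 2: rerank the stage-1 candidates using the complete sentence context.
    Score = cos(candidate, target) + cos(candidate, mean context vector)."""
    target = tokens[index]
    context = mean_vector(tokens[:index] + tokens[index + 1:])
    if context is None or target not in EMBEDDINGS:
        return []
    scored = []
    for candidate in select_by_small_context(tokens, index, min_count):
        if candidate in EMBEDDINGS:
            vec = EMBEDDINGS[candidate]
            score = cosine(vec, EMBEDDINGS[target]) + cosine(vec, context)
            scored.append((candidate, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


tokens = ["she", "is", "a", "very", "bright", "student"]
for word, score in rank_substitutes(tokens, 4):
    print(f"{word}: {score:.3f}")
```

For the toy sentence, stage 1 keeps `smart` and `brilliant`. `shiny` falls below `min_count`, and `luminous` has no trigram count. Stage 2 then ranks `smart` ahead of `brilliant`. The additive score is only a simple choice for illustration. The abstract does not specify how MSL-DE weights the small-context and full-context signals.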