On the Utility of Word Embeddings for Enriching OpenWordNet-PT

Hugo Gonçalo Oliveira, Fredson Silva de Souza Aguiar, Alexandre Rademaker
Published in: International Conference on Language, Data, and Knowledge. DOI: https://doi.org/10.4230/OASIcs.LDK.2021.21
Citations: 3

Abstract

The maintenance of wordnets and lexical knowledge bases typically relies on time-consuming manual effort. To reduce this effort, we propose exploiting models of distributional semantics, namely word embeddings learned from corpora, to automatically identify relation instances missing from a wordnet. Analogy-solving methods are first used to learn a set of relations from analogy tests focused on each relation. Despite their low accuracy, we note that a portion of the top-ranked answers are good suggestions of relation instances that could be included in the wordnet. This procedure is applied to the enrichment of OpenWordNet-PT, a public Portuguese wordnet. Relations are learned from data acquired from this resource, and illustrative examples are provided. Results are promising for accelerating the identification of missing relation instances: we estimate that about 17% of the potential suggestions are good, a proportion that almost doubles if some suggestions are automatically invalidated.

2012 ACM Subject Classification: Computing methodologies → Lexical semantics; Computing methodologies → Language resources
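The abstract does not name the analogy-solving method used. As a minimal sketch of the general idea, the example below assumes the widely used vector-offset method (often called 3CosAdd): given a known pair (a, b) of a relation and a new source word c, candidates d are ranked by cosine similarity to b - a + c. The toy three-dimensional vectors, the word list, the `known` set, and the function names are all hypothetical, chosen only for illustration; they are not taken from the paper or from OpenWordNet-PT.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def solve_analogy(emb, a, b, c, top_n=3):
    """Rank candidates d for 'a is to b as c is to d' by cos(d, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    scored = [
        (word, cosine(vec, target))
        for word, vec in emb.items()
        if word not in (a, b, c)  # the query words themselves are excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical toy embeddings (real ones would be learned from corpora).
emb = {
    "rosa":     [1.0, 0.0, 0.0],  # rose
    "flor":     [1.0, 0.0, 1.0],  # flower
    "carvalho": [0.0, 1.0, 0.0],  # oak
    "árvore":   [0.0, 1.0, 1.0],  # tree
    "planta":   [0.5, 0.5, 1.0],  # plant
    "pedra":    [1.0, 1.0, 0.0],  # stone
}

# Hypothetical relation instances already present in the resource.
known = {("rosa", "flor")}

# 'rosa' is to 'flor' as 'carvalho' is to ...?
ranked = solve_analogy(emb, "rosa", "flor", "carvalho")

# Top answers not already in the resource become suggestions for review.
suggestions = [("carvalho", d) for d, _ in ranked if ("carvalho", d) not in known]
print(suggestions)
```

In this toy setting the highest-ranked candidate is the intended one, but lower-ranked answers also appear in the list. This mirrors the abstract's point that only a portion of the top answers are good suggestions, so they still need validation before being added to the wordnet.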