Neural context embeddings for automatic discovery of word senses
Mikael Kågebäck, Fredrik D. Johansson, Richard Johansson, Devdatt P. Dubhashi
VS@HLT-NAACL, 2015-06-01
DOI: 10.3115/v1/W15-1504 (https://doi.org/10.3115/v1/W15-1504)
Citation count: 26
Abstract
Word sense induction (WSI) is the problem of automatically building an inventory of senses for a set of target words using only a text corpus. We introduce a new method for embedding word instances and their context, for use in WSI. The method, Instance-context embedding (ICE), leverages neural word embeddings, and the correlation statistics they capture, to compute high-quality embeddings of word contexts. In WSI, these context embeddings are clustered to find the word senses present in the text. ICE is based on a novel method for combining word embeddings using continuous Skip-gram, based on both semantic and temporal aspects of context words. ICE is evaluated both in a new system and in an extension to a previous system for WSI. In both cases, we surpass the previous state of the art on the WSI task of SemEval-2013, which highlights the generality of ICE. Our proposed system achieves a 33% relative improvement.
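The pipeline described in the abstract (embed each occurrence of a target word through its context words, then cluster those context embeddings into senses) can be illustrated with a minimal sketch. The abstract does not give the weighting formula or the clustering algorithm, so everything specific here is an assumption made for illustration: the sigmoid-of-dot-product "semantic" weight, the linearly decaying "temporal" weight, the plain k-means with farthest-point initialization, and the function names `context_embedding`, `kmeans`, and `induce_senses`.

```python
import numpy as np


def context_embedding(vectors, target_idx, window=3):
    """Weighted average of the word vectors around one target occurrence.

    Assumed weighting (not taken from the abstract):
      semantic   = sigmoid(target . context), larger for related words
      positional = triangular kernel, decays linearly with distance
    """
    target = vectors[target_idx]
    total = np.zeros_like(target, dtype=float)
    weight_sum = 0.0
    lo = max(0, target_idx - window)
    hi = min(len(vectors), target_idx + window + 1)
    for i in range(lo, hi):
        if i == target_idx:
            continue  # the target word itself is not part of its context
        semantic = 1.0 / (1.0 + np.exp(-np.dot(target, vectors[i])))
        positional = 1.0 - abs(i - target_idx) / (window + 1)
        w = semantic * positional
        total += w * vectors[i]
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else total


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # distance from every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def induce_senses(instances, k, window=3):
    """instances: list of (sentence word vectors, index of the target word).

    Returns one cluster label per instance; each cluster stands in for
    one induced sense of the target word.
    """
    points = np.array([
        context_embedding(np.asarray(vecs, dtype=float), idx, window)
        for vecs, idx in instances
    ])
    return kmeans(points, k)
```

In this sketch the number of senses `k` is supplied by the caller; how the number of clusters is chosen in the actual system is not stated in the abstract.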